In our first article about quality assurance, we talked about the importance of consistent, valid, and complete data. As data is the core requirement for enabling digital business, being able to rely on its quality when evaluating the state of your business and making informed decisions is critical. This second part of the quality assurance series describes how the monitoring dataflow works for data stored in Google BigQuery.
As a part of quality assurance monitoring in Google Cloud, you can automatically get alerted about anomalies or problems. That way, you can immediately find out if one of your critical data processes goes down and quickly take action.
The process that helps us assure the quality of data is the monitoring dataflow. It tells us whether information is missing or incomplete, or whether some process could not be executed as expected. As part of the monitoring process, we can check whether a table column contains a number within a given range (and not null) or exceeds a certain value. We can also check whether an email address is valid and unique, or whether a table is updated every day.
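To make these rules concrete, here is a minimal sketch in Python of the kinds of checks described above. The function names, the simple email pattern, and the thresholds are illustrative assumptions, not the production implementation, which runs as SQL against BigQuery tables.

```python
import re

def in_range(value, low, high):
    """The value must be a number within [low, high] and not null."""
    return value is not None and low <= value <= high

# Deliberately simple structural pattern for illustration only;
# real-world email validation is considerably more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(value):
    """Basic structural check for an email address."""
    return value is not None and EMAIL_RE.match(value) is not None

def all_unique(values):
    """Every value in the column must occur exactly once."""
    return len(values) == len(set(values))
```

Each of these rules maps naturally onto a SQL query over the monitored table, which is how the dataflow actually evaluates them.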
Data is the element you build your business credibility on. Neglecting the quality of your data and processes can have a significant impact on the efficiency and performance of your business.
The monitoring dataflow performs pre-configured checks on BigQuery tables at repeated intervals. Check results are stored and reported at the configured frequency, covering both the checks that passed and those that failed during the last execution. There are three steps in the dataflow:
The first step is to read the configuration table to get all the active checks. The status of a check can easily be set to “active” or “inactive” depending on what needs to happen. For every check, we must also validate its frequency: not every check needs to run on every execution. The frequency can be controlled through a dedicated field for each check.
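This first step can be sketched as follows. The row layout of the configuration table, the check names, and the `frequency_hours` field are hypothetical; they stand in for whatever schema the actual configuration table uses.

```python
from datetime import datetime, timedelta

# Hypothetical configuration rows: each check has a status flag and a
# frequency (in hours) controlling how often it should be executed.
checks = [
    {"name": "orders_not_null", "status": "active",   "frequency_hours": 24},
    {"name": "email_is_valid",  "status": "active",   "frequency_hours": 1},
    {"name": "legacy_check",    "status": "inactive", "frequency_hours": 24},
]

def checks_due(checks, last_run, now):
    """Return the names of active checks whose frequency interval has elapsed."""
    due = []
    for check in checks:
        if check["status"] != "active":
            continue  # inactive checks are skipped entirely
        interval = timedelta(hours=check["frequency_hours"])
        if now - last_run[check["name"]] >= interval:
            due.append(check["name"])
    return due
```

Filtering on status first and frequency second mirrors the article's description: deactivating a check stops it immediately, while the frequency field only throttles how often an active check runs.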
For the checks that apply, in step two, the rule is validated and the results are stored in a BigQuery table. This storing action is the final step of the monitoring dataflow; after that, a scheduled procedure is executed. All the results are compiled into a report and emailed to selected recipients from the business. It can look like this:
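The second and third steps can be sketched like this. The `run_check` and `compile_report` names, the result-row fields, and the pass/fail `executor` callback are assumptions for illustration; in the real dataflow the executor would be a BigQuery query job and the results would land in a BigQuery table before the scheduled reporting procedure picks them up.

```python
def run_check(name, query, executor):
    """Validate one rule by executing its query and recording the outcome.

    `executor` is assumed to run the query and return True when the rule
    passes; the returned row mirrors what would be stored in BigQuery.
    """
    passed = executor(query)
    return {"check": name, "passed": passed, "query": query}

def compile_report(results):
    """Compile stored check results into a plain-text report body."""
    lines = []
    for r in results:
        status = "PASSED" if r["passed"] else "FAILED"
        lines.append(f"{status}: {r['check']} ({r['query']})")
    return "\n".join(lines)
```

Keeping the executed query alongside each result is what makes the emailed report actionable: the recipient can copy the query straight into a console to investigate a failure.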
We created the colour coding so that the status of the check results can be identified quickly. But the report can easily be configured to include more than that. Each customer can suggest an action to be put in the report based on their preferences and needs, such as what to do when an error is detected.
The result value shows relevant information about the outcome of the check and the query that was executed to validate it. With this information, the query can be copied into a console to easily check what went wrong. Corrective actions can be taken internally by the business team that receives the email, or the issue can be resolved by the development team.
Anomalies or downtime can not only negatively affect your bottom line but also hurt your reputation. Crystalloids provides quality assurance to help you find errors early, before they affect your business.
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making them a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients, freeing their time to focus more on decision-making and less on programming.