Insights

Google Cloud Dataprep: Speeding up data wrangling

To enable Crystalloids’ management to make wise decisions for our company, data needs to be collected and wrangled. Graphical data preparation without coding speeds up the process, and it enhances the data quality, which leads to more meaningful analysis and massive time savings. Google Cloud Dataprep speeds up the wrangling process. In just a few clicks, it provides us with only the data we need. Read our story here.

Richard Small



At Crystalloids, we face the same challenges as our customers when it comes to a data-driven approach to the decision-making process. That’s why we understand our customers’ pain. Our data enters the company through various systems. Combining this data makes it a lot more meaningful. It can help us make forecasts, budgets, and make predictions for the future, so we can make the right decisions. Combining data can be hard though. It can be a hassle and is time-consuming. In this article, you will learn how to combine only the data you need and make it ready for analysis.

How we used to handle data-driven decision-making

An integral view of the data coming from the various systems would mean downloading all the reports from the separate solutions and combining them in a spreadsheet. This works, but it’s time-consuming and cumbersome. In the last few years, our company grew fast. We started to reach the boundaries of what project administration, invoicing, and bookkeeping modules can accommodate for us. When it comes to reporting, we have seen a lack of flexibility.

Difficulties to get to the right data to set meaningful KPIs

In our company, we use KPIs to manage our business, but it became difficult to attribute revenue to the KPI’s or to implement the KPI’s correctly in Exact Online. In order to create an interesting KPI, we had to add new classifications and copy and paste ourselves towards the desired outcome. Of course, this wasn’t an ideal situation. We thought of different solutions. Should we automate further with a combination of macros and an app script in Google Sheets? Or instead, should we use a combination of Google Sheets and Python? That would mean there was programming involved. Since time is a big constraint in every company, we asked ourselves: “is there a way to do this faster, preferably without code?”

A more detailed view of the problem

Let me illustrate this with an example. As a consultancy company, it is very important to know the occupancy of our people working on projects and in teams to serve our customers. In the standard Exact Online overview, you can see this particular employee has a percentage above 100%. This is caused by an “out of office” service task for this employee. This should be excluded from his occupancy calculation in projects. In Exact, there is no way to book these hours. Not through the system, nor is it possible to exclude this from the reports.  

ExactOnlinedata

So far, we took the data to a Google Sheet and pushed the data into the desired outcome by deleting, merging, classification, aggregation, and pivoting. For every quarterly meeting, a new document was created, creating a monthly or weekly report would be too big an effort. So far, we extracted the data manually, but we wondered if there would be a solution to fetch the data exactly as we need it and as quickly as possible, so we could use meaningful data whenever we needed it? There is! It’s called data wrangling.

Data wrangling, what’s that?

If data is incomplete or unreliable, then analyses will be too. This means that the value of any insight will be lost. Data wrangling seeks to remove that risk by ensuring the data is in a reliable state before it is analyzed. Data wrangling refers to a variety of processes designed to transform raw data into ready-to-use data formats. The process depends on the data you want to leverage and the goal you are trying to achieve. A great tool that can assist us with data wrangling is Google Cloud Dataprep.

Google Cloud Dataprep

Google Cloud Dataprep is one of Google’s data services for cleaning and preparing structured and unstructured data for reporting, machine learning, and analysis. Dataprep interprets the data transformation through a proprietary inference algorithm. A ranked set of suggestions and patterns will be returned and displayed in a graphical user interface. With a click on the mouse we can apply aggregation, joins, unions, pivot and unpivot, regular expressions, extraction, merge, and much more to the data. Let’s see how we can use Google Cloud Dataprep is a solution.

Crystalloids

In order to create this overview, three extracts from the Exact Online system are needed. 

  • The list of all current active employees. 
  • Overview of worked hours per project.
  • Overview of all available hours per month and absence hours per month.

To calculate the % Nett Billable, we will need the following formula: 

Accountability = IF(Schedule<Final;Final/Schedule;1-(Out of office)/Schedule)

Cloud Data Prep

In the image above you see the Dataprep UI. With only two lines of codes that are generated by the graphical user interface, the first steps to more meaningful data are made. We can easily expand the same file with more data or repeat the same query. A perfect example of the beauty of automation. We can get to exactly that data we needed in just a few clicks, without any coding!

The image below shows how Dataprep simplifies the data lifecycle and smoothens the way to more meaningful data.

Dataprep Sketchnote

Conclusion

We have seen that combining data into one meaningful dataset is time-consuming. It involves programming or lots of manual actions, and it takes the attention away from what is really important, to analyze the data in such a way that the right business decision can be made on the basis of the right data. Data preparation without coding speeds up the process, and it increases the quality of the data, which leads to more meaningful analysis. 

Google Cloud Dataprep makes this process easier. In just a few clicks, it provides us with only the data we need. Are you struggling with your data? At Crystalloids, we have lots of experience with Google Cloud Dataprep. Book an online appointment for a consultation.

Contact Us


ABOUT CRYSTALLOIDS

Crystalloids help companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making them one of a kind IT company. Using the Agile approach Crystalloids ensures that use cases show immediate value to their clients and frees their time to focus on decision making and less on programming.