Share this
Google Cloud Dataprep: Speeding up data wrangling
by Richard Verhoeff on Jul 13, 2021 3:34:44 PM
To enable Crystalloids’ management to make wise decisions for our company, data needs to be collected and wrangled. Graphical data preparation without coding speeds up the process, and it enhances the data quality, which leads to more meaningful analysis and massive time savings. Google Cloud Dataprep speeds up the wrangling process. In just a few clicks, it provides us with only the data we need. Read our story here.
At Crystalloids, we face the same challenges as our customers when it comes to a data-driven approach to the decision-making process. That’s why we understand our customers’ pain. Our data enters the company through various systems. Combining this data makes it a lot more meaningful. It can help us make forecasts, budgets, and make predictions for the future, so we can make the right decisions. Combining data can be hard though. It can be a hassle and is time-consuming. In this article, you will learn how to combine only the data you need and make it ready for analysis.
How we used to handle data-driven decision-making
An integral view of the data coming from the various systems would mean downloading all the reports from the separate solutions and combining them in a spreadsheet. This works, but it’s time-consuming and cumbersome. In the last few years, our company grew fast. We started to reach the boundaries of what project administration, invoicing, and bookkeeping modules can accommodate for us. When it comes to reporting, we have seen a lack of flexibility.
Difficulties to get to the right data to set meaningful KPIs
In our company, we use KPIs to manage our business, but it became difficult to attribute revenue to the KPI’s or to implement the KPI’s correctly in Exact Online. In order to create an interesting KPI, we had to add new classifications and copy and paste ourselves towards the desired outcome. Of course, this wasn’t an ideal situation. We thought of different solutions. Should we automate further with a combination of macros and an app script in Google Sheets? Or instead, should we use a combination of Google Sheets and Python? That would mean there was programming involved. Since time is a big constraint in every company, we asked ourselves: “is there a way to do this faster, preferably without code?”
A more detailed view of the problem
Let me illustrate this with an example. As a consultancy company, it is very important to know the occupancy of our people working on projects and in teams to serve our customers. In the standard Exact Online overview, you can see this particular employee has a percentage above 100%. This is caused by an “out of office” service task for this employee. This should be excluded from his occupancy calculation in projects. In Exact, there is no way to book these hours. Not through the system, nor is it possible to exclude this from the reports.
So far, we took the data to a Google Sheet and pushed the data into the desired outcome by deleting, merging, classification, aggregation, and pivoting. For every quarterly meeting, a new document was created, creating a monthly or weekly report would be too big an effort. So far, we extracted the data manually, but we wondered if there would be a solution to fetch the data exactly as we need it and as quickly as possible, so we could use meaningful data whenever we needed it? There is! It’s called data wrangling.
Data wrangling, what’s that?
If data is incomplete or unreliable, then analyses will be too. This means that the value of any insight will be lost. Data wrangling seeks to remove that risk by ensuring the data is in a reliable state before it is analyzed. Data wrangling refers to a variety of processes designed to transform raw data into ready-to-use data formats. The process depends on the data you want to leverage and the goal you are trying to achieve. A great tool that can assist us with data wrangling is Google Cloud Dataprep.
Google Cloud Dataprep
Google Cloud Dataprep is one of Google’s data services for cleaning and preparing structured and unstructured data for reporting, machine learning, and analysis. Dataprep interprets the data transformation through a proprietary inference algorithm. A ranked set of suggestions and patterns will be returned and displayed in a graphical user interface. With a click on the mouse we can apply aggregation, joins, unions, pivot and unpivot, regular expressions, extraction, merge, and much more to the data. Let’s see how we can use Google Cloud Dataprep is a solution.
In order to create this overview, three extracts from the Exact Online system are needed.
- The list of all current active employees.
- Overview of worked hours per project.
- Overview of all available hours per month and absence hours per month.
To calculate the % Nett Billable, we will need the following formula:
Accountability = IF(Schedule<Final;Final/Schedule;1-(Out of office)/Schedule)
In the image above you see the Dataprep UI. With only two lines of codes that are generated by the graphical user interface, the first steps to more meaningful data are made. We can easily expand the same file with more data or repeat the same query. A perfect example of the beauty of automation. We can get to exactly that data we needed in just a few clicks, without any coding!
The image below shows how Dataprep simplifies the data lifecycle and smoothens the way to more meaningful data.
Conclusion
We have seen that combining data into one meaningful dataset is time-consuming. It involves programming or lots of manual actions, and it takes the attention away from what is really important, to analyze the data in such a way that the right business decision can be made on the basis of the right data. Data preparation without coding speeds up the process, and it increases the quality of the data, which leads to more meaningful analysis.
Google Cloud Dataprep makes this process easier. In just a few clicks, it provides us with only the data we need. Are you struggling with your data? At Crystalloids, we have lots of experience with Google Cloud Dataprep. Book an online appointment for a consultation.
ABOUT CRYSTALLOIDS
Crystalloids help companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making them one of a kind IT company. Using the Agile approach Crystalloids ensures that use cases show immediate value to their clients and frees their time to focus on decision making and less on programming.
Share this
- December 2024 (1)
- November 2024 (5)
- October 2024 (2)
- September 2024 (1)
- August 2024 (1)
- July 2024 (4)
- June 2024 (2)
- May 2024 (1)
- April 2024 (4)
- March 2024 (2)
- February 2024 (2)
- January 2024 (4)
- December 2023 (1)
- November 2023 (4)
- October 2023 (4)
- September 2023 (4)
- June 2023 (2)
- May 2023 (2)
- April 2023 (1)
- March 2023 (1)
- January 2023 (4)
- December 2022 (3)
- November 2022 (5)
- October 2022 (3)
- July 2022 (1)
- May 2022 (2)
- April 2022 (2)
- March 2022 (5)
- February 2022 (3)
- January 2022 (5)
- December 2021 (5)
- November 2021 (4)
- October 2021 (2)
- September 2021 (2)
- August 2021 (3)
- July 2021 (4)
- May 2021 (2)
- April 2021 (2)
- February 2021 (2)
- January 2021 (1)
- December 2020 (1)
- October 2020 (2)
- September 2020 (1)
- August 2020 (2)
- July 2020 (2)
- June 2020 (1)
- March 2020 (2)
- February 2020 (1)
- January 2020 (1)
- December 2019 (1)
- November 2019 (3)
- October 2019 (2)
- September 2019 (3)
- August 2019 (2)
- July 2019 (3)
- June 2019 (5)
- May 2019 (2)
- April 2019 (4)
- March 2019 (2)
- February 2019 (2)
- January 2019 (4)
- December 2018 (2)
- November 2018 (2)
- October 2018 (1)
- September 2018 (2)
- August 2018 (3)
- July 2018 (3)
- May 2018 (2)
- April 2018 (4)
- March 2018 (5)
- February 2018 (2)
- January 2018 (3)
- November 2017 (2)
- October 2017 (2)