Data Mesh: When To Adopt, What It Offers, And How To Implement It
by Jan Hendrik Fleury & Veronika Schipper on Dec 22, 2021 1:08:02 PM
Data mesh is hot in the world of data platforms. It's a big deal because it helps solve an old problem: making big data systems grow smoothly with your organization. This means data mesh could really help your team do better.
In this post, I'll talk about when your organization might need a data mesh architecture. You'll learn about the benefits of data mesh and what it brings to the table. I'll also touch on what it takes to start using it. It's all about understanding the good stuff and the challenges, so you can make smart choices about your data architecture. Stick around to see if the principles of data mesh are right for you!
What Is Data Mesh?
Data mesh is an approach to data architecture where distributed data products are created and managed by skilled data engineers and dedicated data product owners within domain-specific teams.
This system relies on a shared data infrastructure to host, prepare, and provide access to existing data. As centralized data teams often find themselves hitting limits, data mesh has emerged as a significant trend in the data platform world. But how did we get here?
To understand this, let's look at the main features of a data platform, or a data lake:
- It's a scalable cloud service with separate storage and computing power.
- It allows direct and interactive work with data.
- All architectural components support native data and their interactions.
- It provides tools for both Analytics and AI.
- It includes unified data management for effective data governance.
Unified central data platforms, like a data lake, are incredibly valuable. They drive data and digital transformation significantly. Rituals, for example, uses a data platform to improve data access and analysis, and with them its business performance.
However, in complex or international companies, functional scalability challenges can arise. This is where the benefits of data mesh become particularly relevant, offering a more flexible and decentralized approach to managing and governing data.
Why Use Data Mesh?
In certain scenarios, particularly within complex or international organizations, data platforms, including data warehouses, confront scalability challenges. Picture this as visiting a library to collect books for research, but facing issues like unavailable books, no catalog, uncertain authorship, and difficulties in accessing specific information.
Data mesh provides solutions to these functional and technical challenges in data management:
- Newly acquired books (new data) are promptly added to the system.
- A well-organized catalog helps guide you to the right books (data pipeline), ensuring efficient retrieval.
- Access to all books (data) simplifies the combination and analysis of information.
- Clear authorship and standardized definitions enhance the quality and understanding of data.
- Expert librarians (data management professionals) are available to assist in navigating through complex data landscapes.
- A separate section for private information (sensitive data) with clear access instructions, ensuring data governance and security.
This approach not only streamlines the process of accessing and using data but also transforms the data warehouse into an efficient, analytical data platform, making it easier for users to derive valuable insights.
Differences Between Data Lakehouse and Data Mesh
A data lakehouse does not solve the organizational problem of knowledge and staffing. A larger data platform still needs a larger central data team that holds all the data engineering knowledge, so the central team has to scale up along with the platform.
This is why IT environments at enterprises often create vertical splits, with data engineers and data analysts working in different teams. The disadvantage of this split is that different teams are needed for each data product.
Data ownership
Data ownership is a crucial aspect of implementing data mesh, which greatly affects how you manage data within an organization:
Efficiency:
- It becomes easier to identify the owner of a domain data set if changes or issues arise, enhancing the speed and accuracy of managing data.
- Engaging only the relevant stakeholders for a particular data domain minimizes confusion and streamlines decision-making.
- Users can find things faster and have a clear historical trail to follow, which is particularly beneficial for data consumers who rely on accuracy and speed.
Transparency:
- Clear visibility into the origins of domain data ensures that all data consumers understand its source and context.
- Transparency in the decision-making process regarding datasets fosters trust and collaboration among different domains.
Upkeep:
- With well-defined data ownership, less time is spent on documentation in the future, as the responsible parties are clear from the outset.
In contrast to a data lakehouse, which I've previously discussed in more detail, both a data fabric and a data mesh provide architectures to access data across various technologies and platforms. However, while a data fabric is technology-centric, a data mesh emphasizes organizational change, focusing on domain data and how to manage, maintain, and make it accessible for all data consumers effectively.
The Benefits of a Data Mesh
Implementing a data mesh can significantly transform how organizations handle their data, offering an array of advantages while also solving complex organizational problems:
1. Controlled by Many Teams
When you implement a data mesh, various teams within the company manage their own data products. This decentralization speeds up processes and makes problem-solving more efficient. Each domain team operates within its area of expertise, leading to more tailored and effective solutions.
Domain teams know better than anyone else the definitions of products or customers and they can also shape these entities. With the right standards, tools, and knowledge, domain teams are able to supply data products themselves and offer them centrally.
- The domain team manages the data quality and can monitor and improve it well.
- The domain team knows the right definitions and can apply and share them well.
- The domain team knows the data users and can serve them well and unburden them.
2. Data Treated as an Important Product
In a data mesh approach, data is viewed as a valuable product needing regular maintenance and updates. In many organizations, establishing a “single source of truth” or “authoritative data source” is challenging due to the repeated extraction and transformation of data across the organization without clear ownership responsibilities over the newly created data.
In the data mesh, the authoritative data source is the Data Product published by the source domain, with a clearly assigned Data Owner and Steward who is responsible for that data.
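To make this concrete, the ownership of a data product could be recorded in a small descriptor. What follows is only a minimal Python sketch under stated assumptions: the `DataProduct` class, its fields, the `authoritative_source` helper, and the example values are all hypothetical and not part of any data mesh standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product published by a source domain."""
    name: str
    domain: str
    owner: str    # accountable for the data (assumed field)
    steward: str  # looks after day-to-day quality (assumed field)

    def __post_init__(self):
        # Without a named owner and steward, nobody is responsible for the data.
        if not self.owner.strip() or not self.steward.strip():
            raise ValueError(f"data product {self.name!r} needs an owner and a steward")


def authoritative_source(products, name):
    """Return the single published product with this name, or raise."""
    matches = [p for p in products if p.name == name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one source for {name!r}, found {len(matches)}")
    return matches[0]


customers = DataProduct("customers", "sales", "owner@example.com", "steward@example.com")
print(authoritative_source([customers], "customers").owner)
```

The lookup deliberately fails when two products claim the same name, which mirrors the idea that each entity has exactly one authoritative source.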
3. Grows with Your Company
As your company expands, so does the data mesh, adapting to increased demands without the common slowdowns of a centralized data platform. This scalability is a significant advantage, allowing organizations to grow their data architecture in line with their overall growth.
4. Easy to Find and Use Data
Data Mesh makes it easier to find and use data by organizing and explaining it well. Domain teams manage the quality of multiple data products, ensuring they're easy to monitor, improve, and utilize. This improved accessibility is crucial for data consumers who rely on accurate and timely information.
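Findability of this kind usually comes from a catalog in which domain teams register their products. The sketch below assumes a hypothetical in-memory `Catalog` class with `register` and `search` methods. A real catalog would be a shared platform service, and the product names and descriptions are invented for illustration.

```python
class Catalog:
    """Hypothetical in-memory catalog of data products."""

    def __init__(self):
        self._entries = {}

    def register(self, name, domain, description, tags=()):
        # Each product name may be registered only once.
        if name in self._entries:
            raise ValueError(f"{name!r} is already registered")
        self._entries[name] = {
            "name": name,
            "domain": domain,
            "description": description,
            "tags": set(tags),
        }

    def search(self, keyword):
        """Return sorted names of products whose name, description or tags mention the keyword."""
        kw = keyword.lower()
        return sorted(
            e["name"]
            for e in self._entries.values()
            if kw in e["name"].lower()
            or kw in e["description"].lower()
            or any(kw in t.lower() for t in e["tags"])
        )


catalog = Catalog()
catalog.register("customers", "sales", "One row per known customer", tags=["crm"])
catalog.register("shipments", "logistics", "Parcels sent to a customer", tags=["orders"])
print(catalog.search("customer"))  # ['customers', 'shipments']
```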
5. Better Teamwork and New Ideas
Easy access to and understanding of data across domains lead to enhanced collaboration and innovation. Data scientists and domain experts, familiar with their data users, can effectively meet their needs and foster an environment where new ideas are encouraged and developed.
6. Quick to Change
Domain teams can swiftly make changes and updates relevant to their specific areas. This agility helps the entire company adapt quickly to new opportunities and challenges. They know the right definitions, can apply and share them effectively, and are well-equipped to manage and adjust their data products, including real-time data.
7. Rules and Safety
Implementing a data mesh ensures data is used safely and in compliance with regulations by clearly defining responsibilities. It addresses the challenge of establishing a "single source of truth" by designating authoritative data sources with assigned Data Owners and Stewards who are accountable for that data. This clarity helps prevent the issues that often arise with a centralized data platform.
8. Less Work for the Main IT Team
By empowering domain teams to manage their data, the central IT team's workload is reduced. This allows them to concentrate on enhancing the overall data infrastructure and capabilities, thus enabling data to be more effectively used across the organization.
The challenges of a Data Mesh
The challenges associated with adopting a data mesh are worth considering. It's essential to address critical questions before implementing a data mesh approach:
- How (de)centralized is my organization set up?
- What is the size of my organization?
Implementation of data mesh only makes sense if the benefits of decentralization outweigh the investment in setting up the platform and standards. That is why data mesh is a suitable solution for (especially) organizations with multiple divisions and/or an international character.
- Data Need Assessment: Understand your organization's data needs thoroughly. Determine which domains require a more self-serve data platform and assess the extent to which decentralized data management is necessary.
- Organizational Structure: Examine how (de)centralized your organization is structured. A successful data mesh implementation relies on alignment with your organization's existing structure and culture.
- Organization Size: Consider the size of your organization. Data mesh is particularly suitable for larger organizations with multiple divisions and international operations. Smaller organizations may not benefit as significantly from this approach.
- Operational and Analytical Data: Distinguish between operational and analytical data needs within your organization. Assess how data mesh can effectively cater to both types of data requirements.
The new role of IT teams
The adoption of data mesh principles also necessitates a new role for IT teams, one that combines support and control functions. These IT teams play a crucial role in ensuring the successful implementation and operation of the data mesh model.
Supportive Role:
In the supportive capacity, IT teams are responsible for assisting domain teams in various ways:
- Data as a Product: They help establish standards for the accessible description of data products, ensuring that data is treated as a valuable product within the organization.
- Modern Tooling: IT teams provide support for modern tools and technologies, enabling domain teams to work efficiently and effectively.
- Data Transformation Standards: They promote and facilitate understandable data transformation standards, ensuring that data is processed consistently and accurately.
Control Function:
To maintain control in an environment with multiple independent domain teams, IT teams adopt a control-oriented role:
- Standardization: They establish standards to prevent the proliferation of code and data descriptions. These standards help maintain consistency and quality across the organization.
- Policy Enforcement: IT teams enforce well-defined policies when managing domain teams. These policies ensure that code and documentation adhere to standards regarding naming conventions, structure, and tagging. This rigorous policy enforcement guarantees data quality and consistency throughout the organization.
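Checks on naming, structure, and tagging lend themselves to automation. The following Python sketch assumes three hypothetical rules (snake_case names, a non-empty description, and the tag keys `domain` and `sensitivity`). The actual rules would be whatever your organization agrees on, and `check_policy` is an illustrative name, not an existing tool.

```python
import re

# Assumed conventions, for illustration only.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
REQUIRED_TAGS = {"domain", "sensitivity"}


def check_policy(product):
    """Return a list of policy violations for a data product description (a dict)."""
    violations = []
    name = product.get("name", "")
    if not SNAKE_CASE.match(name):
        violations.append(f"name {name!r} is not snake_case")
    if not product.get("description", "").strip():
        violations.append("description is missing")
    # Iterating a dict yields its keys, so this compares the tag keys present.
    missing = REQUIRED_TAGS - set(product.get("tags", {}))
    if missing:
        violations.append(f"missing tags: {', '.join(sorted(missing))}")
    return violations


draft = {"name": "CustomerOrders", "description": "", "tags": {"domain": "sales"}}
for problem in check_policy(draft):
    print(problem)
```

Run as part of a review or deployment step, a check like this lets domain teams see violations before a product is published, rather than having the central team inspect each one by hand.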
This new role for IT teams is crucial in ensuring that the data mesh model operates effectively and efficiently. It strikes a balance between providing support to domain teams and maintaining control and standardization across the enterprise data landscape, including the management of operational data within the central data lake.
Enhance data literacy in the business domains
To fully harness the potential of a data platform, an organization requires individuals who are data-fluent. According to Gartner's definition, data literacy encompasses "the ability to read, write, and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application, and resulting value."
Data-fluent employees possess the following capabilities:
- Data-Driven Thinking: They can engage in critical thinking and analysis using data as a foundation, enabling them to draw informed conclusions.
- Informed Decision-Making: Data-fluent individuals rely on data to make decisions, prioritizing data-backed insights over experiences or intuition.
- Data-Enabled Innovation: They leverage data to communicate ideas effectively and contribute to the creation of new products, business models, workflows, and strategies.
- Understanding Data Visualizations: Data-fluent employees are proficient in understanding and interpreting data visualizations, ensuring that insights are effectively communicated.
Without data fluency, data assets within an organization may not yield their full potential value. It becomes essential to support, train, and coach business domains to develop data fluency. In an upcoming blog post, we will delve deeper into the organizational aspects of the data mesh framework, exploring how to foster a data-fluent culture that maximizes the benefits of data assets.
Conclusion
A data mesh is not a cloud service that you simply switch on or off. Building one takes a combination of a good approach and the right tools.
You can simultaneously use a data mesh and a data fabric, and even a data hub. First, they are concepts, not things. A data hub as an architectural concept is different from a data hub as a database. Second, they are components, not alternatives. It is practical for architecture to include both data fabric and data mesh. They are not mutually exclusive.
Finally, they are architectural frameworks, not architectures. You don’t have architecture until the frameworks are adapted and customized to your needs, your data, your processes, and your terminology.
Both data meshes and data fabrics have a seat at the data table. In the search for architectural concepts and architectures to support data projects, it all comes down to finding what works best for your own specific needs. Crystalloids is ready to guide you.