Data mesh is one of the most discussed topics in the world of data platforms. It matters because it addresses an old problem: keeping big data systems scalable as your organization grows.
In this post, I'll cover when your organization might need a data mesh architecture, which benefits it brings, and what it takes to start using it. The goal is to give you a clear view of both the advantages and the challenges, so you can make informed choices about your data architecture and decide whether the principles of data mesh are right for you.
Data mesh is an approach to data architecture where distributed data products are created and managed by skilled data engineers and dedicated data product owners within domain-specific teams.
This system relies on a shared data infrastructure to host, prepare, and provide access to existing data. As centralized data teams often find themselves hitting limits, data mesh has emerged as a significant trend in the data platform world. But how did we get here?
To understand this, let's look at the main features of a data platform, or a data lake:
Unified central data platforms, such as a data lake, are incredibly valuable and drive data and digital transformation significantly. Rituals, for example, uses a data platform to improve its data access and analysis capabilities, and with them its business performance.
However, in complex or international companies, functional scalability challenges can arise. This is where the benefits of data mesh become particularly relevant, offering a more flexible and decentralized approach to managing and governing data.
In certain scenarios, particularly within complex or international organizations, data platforms, including data warehouses, confront scalability challenges. Picture this as visiting a library to collect books for research, but facing issues like unavailable books, no catalog, uncertain authorship, and difficulties in accessing specific information.
Data mesh provides solutions to these functional and technical challenges in data management:
This approach not only streamlines the process of accessing and using data but also transforms the data warehouse into an efficient, analytical data platform, making it easier for users to derive valuable insights.
A data lakehouse does not solve the organizational problem of knowledge and staffing. A larger data platform still needs a larger central data team with centrally held data engineering knowledge. In other words, the central team has to scale up along with the platform.
This is why IT environments at enterprises often create vertical splits, with data engineers and data analysts working in different teams. The disadvantage of this split is that each data product then depends on several different teams.
Data ownership is a crucial aspect of implementing data mesh, which greatly affects how you manage data within an organization:
In contrast to a data lakehouse, which I've previously discussed in more detail, both a data fabric and a data mesh provide architectures to access data across various technologies and platforms. However, while a data fabric is technology-centric, a data mesh emphasizes organizational change, focusing on domain data and how to manage, maintain, and make it accessible for all data consumers effectively.
Implementing a data mesh can significantly transform how organizations handle their data, offering an array of advantages while also solving complex organizational problems:
When you implement a data mesh, various teams within the company manage their own multiple data products. This decentralization speeds up processes and makes problem-solving more efficient. Each domain team operates within its area of expertise, leading to more tailored and effective solutions.
Domain teams know better than anyone else the definitions of products or customers and they can also shape these entities. With the right standards, tools, and knowledge, domain teams are able to supply data products themselves and offer them centrally.
In a data mesh approach, data is viewed as a valuable product needing regular maintenance and updates. In many organizations, establishing a “single source of truth” or “authoritative data source” is challenging due to the repeated extraction and transformation of data across the organization without clear ownership responsibilities over the newly created data.
In the data mesh, the authoritative data source is the Data Product published by the source domain, with a clearly assigned Data Owner and Data Steward who are responsible for that data.
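To make the ownership idea concrete, here is a minimal sketch in Python. It is only an illustration, not a reference to any specific data mesh tool: the `DataProduct` and `Registry` names, their fields, and the rule that only one domain may publish a given entity are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product published by a domain."""
    name: str     # entity the product describes, e.g. "customers"
    domain: str   # source domain that publishes the product
    owner: str    # Data Owner accountable for the data
    steward: str  # Data Steward responsible for day-to-day quality


class Registry:
    """Hypothetical registry keeping one authoritative product per entity."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Refuse products without clearly assigned responsibilities.
        if not product.owner or not product.steward:
            raise ValueError(f"{product.name}: owner and steward are required")
        # Refuse a second "source of truth" from another domain.
        existing = self._products.get(product.name)
        if existing is not None and existing.domain != product.domain:
            raise ValueError(
                f"{product.name} is already published by domain {existing.domain}"
            )
        self._products[product.name] = product

    def authoritative_source(self, name: str) -> DataProduct:
        # Raises KeyError if no domain has published this entity yet.
        return self._products[name]
```

In this sketch, a consumer looking for customer data would call `authoritative_source("customers")` and immediately see which domain publishes it and who is accountable, instead of guessing between several extracted copies.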
As your company expands, so does the data mesh, adapting to increased demands without the common slowdowns of a centralized data platform. This scalability is a significant advantage, allowing organizations to grow their data architecture in line with their overall growth.
Data mesh makes it easier to find and use data by organizing and documenting it well. Domain teams manage the quality of their data products, ensuring they're easy to monitor, improve, and utilize. This improved accessibility is crucial for data consumers who rely on accurate and timely information.
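As a rough illustration of what "easy to find" can mean in practice, the sketch below searches a small in-memory catalog of documented data products. The `CatalogEntry` fields and the `search` function are hypothetical and chosen for this example; a real catalog would typically be a dedicated tool rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record describing one data product."""
    name: str
    domain: str
    description: str
    tags: list[str] = field(default_factory=list)


def search(catalog: list[CatalogEntry], keyword: str) -> list[CatalogEntry]:
    """Return entries whose name, description, or tags contain the keyword."""
    needle = keyword.lower()
    return [
        entry
        for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or any(needle in tag.lower() for tag in entry.tags)
    ]


catalog = [
    CatalogEntry("customers", "sales", "Active customer master data", ["crm"]),
    CatalogEntry("shipments", "logistics", "Daily shipment events", ["orders"]),
]

for entry in search(catalog, "customer"):
    print(f"{entry.domain}/{entry.name}: {entry.description}")
```

The point of the sketch is that discoverability depends on the descriptions and tags the domain teams maintain: the search is only as good as the documentation behind it.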
Easy access to and understanding of data across domains lead to enhanced collaboration and innovation. Data scientists and domain experts, familiar with their data users, can effectively meet their needs and foster an environment where new ideas are encouraged and developed.
Domain teams can swiftly make changes and updates relevant to their specific areas. They know the right definitions, can apply and share them effectively, and are well-equipped to manage and adjust their data products, including real-time data. This agility helps the entire company adapt quickly to new opportunities and challenges.
Implementing a data mesh ensures data is used safely and in compliance with regulations by clearly defining responsibilities. It addresses the challenge of establishing a "single source of truth" by designating authoritative data sources with assigned Data Owners and Stewards who are accountable for that data. This clarity helps prevent the issues that often arise with a centralized data platform.
By empowering domain teams to manage their data, the central IT team's workload is reduced. This allows them to concentrate on enhancing the overall data infrastructure and capabilities, thus enabling data to be more effectively used across the organization.
The challenges associated with adopting a data mesh are worth considering. It's essential to address a few critical questions before implementing a data mesh approach:
Implementation of data mesh only makes sense if the benefits of decentralization outweigh the investment in setting up the platform and standards. That is why data mesh is a suitable solution for (especially) organizations with multiple divisions and/or an international character.
Data Need Assessment: Understand your organization's data needs thoroughly. Determine which domains require a more self-serve data platform and assess the extent to which decentralized data management is necessary.
Organizational Structure: Examine how (de)centralized your organization is structured. A successful data mesh implementation relies on alignment with your organization's existing structure and culture.
Organization Size: Consider the size of your organization. Data mesh is particularly suitable for larger organizations with multiple divisions and international operations. Smaller organizations may not benefit as significantly from this approach.
Operational and Analytical Data: Distinguish between operational and analytical data needs within your organization. Assess how data mesh can effectively cater to both types of data requirements.
To fully harness the potential of a data platform, an organization requires individuals who are data-fluent. According to Gartner's definition, data literacy encompasses "the ability to read, write, and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application, and resulting value."
Data-fluent employees possess the following capabilities:
Data-Driven Thinking: They can engage in critical thinking and analysis using data as a foundation, enabling them to draw informed conclusions.
Informed Decision-Making: Data-fluent individuals rely on data to make decisions, prioritizing data-backed insights over experiences or intuition.
Data-Enabled Innovation: They leverage data to communicate ideas effectively and contribute to the creation of new products, business models, workflows, and strategies.
Understanding Data Visualizations: Data-fluent employees are proficient in understanding and interpreting data visualizations, ensuring that insights are effectively communicated.
Without data fluency, data assets within an organization may not yield their full potential value. It becomes essential to support, train, and coach business domains to develop data fluency. In an upcoming blog post, we will delve deeper into the organizational aspects of the data mesh framework, exploring how to foster a data-fluent culture that maximizes the benefits of data assets.
A data mesh is not a cloud service that you simply switch on or off. Building one takes a combination of a good approach and the right tools.
You can use a data mesh and a data fabric at the same time, and even a data hub. First, they are concepts, not things. A data hub as an architectural concept is different from a data hub as a database. Second, they are components, not alternatives. It is practical for an architecture to include both data fabric and data mesh. They are not mutually exclusive.
Finally, they are architectural frameworks, not architectures. You don’t have an architecture until the frameworks are adapted and customized to your needs, your data, your processes, and your terminology.
Both data meshes and data fabrics have a seat at the data table. In the search for architectural concepts and architectures to support data projects, it all comes down to finding what works best for your own specific needs. Crystalloids is ready to guide you.