A Step-by-Step Guide to Building an Effective Data Quality Strategy from Scratch | by David Rubio | Aug, 2023


All the actions and goals defined in your data quality strategy need to be actively monitored. Utilising monitoring tools that can raise alerts and communicate through various channels is essential for early detection of issues.
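As a minimal sketch of such a monitor, the snippet below checks data freshness against the 6-hour objective used as an example later in this article and prints an alert when it is breached. The function names and the print-based alert are illustrative assumptions; in practice the alert would go to Slack, PagerDuty, email or a similar channel.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative threshold matching the timeliness example in this article:
# production data should never be older than 6 hours.
FRESHNESS_THRESHOLD = timedelta(hours=6)

def check_freshness(last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the data is fresh enough; False should trigger an alert."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= FRESHNESS_THRESHOLD

def run_monitor(last_loaded_at: datetime) -> None:
    if not check_freshness(last_loaded_at):
        # Stand-in for a real alerting channel (Slack, PagerDuty, email, ...).
        print(f"ALERT: production data stale since {last_loaded_at.isoformat()}")
```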

It is also crucial to log your incidents and categorise them by the dimensions they impact. This practice lets you focus your attention on specific areas and identify potential gaps in your strategy. Even better, if you maintain an incident report, you can reflect on how your work in specific areas reduces the number of incidents over time.
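A simple incident log of this kind can be sketched as a list of records aggregated per month and per dimension. The `Incident` structure and dimension names below are assumptions for illustration, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Incident:
    month: str        # e.g. "2023-08"
    dimension: str    # impacted data quality dimension, e.g. "timeliness"
    description: str  # brief description of what happened

def incidents_by_dimension(log: List[Incident]) -> Counter:
    """Count incidents per (month, dimension) to spot gaps in the strategy."""
    return Counter((i.month, i.dimension) for i in log)
```

Reviewing these counts over several months shows whether work on a specific dimension is actually driving its incident count down.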

Incident log by month and by data quality dimension. Each sticker holds a brief description of the incident (image by the author)

Periodic revisions of the framework

Your team must review the incident log periodically and update your data quality framework accordingly to fill the identified gaps. This ensures your actions and goals reflect reality and are up to date.

Service Level Indicators and Transparency

It is essential to measure the fulfilment of your Service Level Objectives. For every SLO, you should have a Service Level Indicator (SLI) that shows its fulfilment. For instance, in our example you could have an SLI showing the percentage of days in the last X days on which production data was never older than 6 hours (timeliness dimension). This helps users understand how the data behaves and builds trust in its quality.
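The timeliness SLI from that example can be computed roughly as follows. The sketch assumes an input of one boolean per day, `True` when the 6-hour freshness objective was met that day; how those daily checks are produced is left out.

```python
from typing import List

def timeliness_sli(daily_checks: List[bool], window: int = 30) -> float:
    """Percentage of days within the trailing window that met the SLO.

    `daily_checks` is assumed to hold one entry per day, in order,
    True when production data never breached the 6-hour objective.
    """
    recent = daily_checks[-window:]
    if not recent:
        return 0.0
    return 100.0 * sum(recent) / len(recent)
```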

Service Level Indicators for our data quality dimensions (image by the author)

Transparency in practice is key to increasing user adoption, and Service Level Indicators are what provide this transparency.

For sharing our data quality metrics (SLIs), I really like embracing the data product concept within a data-mesh implementation.

Our data quality strategy has these characteristics:

  • It is domain specific, as the objectives come from a business need
  • Transparent, as we can and want to share it with users
  • Visible, as our data quality framework is easy to interpret

This aligns perfectly with the definition data-mesh gives to data products. I thoroughly recommend using a data-mesh approach, encapsulating data and its quality metrics into data products to enhance transparency.

Why data products for sharing our data quality metrics

By definition, a data product in data-mesh is a self-contained, domain-specific unit of data capabilities. Data products encapsulate data, processing logic and data quality checks, promoting decentralised data ownership and seamless integration into the broader data ecosystem. They are designed to serve specific business needs within a domain, and they are easily findable and transparent. As integral components of our data quality framework, data products ensure that our strategy aligns precisely with the unique requirements of each domain, providing visibility and transparency for domain-specific data quality.

One of the key advantages of data products in the context of data quality is their ability to hold their own SLIs. By integrating data quality indicators directly into the data products and making them visible through a user-friendly catalog, we empower users to search, request access, and explore data with full knowledge of its quality. This transparency and visibility enhance user confidence and encourage greater adoption.
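As a rough illustration of data products holding their own SLIs, the descriptor below attaches quality metrics to each product so a catalog search returns the data and its quality together. The `DataProduct` shape, field names and SLI values are hypothetical, not part of any data-mesh specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataProduct:
    name: str
    domain: str
    # dimension -> percentage fulfilment of its SLO (the SLI)
    slis: Dict[str, float] = field(default_factory=dict)

def catalog_search(catalog: List[DataProduct], domain: str) -> List[DataProduct]:
    """Find a domain's data products, quality metrics included."""
    return [p for p in catalog if p.domain == domain]
```

Because each product carries its SLIs, a user browsing the catalog sees the quality of a dataset before requesting access to it.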

