Building Durable Data Pipelines

Data engineering techniques for robust and durable ETL

💡Mike Shakhomirov, Towards Data Science, Mar 2024

Image: AI-generated using Kandinsky

Data durability in data pipelines is a well-known pain point in the data engineering space. Data availability and data quality issues can lead to a significant increase in time spent on non-value-added tasks. In this story, I would like to speak about data engineering design patterns for data pipelines that ensure data is always there. We will discuss techniques that might help us build a sustainable data transformation process where data is always delivered on time and our data pipeline can be described as robust, durable and maybe even self-fixing.

If a data pipeline fails, employees will most likely have to perform a set of manual tasks, including unnecessary data sourcing, reconciliation and aggregation, to get to the desired outcome.

Data durability is a well-known factor in data engineering. In my opinion, it is also one of the least discussed topics online at the moment. However, simply because you don't see the problem doesn't mean it is not there. Data practitioners might not speak of it often, but the issue exists, seeding fear among them and turning data pipeline design into a real challenge.

Data availability and data quality issues might lead to further delays in data delivery and other reporting failures. According to a McKinsey report, time spent by employees on non-value-adding tasks can increase drastically due to these factors:

Time spent by employees on non-value-added tasks due to data quality. Source: McKinsey Global Data Transformation Survey, 2019

This would typically include not-required data investigations, such as extra data sourcing, reconciliation and aggregation, resulting in lots of manual tasks.

These manual tasks are absolutely unnecessary

So how do we build robust, durable and self-fixing pipelines?
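One common building block for self-fixing behaviour is automatic retries with exponential backoff: transient failures (a late-arriving file, a flaky API) often resolve themselves on a second or third attempt. The sketch below is my own minimal illustration, not a pattern from this article; the function names are assumptions.

```python
import time
import logging

logger = logging.getLogger("pipeline")

def run_with_retries(task, max_attempts: int = 3, base_delay: float = 1.0):
    """Run a pipeline task, retrying with exponential backoff on failure.

    A first step toward a "self-fixing" pipeline: transient issues are
    retried automatically instead of waking up an engineer.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the orchestrator.
                logger.error("Task failed after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            logger.warning("Attempt %d failed (%s); retrying in %.1fs",
                           attempt, exc, delay)
            time.sleep(delay)
```

In production you would typically delegate this to your orchestrator (e.g. Airflow task retries) rather than hand-rolling it, but the idea is the same.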

What is a data pipeline?

There is a data pipeline whenever there is data processing between points A and B. One can be considered the source and the other the destination:

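To make the A-to-B idea concrete, here is a minimal extract/transform/load sketch of my own (the file formats and function names are assumptions, not from the article): point A is a CSV file, point B is a JSON file, and a small transformation happens in between.

```python
import csv
import json

def extract(source_path: str) -> list[dict]:
    """Point A: read raw rows from the source (a CSV file here)."""
    with open(source_path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    """The processing step between A and B: cast amounts to numbers."""
    return [{**row, "amount": float(row["amount"])} for row in rows]

def load(rows: list[dict], destination_path: str) -> None:
    """Point B: write the processed rows to the destination (JSON here)."""
    with open(destination_path, "w") as f:
        json.dump(rows, f)
```

Real pipelines swap the CSV and JSON files for databases, queues or cloud storage, but the source-transform-destination shape stays the same.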