How we think about Data Pipelines is changing | by Hugo Lu | Nov, 2023

Photo by Ali Kazal on Unsplash

The goal is to reliably and efficiently release data into production


Data Pipelines are series of tasks organised in a directed acyclic graph, or "DAG". Historically, these have run on orchestration packages like Airflow or Prefect, and require infrastructure managed by data engineers or platform teams. These data pipelines typically run on a schedule, and allow data engineers to update data in locations such as data warehouses or data lakes.
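To make this concrete, here is a minimal sketch of such a scheduled pipeline expressed as a DAG, using Airflow's TaskFlow API (assuming a recent Airflow 2.x); the task names and bodies are placeholders for the example rather than any real pipeline:

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
    def daily_orders_pipeline():

        @task
        def extract() -> list[dict]:
            # Pull raw records from a source system (stubbed out here).
            return [{"order_id": 1, "amount": 9.5}]

        @task
        def load(rows: list[dict]) -> None:
            # Write the records to the warehouse or data lake (stubbed out here).
            print(f"Loaded {len(rows)} rows")

        load(extract())  # declaring load downstream of extract forms the DAG

    daily_orders_pipeline()

On each scheduled run the orchestrator executes extract and then load, and the infrastructure that runs these tasks is what data engineers or platform teams have historically had to manage.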

This is now changing. There is a great shift in mentality happening. As the data engineering industry matures, mindsets are shifting from a "move data to serve the business at all costs" mindset to a "reliably and efficiently release data into production" / "software engineering" mindset.

Continuous Data Integration and Delivery

I’ve written before about how Data Teams ship data whereas software teams ship code.

This process is called "Continuous Data Integration and Delivery": the process of reliably and efficiently releasing data into production. There are subtle differences from the definition of "CI/CD" as used in Software Engineering, illustrated below.

Image by the author

In software engineering, Continuous Delivery is non-trivial because code needs a near-exact replica of the production environment, a staging environment, to operate in.

Within Data Engineering, this is not necessary because the good we ship is data. If there is a table of data, and we know that as long as a few conditions are satisfied the data is of sufficient quality to be used, then that is enough for it to be "released" into production, so to speak.
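As an illustration of what those conditions might look like, here is a small sketch in Python using pandas; the specific checks (non-empty table, no null or duplicate keys) and the column names are assumptions made for the example, not a definitive list:

    import pandas as pd

    def checks_pass(df: pd.DataFrame) -> bool:
        # Illustrative release conditions: the table is non-empty,
        # and the primary key column has no nulls and no duplicates.
        return (
            len(df) > 0
            and df["order_id"].notna().all()
            and not df["order_id"].duplicated().any()
        )

    staging = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.5, 3.0, 12.25]})
    if checks_pass(staging):
        print("Conditions satisfied; the data can be released into production")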

The process of releasing data into production, the analog of Continuous Delivery, is very simple: it amounts to copying or cloning a dataset.
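Here is a sketch of what that release step could look like, assuming a warehouse that supports cloning (Snowflake's zero-copy clone is used purely as an example; the table names and connection details are hypothetical):

    import snowflake.connector

    # Hypothetical connection details; substitute your own account and credentials.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="...",
        warehouse="transforming",
        database="analytics",
    )

    # "Releasing" the data: promote the validated staging table into the
    # production schema with a zero-copy clone, so nothing is physically moved.
    conn.cursor().execute("CREATE OR REPLACE TABLE prod.orders CLONE staging.orders")
    conn.close()

On engines without cloning, the same step is an equally simple copy, for example a CREATE TABLE AS SELECT from the staging table into the production schema.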

Furthermore, a key pillar of data engineering is reacting to new data as it arrives, or checking to see whether new data exists. There is no analog for this in software engineering: software does not need to…
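A minimal sketch of that "checking to see whether new data exists" step, assuming new data lands as files in an S3 bucket (the bucket, prefix, and last-seen key below are made up for the example):

    import boto3

    s3 = boto3.client("s3")

    def new_files_exist(bucket: str, prefix: str, last_seen_key: str) -> bool:
        # List objects that sort after the last key we processed;
        # any result means new data has arrived since the previous run.
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, StartAfter=last_seen_key)
        return resp.get("KeyCount", 0) > 0

    # A scheduler or orchestrator sensor would call this on an interval and
    # only trigger the downstream pipeline when it returns True.
    if new_files_exist("my-data-lake", "raw/orders/", "raw/orders/2023-11-01.csv"):
        print("New data detected; trigger the pipeline")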
