Call of Data: Dev Ops

What is Data Ops?

As a company or a team embeds itself in data, it is vital to ensure the robustness, efficiency and flexibility of the infrastructure it builds. You can have valuable data to work with, but if that data is siloed, messy or untimely, it becomes less useful and delivers less of its commercial value.

Data Ops is a catch-all term for the procedures and standards put in place to ensure data robustness and fluency. It borrows from the software engineering concept of DevOps and shares many of its principles. It helps you apply structure to your pipelines and avoid "vibes"-based development.

There are three main principles that lay the groundwork for Data Ops:

Agile Development: This involves working in sprints so that small iterations of work are completed regularly, rather than pushing many changes at once with a long wait between releases.
CI/CD: This stands for "Continuous Integration/Continuous Deployment", where changes are tested and released automatically. This helps minimise the need for manual monitoring and assessment. If a data load issue occurs upstream, your pipeline should catch it and take appropriate action to stop it from affecting dashboards downstream.
SPC: This stands for "Statistical Process Control". This involves constantly monitoring the health of your data. Automated procedures that pick up oddities in your data sources and trigger alerts allow you to identify anomalies and address them promptly.
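
To make the SPC idea concrete, here is a minimal Python sketch of one common approach: compute control limits from a baseline of healthy loads and flag any new load that falls outside them. The row counts, the three-sigma threshold and the function names are all assumptions made for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) limits: baseline mean +/- `sigmas` sample std devs."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(baseline, new_values, sigmas=3.0):
    """Return (index, value) pairs from new_values that fall outside the limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [
        (i, value)
        for i, value in enumerate(new_values)
        if not lower <= value <= upper
    ]


# Hypothetical daily row counts from loads that are known to be healthy.
baseline_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

# Hypothetical new loads to check against the baseline.
new_row_counts = [1002, 400, 1015]

alerts = out_of_control(baseline_row_counts, new_row_counts)
print(alerts)  # [(1, 400)]
```

In a real pipeline, a non-empty list of alerts would typically trigger a notification or stop the downstream refresh, which is where SPC and CI/CD meet.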

So, what?

There are numerous benefits to implementing Data Ops in your pipelines:

Reduce errors: Automatically identifying wonky data catches errors before they reach stakeholders and interfere with business decisions.
Speed up innovation: Developers spend more of their time and energy on development and less on putting out fires.
Innovative collaboration: It allows you to establish a more fluid relationship between developers and stakeholders to bridge the business-data gap.

How can I start?

Some ways to implement Data Ops include:

Test your data: Run tests on your data to catch null values, negative values where they don't make sense, incorrect string formats and so on. Tools like dbt allow you to configure these tests to run automatically whenever your model runs.
Use version control: A version control platform like GitHub or GitLab gives you a lineage of development. It keeps work on a project synchronised across large teams and documents the history of what you are working on, which you can reference at any time.
Communication: Consistent and clear communication between data engineers and business analysts allows developers to keep business end-goals and front-end objectives in mind as they build.
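
As a rough illustration of the "Test your data" idea, the sketch below hand-rolls the three kinds of check mentioned above in plain Python. The column names (customer_id, amount, order_date) and the YYYY-MM-DD date format are hypothetical, and this is not how dbt itself is configured; in a tool like dbt you would declare comparable tests alongside the model rather than write them by hand.

```python
import re

# Assumed date format for this example: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_row(row):
    """Return a list of problems found in one row (empty list if the row is clean)."""
    problems = []

    # Null check on a required column.
    if row.get("customer_id") is None:
        problems.append("customer_id is null")

    # Null and negative-value checks on a numeric column.
    amount = row.get("amount")
    if amount is None:
        problems.append("amount is null")
    elif amount < 0:
        problems.append("amount is negative")

    # String format check on a date column.
    order_date = row.get("order_date")
    if not isinstance(order_date, str) or not DATE_PATTERN.match(order_date):
        problems.append("order_date is not YYYY-MM-DD")

    return problems


def validate_rows(rows):
    """Map row index to its list of problems, for rows with at least one problem."""
    failures = {}
    for index, row in enumerate(rows):
        problems = validate_row(row)
        if problems:
            failures[index] = problems
    return failures


# Hypothetical rows: the first is clean, the second fails every check.
rows = [
    {"customer_id": 1, "amount": 25.0, "order_date": "2024-01-15"},
    {"customer_id": None, "amount": -5.0, "order_date": "15/01/2024"},
]

failures = validate_rows(rows)
print(failures)
```

Running checks like these as part of every pipeline run, and failing the run when any row is flagged, is what keeps bad data from reaching a dashboard.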

Author:

Caolan Daly