Software Engineering

How to use ZenML and DBT together

Hamza Tahir
Jun 21, 2024
1 min

Today, Javier from Wayflyer asked on Slack about using ZenML and DBT together. It got me thinking: this is something that could be useful to a lot of people.

Why use DBT and ZenML together?

ZenML orchestrates ML workflows, while DBT handles data transformations. The two go hand in hand when you have use cases like:

  • Running a data transformation after training a model
  • Doing post-batch-inference data transformations (that’s Javier’s use case)
  • Triggering a training/inference/deployment ML workflow after a data transformation completes (see the sketch right after this list)
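
The first two flows are what the rest of this post walks through. For the third (dbt first, ML workflow second), a minimal sketch might look like this, assuming a hypothetical training_pipeline and dbt’s programmatic invocation API (more on that below):

from dbt.cli.main import dbtRunner, dbtRunnerResult
from zenml import pipeline, step

@step
def train_model() -> None:
    # placeholder training step
    ...

@pipeline
def training_pipeline():
    train_model()

if __name__ == "__main__":
    # run the dbt models programmatically
    res: dbtRunnerResult = dbtRunner().invoke(["run", "--select", "tag:my_tag"])

    # kick off the ZenML training workflow only if dbt succeeded
    if res.success:
        training_pipeline()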

How I’d do it

My suggestion to Javier was to trigger the transformation from a ZenML success hook:

import os

import pandas as pd
import requests
from zenml import step

# Configuration for the GitHub repo that hosts the DBT project
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GITHUB_REPO = os.environ["GITHUB_REPO"]  # e.g. "my-org/my-dbt-repo"


def trigger_dbt() -> None:
    """Success hook: fire a repository_dispatch event in the DBT repo."""
    headers = {
        'Authorization': f'token {GITHUB_TOKEN}',
        'Accept': 'application/vnd.github.everest-preview+json'
    }
    url = f'https://api.github.com/repos/{GITHUB_REPO}/dispatches'
    payload = {
        'event_type': 'trigger-action',
        'client_payload': {}  # any extra data you want to pass to the workflow
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 204:
        print('GitHub Action triggered successfully')
    else:
        print(f'Failed to trigger GitHub Action: {response.status_code}')


@step(on_success=trigger_dbt)
def run_batch_inference(data: pd.DataFrame) -> pd.DataFrame:
    # run batch inference here; trigger_dbt runs once this step succeeds
    results = data  # placeholder for your real inference logic
    return results
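
To see how that step slots into a pipeline, here’s a minimal sketch of the ZenML side (the load_inference_data step and its dummy data are hypothetical stand-ins):

from zenml import pipeline, step
import pandas as pd

@step
def load_inference_data() -> pd.DataFrame:
    # hypothetical loader; replace with your real data source
    return pd.DataFrame({"feature": [1.0, 2.0, 3.0]})

@pipeline
def batch_inference_pipeline():
    data = load_inference_data()
    run_batch_inference(data)

if __name__ == "__main__":
    # the trigger_dbt hook fires automatically once run_batch_inference succeeds
    batch_inference_pipeline()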

The trigger_dbt hook simply fires a GitHub Action in the repository that holds your DBT code. Since DBT now supports programmatic invocation (as Javier notes), that GitHub Action (a workflow listening for the repository_dispatch event with type trigger-action) can run the dbt transformation directly:

from dbt.cli.main import dbtRunner, dbtRunnerResult

# initialize
dbt = dbtRunner()

# create CLI args as a list of strings
cli_args = ["run", "--select", "tag:my_tag"]

# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)

# inspect the results
for r in res.result:
    print(f"{r.node.name}: {r.status}")

That’s it: you’ve hooked up your ML pipelines with your dbt transformations.

So what?

Establishing a link between the modern data stack and the MLOps world is a challenge. Data and ML people often think differently and want to use their own tools for their daily work. Well-defined interfaces between the two worlds might lead to better outcomes overall.

The above is just one example, but it’s an interesting place to start. Let me know if you try it, and share your thoughts on the use case in general over on Slack.
