
How to use ZenML and DBT together

How to use ZenML and dbt together, all powered by ZenML's built-in success hooks that run whenever your pipeline successfully completes.


Today, Javier from Wayflyer asked on Slack about using ZenML and dbt together. It got me thinking: this seems like something that could be useful to many people.

Why use dbt and ZenML together?

ZenML is used for ML workflows, while dbt is used for data transformations. The two go hand in hand when you have use cases like:

  • Running a data transformation after training a model
  • Doing post-batch-inference data transformations (that's Javier's use case)
  • Triggering a training/inference/deployment ML workflow after a data transformation is complete

How I’d do it

My suggestion to Javier was to trigger the transformation from a ZenML success hook:

import os

import pandas as pd
import requests
from zenml import step

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GITHUB_REPO = os.environ["GITHUB_REPO"]  # e.g. "my-org/my-dbt-repo"


def trigger_dbt() -> None:
    """Success hook that fires a repository_dispatch event on GitHub."""
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.everest-preview+json",
    }
    url = f"https://api.github.com/repos/{GITHUB_REPO}/dispatches"
    payload = {
        "event_type": "trigger-action",
        "client_payload": {},
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 204:
        print("GitHub Action triggered successfully")
    else:
        print(f"Failed to trigger GitHub Action: {response.status_code}")


@step(on_success=trigger_dbt)
def run_batch_inference(data: pd.DataFrame):
    # run batch inference
    results = ...
    return results

The above code simply triggers a GitHub Action in the repo where your dbt code lives. Since dbt now supports programmatic invocation (as Javier notes), that GitHub Action can run the dbt transformation directly:

from dbt.cli.main import dbtRunner, dbtRunnerResult

# initialize
dbt = dbtRunner()

# create CLI args as a list of strings
cli_args = ["run", "--select", "tag:my_tag"]

# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)

# inspect the results
for r in res.result:
    print(f"{r.node.name}: {r.status}")
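
For completeness, here's a minimal sketch of what the receiving GitHub Actions workflow might look like. The workflow filename, Python version, adapter, and script name are all assumptions for illustration; the only hard requirement is that the `repository_dispatch` type matches the `event_type` sent from the success hook:

```yaml
# .github/workflows/dbt.yml (hypothetical path)
name: Run dbt on dispatch

on:
  repository_dispatch:
    types: [trigger-action]  # must match the event_type from the hook

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-postgres  # swap in your adapter
      - run: python run_dbt.py  # the snippet above, or `dbt run --select tag:my_tag`
```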

That’s it: you’ve hooked up your ML pipelines to your dbt transformations.

So what?

Establishing a link between the modern data stack and the MLOps world is a challenge. Data and ML people often think differently and want to use their own tools for their daily work. Having more well-defined interfaces between the two worlds might lead to better outcomes overall.

The above is just one example, but it's an interesting place to start. Let me know if you do try it, and share your thoughts on the use case in general over on Slack.
