Integrations: Databricks

Harness the Power of Databricks for Scalable ML Pipelines with ZenML

Seamlessly integrate ZenML with Databricks to leverage its distributed computing capabilities for efficient and scalable machine learning workflows. This integration enables data scientists and engineers to run their ZenML pipelines on Databricks, taking advantage of its optimized environment for big data processing and ML workloads.

Features with ZenML

  • Effortlessly orchestrate ZenML pipelines on Databricks infrastructure
  • Leverage Databricks' distributed computing power for large-scale ML tasks
  • Seamlessly integrate with other Databricks services and tools
  • Monitor and manage pipeline runs through the Databricks UI
  • Schedule pipelines using Databricks' native scheduling capabilities (see the scheduling sketch after this list)
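
To illustrate the scheduling feature, here is a minimal sketch that attaches a cron-based schedule to a pipeline using ZenML's Schedule class. The pipeline and step names are hypothetical, it assumes a stack with a Databricks orchestrator is already registered and active, and the exact cron syntax accepted may follow Databricks' Quartz format rather than standard five-field cron.

from zenml import pipeline, step
from zenml.config.schedule import Schedule

@step
def say_hello() -> str:
    return "hello from Databricks"

@pipeline
def scheduled_pipeline():
    say_hello()

# Attach a cron schedule; the Databricks orchestrator turns it into a
# Databricks job schedule. Verify the cron format your workspace expects
# (Databricks typically uses Quartz syntax with a seconds field).
scheduled_pipeline.with_options(
    schedule=Schedule(cron_expression="0 9 * * *")
)()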

Main Features

  • Optimized for big data processing and machine learning workloads
  • Collaborative environment for data scientists, engineers, and analysts
  • Scalable and high-performance distributed computing
  • Integrated with popular data and ML frameworks (e.g., Spark, TensorFlow, PyTorch)
  • Comprehensive security and governance features

How to use ZenML with Databricks

from zenml import pipeline
from zenml.integrations.databricks.flavors.databricks_orchestrator_flavor import (
    DatabricksOrchestratorSettings,
)

# ID of an existing Databricks cluster policy to apply to the job cluster.
POLICY_ID = "<your-databricks-policy-id>"

databricks_settings = DatabricksOrchestratorSettings(
    spark_version="15.3.x-scala2.12",
    num_workers=3,
    node_type_id="Standard_D4s_v5",
    policy_id=POLICY_ID,
    autoscale=(2, 3),  # minimum and maximum workers when autoscaling
)

@pipeline(
    settings={
        "orchestrator.databricks": databricks_settings,
    }
)
def my_pipeline():
    # load_data, preprocess_data, train_model, and evaluate_model are
    # assumed to be @step-decorated functions defined elsewhere.
    load_data()
    preprocess_data()
    train_model()
    evaluate_model()

# Calling the pipeline function triggers a run on the active Databricks stack.
my_pipeline()

This code example demonstrates how to configure the Databricks orchestrator settings in ZenML. The DatabricksOrchestratorSettings object specifies the Spark version, number of workers, node type, cluster policy, and autoscaling range. These settings are passed to the @pipeline decorator via the settings parameter under the "orchestrator.databricks" key. Finally, the pipeline is defined with its steps and executed by calling my_pipeline().
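
Before running, the active ZenML stack must contain the Databricks orchestrator. As a minimal sketch, assuming a stack named databricks_stack has already been registered (the name is a placeholder for illustration), it can be activated from Python:

from zenml.client import Client

# Activate the stack whose orchestrator uses the Databricks flavor.
# "databricks_stack" is a hypothetical name used for this example.
Client().activate_stack("databricks_stack")

The same can be done from the command line with zenml stack set databricks_stack.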

Additional Resources
GitHub: ZenML Databricks Integration Example
ZenML Databricks Orchestrator Documentation

