Databricks and ZenML

Harness the Power of Databricks for Scalable ML Pipelines with ZenML

Seamlessly integrate ZenML with Databricks to leverage its distributed computing capabilities for efficient and scalable machine learning workflows. This integration enables data scientists and engineers to run their ZenML pipelines on Databricks, taking advantage of its optimized environment for big data processing and ML workloads.

Features with ZenML

  • Effortlessly orchestrate ZenML pipelines on Databricks infrastructure
  • Leverage Databricks' distributed computing power for large-scale ML tasks
  • Seamlessly integrate with other Databricks services and tools
  • Monitor and manage pipeline runs through the Databricks UI
  • Schedule pipelines using Databricks' native scheduling capabilities (see the sketch after this list)
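As a minimal sketch of the scheduling feature above, assuming a Databricks orchestrator is already registered in your active ZenML stack: attaching a Schedule registers the pipeline as a recurring Databricks job instead of triggering a one-off run. The pipeline, step, and cron string below are illustrative placeholders.

from zenml import pipeline, step
from zenml.config.schedule import Schedule

@step
def say_hello() -> None:
    print("Hello from Databricks!")

@pipeline
def hello_pipeline():
    say_hello()

# Databricks expects Quartz-style cron expressions (six fields, seconds
# first), so "0 0 6 * * ?" fires at 06:00 every day.
hello_pipeline.with_options(
    schedule=Schedule(cron_expression="0 0 6 * * ?")
)()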

Main Features

  • Optimized for big data processing and machine learning workloads
  • Collaborative environment for data scientists, engineers, and analysts
  • Scalable and high-performance distributed computing
  • Integrated with popular data and ML frameworks (e.g., Spark, TensorFlow, PyTorch)
  • Comprehensive security and governance features

How to use ZenML with Databricks

from zenml import pipeline
from zenml.integrations.databricks.flavors.databricks_orchestrator_flavor import (
    DatabricksOrchestratorSettings,
)

# Configure the Databricks job cluster that will execute the pipeline.
databricks_settings = DatabricksOrchestratorSettings(
    spark_version="15.3.x-scala2.12",
    num_workers=3,
    node_type_id="Standard_D4s_v5",
    policy_id="<YOUR_DATABRICKS_POLICY_ID>",  # placeholder: your cluster policy ID
    autoscale=(2, 3),  # (min, max) workers; takes precedence over num_workers
)

@pipeline(
    settings={
        "orchestrator.databricks": databricks_settings,
    }
)
def my_pipeline():
    # load_data, preprocess_data, train_model and evaluate_model are
    # ordinary @step-decorated functions defined elsewhere in your project.
    load_data()
    preprocess_data()
    train_model()
    evaluate_model()

# Calling the pipeline triggers a run on the active Databricks stack.
my_pipeline()

This code example demonstrates how to configure the Databricks orchestrator settings in ZenML. The DatabricksOrchestratorSettings object specifies the Spark version, number of workers, node type, cluster policy, and autoscaling range. These settings are then passed to the @pipeline decorator through the settings parameter under the "orchestrator.databricks" key. Finally, the pipeline is defined with its steps and executed by calling my_pipeline().
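Settings do not have to live in the decorator: ZenML also accepts them at run time through with_options, which is useful for one-off experiments. A minimal sketch, reusing my_pipeline and DatabricksOrchestratorSettings from the example above; the larger node type is an illustrative assumption.

# Override the cluster for a single run without editing the pipeline code.
bigger_cluster = DatabricksOrchestratorSettings(
    spark_version="15.3.x-scala2.12",
    num_workers=3,
    node_type_id="Standard_D8s_v5",  # illustrative: a larger VM size
)

my_pipeline.with_options(
    settings={"orchestrator.databricks": bigger_cluster}
)()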

Additional Resources
GitHub: ZenML Databricks Integration Example
ZenML Databricks Orchestrator Documentation


