Harness the Power of Databricks for Scalable ML Pipelines with ZenML
Seamlessly integrate ZenML with Databricks to leverage its distributed computing capabilities for efficient and scalable machine learning workflows. This integration enables data scientists and engineers to run their ZenML pipelines on Databricks, taking advantage of its optimized environment for big data processing and ML workloads.
Features with ZenML
- Effortlessly orchestrate ZenML pipelines on Databricks infrastructure
- Leverage Databricks' distributed computing power for large-scale ML tasks
- Seamlessly integrate with other Databricks services and tools
- Monitor and manage pipeline runs through the Databricks UI
- Schedule pipelines using Databricks' native scheduling capabilities
Main Features of Databricks
- Optimized for big data processing and machine learning workloads
- Collaborative environment for data scientists, engineers, and analysts
- Scalable and high-performance distributed computing
- Integrated with popular data and ML frameworks (e.g., Spark, TensorFlow, PyTorch)
- Comprehensive security and governance features
How to use ZenML with Databricks
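from zenml import pipeline
from zenml.integrations.databricks.flavors.databricks_orchestrator_flavor import DatabricksOrchestratorSettings

# POLICY_ID must be set beforehand to the ID of your Databricks cluster policy.
# The steps called below (load_data, preprocess_data, train_model,
# evaluate_model) are assumed to be defined elsewhere with @step.

# Cluster configuration used for pipeline runs on Databricks.
databricks_settings = DatabricksOrchestratorSettings(
    spark_version="15.3.x-scala2.12",
    num_workers="3",
    node_type_id="Standard_D4s_v5",
    policy_id=POLICY_ID,
    autoscale=(2, 3),
)

# Attach the settings to the pipeline under the Databricks orchestrator key.
@pipeline(
    settings={
        "orchestrator.databricks": databricks_settings,
    }
)
def my_pipeline():
    load_data()
    preprocess_data()
    train_model()
    evaluate_model()

# Calling the decorated pipeline function starts the run.
my_pipeline()
This example shows how to configure the Databricks orchestrator in ZenML. The DatabricksOrchestratorSettings object specifies the Spark version, number of workers, node type, cluster policy, and autoscaling range. The settings are passed to the @pipeline decorator through its settings parameter, under the "orchestrator.databricks" key. The pipeline function then calls its steps in order, and calling my_pipeline() starts the run on the active stack. POLICY_ID and the four steps are not defined in this snippet and must be provided by your own code.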
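To make the role of each setting more concrete, here is a minimal sketch in plain Python, with no ZenML or Databricks dependency, of how the values above could correspond to the fields of a Databricks cluster specification. The helper build_cluster_spec is hypothetical and is not part of ZenML. The field names follow the Databricks Clusters API, and the rule that autoscale takes precedence over num_workers is an assumption made for illustration, not a description of what the orchestrator does internally.

```python
from typing import Any, Dict, Optional, Tuple


def build_cluster_spec(
    spark_version: str,
    node_type_id: str,
    num_workers: Optional[int] = None,
    autoscale: Optional[Tuple[int, int]] = None,
    policy_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Databricks-style cluster spec dict."""
    spec: Dict[str, Any] = {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
    }
    if autoscale is not None:
        # Assumption: an autoscaling range replaces a fixed worker count.
        min_workers, max_workers = autoscale
        if min_workers > max_workers:
            raise ValueError("autoscale must be (min_workers, max_workers)")
        spec["autoscale"] = {
            "min_workers": min_workers,
            "max_workers": max_workers,
        }
    elif num_workers is not None:
        spec["num_workers"] = num_workers
    if policy_id is not None:
        spec["policy_id"] = policy_id
    return spec


# Same values as in the settings example; policy_id is omitted here
# because POLICY_ID is not defined in the original snippet.
spec = build_cluster_spec(
    spark_version="15.3.x-scala2.12",
    node_type_id="Standard_D4s_v5",
    num_workers=3,
    autoscale=(2, 3),
)
print(spec)
```

Under the assumption stated above, passing both num_workers and autoscale yields a spec with only the autoscaling range, so the cluster would scale between two and three workers rather than staying fixed at three.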
Additional Resources
GitHub: ZenML Databricks Integration Example
ZenML Databricks Orchestrator Documentation