Whether you're an ML engineer focusing on infrastructure or a data scientist diving into model development, the combination of ZenML and SkyPilot offers a robust solution for managing ML workflows. This integration bridges the gap between rapid experimentation and scalable cloud execution.
Best part? Both tools are free and open source!
Why ZenML + SkyPilot?
SkyPilot brings its own set of powerful capabilities to the world of MLOps/LLMOps. As an open-source orchestration framework, SkyPilot excels in cloud-agnostic workloads, allowing users to run AI jobs on any cloud with minimal code changes. It offers intelligent cloud selection based on cost and availability, automatic spot instance handling for cost savings, and efficient management of cloud storage. SkyPilot's ability to easily launch, scale, and manage cloud resources makes it an ideal complement to ZenML's MLOps functionalities.
ZenML is an open-source MLOps framework that also abstracts away infrastructure complexity for cloud-agnostic ML workloads, but it focuses less on orchestration itself. Instead, it emphasizes observability, reproducibility, and the production stage of ML development.
The two tools therefore complement each other. Here are the advantages of using them together:
- Python-Centric Workflows: Define pipelines in Python, even within notebooks, instead of YAML.
- Abstracted Orchestration: Hide infrastructure details, focusing on ML logic.
- Flexible Execution: Switch between local and cloud runs with minimal changes.
- Comprehensive Tracking: Automatically version code, metadata, and data.
- Automated Containerization: Simplify dependency management and reproducibility.
Implementation Example
A good way to see the difference from plain SkyPilot is to take its quickstart training example and see how it works with ZenML.
First, install the required packages:
Now, start writing your workflows. Here's a multi-step pipeline for fine-tuning a BERT model on the GLUE MRPC dataset:
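As a rough sketch of the shape such a pipeline takes, the block below wires three plain functions into ZenML steps. It is not the original listing: the step bodies are toy stand-ins (three hard-coded sentence pairs instead of GLUE MRPC, a word-overlap threshold instead of BERT fine-tuning), and every function name is hypothetical. The only ZenML names assumed are the `step` and `pipeline` decorators, imported lazily so the logic can be tried without ZenML installed.

```python
from typing import List


def load_pairs() -> List[dict]:
    """Toy stand-in for loading GLUE MRPC sentence pairs."""
    return [
        {"a": "the cat sat on the mat", "b": "the cat sat on a mat", "label": 1},
        {"a": "stocks rose sharply today", "b": "shares rose sharply today", "label": 1},
        {"a": "he plays the guitar", "b": "rain is expected tomorrow", "label": 0},
    ]


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def fit_threshold(pairs: List[dict]) -> float:
    """Toy stand-in for fine-tuning: midpoint between the class mean overlaps."""
    pos = [overlap(p["a"], p["b"]) for p in pairs if p["label"] == 1]
    neg = [overlap(p["a"], p["b"]) for p in pairs if p["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(pairs: List[dict], threshold: float) -> float:
    """Fraction of pairs whose thresholded overlap matches the label."""
    hits = sum(
        int(overlap(p["a"], p["b"]) >= threshold) == p["label"] for p in pairs
    )
    return hits / len(pairs)


def build_pipeline():
    """Wrap the functions above as ZenML steps (requires zenml to be installed)."""
    from zenml import pipeline, step  # lazy import: only needed for real runs

    load_step = step(load_pairs)
    train_step = step(fit_threshold)
    eval_step = step(accuracy)

    @pipeline
    def paraphrase_pipeline():
        pairs = load_step()
        threshold = train_step(pairs=pairs)
        eval_step(pairs=pairs, threshold=threshold)

    return paraphrase_pipeline
```

With ZenML installed, `build_pipeline()()` would run the pipeline on whichever stack is active. In real code the same functions would more commonly be decorated directly with `@step`; they are kept undecorated here only so they can be called as ordinary Python.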
Running the Pipeline
1. Local Execution:
2. SkyPilot Execution:
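One way to picture the two runs from Python is a small helper that activates a stack and then calls the pipeline. This is a sketch under assumptions: `run_on_stack` and the stack name `skypilot_stack` are hypothetical, and when no `activate` callable is passed in, the helper assumes ZenML's `Client().activate_stack(...)` is available.

```python
from typing import Callable, Optional


def run_on_stack(
    pipeline_fn: Callable[[], object],
    stack_name: str,
    activate: Optional[Callable[[str], None]] = None,
) -> object:
    """Activate `stack_name`, then invoke the pipeline unchanged."""
    if activate is None:
        # Assumption: ZenML is installed and the named stack is registered.
        from zenml.client import Client

        activate = Client().activate_stack
    activate(stack_name)
    return pipeline_fn()
```

Under these assumptions, `run_on_stack(my_pipeline, "default")` would run locally, while `run_on_stack(my_pipeline, "skypilot_stack")` would send the same pipeline to a stack whose orchestrator is SkyPilot. The pipeline definition itself is not touched in either case.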
This demonstrates the ease of moving from local to cloud execution without altering the core pipeline logic. In both cases, the run appears on the dashboard.
Notice how much more shared, collaborative, and observable this run is compared with running it ad hoc. This is the power of a shared MLOps framework.
MLOps Platform Perspective: Enhancing Team Productivity at Scale
Integrating ZenML with SkyPilot offers significant advantages for scaling ML operations across larger data science organizations:
- Resource Optimization: Centralized tracking of cloud resource usage across all ML projects enables better allocation and cost management.
- Standardization: Enforce consistent workflows and best practices across diverse teams and projects.
- Collaboration: Improved visibility into model development processes and results fosters knowledge sharing and reduces redundant work.
- Unified Interface: A single platform for managing ML experiments, models, and deployments streamlines operations.
- Scalability: Seamlessly transition from experimentation to production-scale workflows without changing tools.
Instead of fragmented tooling and ad-hoc scripts, ZenML provides a centralized interface that tracks experiments, models, and metrics. This comprehensive view enables data science leaders to make informed decisions about resource allocation and project priorities, while the underlying SkyPilot integration ensures efficient use of cloud resources.
Key Advantages
- Clear Separation of Concerns: Isolated steps improve maintainability and reusability.
- Flexible Resource Configuration: Adjust cloud resources via simple ZenML settings.
- Version Control: Automatic tracking of data, code, and model versions.
- Cost Optimization: Leverage SkyPilot's spot instance and multi-region pricing features.
- Reproducibility: Containerized runs ensure consistent execution across different environments.
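Flexible resource configuration and spot-instance use, as listed above, are typically expressed as settings attached to a pipeline. The helper below is a hedged sketch: `skypilot_settings` is a hypothetical name, and the field names (`accelerators`, `use_spot`, `region`, `cpus`, `memory`) and the `"orchestrator"` settings key are assumptions about the SkyPilot orchestrator flavor that should be checked against the integration docs for your ZenML version.

```python
from typing import Any, Dict, Optional


def skypilot_settings(
    accelerators: Optional[str] = None,
    use_spot: bool = False,
    region: Optional[str] = None,
    cpus: Optional[str] = None,
    memory: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
    """Build a settings mapping, dropping any field left unset."""
    fields: Dict[str, Any] = {
        "accelerators": accelerators,  # assumed field name, e.g. "V100:1"
        "use_spot": use_spot,          # assumed field name
        "region": region,              # assumed field name
        "cpus": cpus,                  # assumed field name
        "memory": memory,              # assumed field name
    }
    return {"orchestrator": {k: v for k, v in fields.items() if v is not None}}


gpu_spot = skypilot_settings(accelerators="V100:1", use_spot=True)
print(gpu_spot)
```

With a ZenML pipeline object in hand, the mapping could then be applied with something like `my_pipeline.with_options(settings=gpu_spot)()`. Some ZenML releases may expect a flavor-qualified key (for example `orchestrator.vm_aws`) rather than the bare `orchestrator` key used here.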
Conclusion
The ZenML + SkyPilot integration offers a powerful solution for ML teams, from individual contributors to large-scale data science operations. It combines the simplicity of ZenML's pipeline abstraction with the efficiency of SkyPilot's cloud orchestration. This approach maintains agility throughout the ML lifecycle while providing the structure necessary for scaling ML operations.
By abstracting infrastructure complexities, this integration allows data scientists and ML engineers to focus on model development and experimentation. Simultaneously, it gives MLOps teams the tools to standardize practices, optimize resources, and foster collaboration across the organization. The seamless transition between local and cloud environments, coupled with comprehensive versioning and tracking, makes this an invaluable asset for modern ML workflows in organizations of any size.
Try out ZenML with SkyPilot today with the starter guide, and let us know on Slack how it went!