Why do we need ZenML? [Read More]

Create Reproducible ML pipelines with ZenML

1. Start with a simple Python function

  • Use a simple decorator to go from a Python function to a pipeline step.
  • Stay within a paradigm that is familiar to data scientists.
  • Run pipelines from Jupyter notebooks.
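To illustrate the idea of turning an ordinary function into a trackable step, here is a minimal plain-Python sketch of the pattern. The `step` decorator and `normalize` function below are illustrative only, not ZenML's actual API:

```python
import functools

def step(func):
    """Minimal sketch of a step decorator: wrap a plain Python function
    so a pipeline runner can recognize and track it as a step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.is_step = True  # marker a pipeline runner could look for
    return wrapper

@step
def normalize(values):
    """An ordinary data science function, now usable as a pipeline step."""
    total = sum(values)
    return [v / total for v in values]
```

The decorated function still works exactly as before when called directly, which is what keeps the paradigm familiar to data scientists.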

2. Connect multiple steps as a pipeline

  • Simple API to define flexible pipelines.
  • `pip install zenml` is the only requirement to run pipelines locally.
  • Code, data, configuration all automatically versioned.
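As a rough sketch of what "connecting steps as a pipeline" means, the following plain-Python example chains step functions so each one's output feeds the next. The `make_pipeline` helper and the steps are hypothetical, for illustration only:

```python
def make_pipeline(*steps):
    """Sketch of a pipeline: run steps in order, piping each step's
    output into the next step's input."""
    def run(initial_input):
        data = initial_input
        for step_fn in steps:
            data = step_fn(data)
        return data
    return run

def clean(rows):
    """Drop missing values."""
    return [r for r in rows if r is not None]

def square(rows):
    """A trivial transformation step."""
    return [r * r for r in rows]

pipe = make_pipeline(clean, square)
```

A real orchestrator would also version the code, data, and configuration of each step, but the dataflow shape is the same.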

3. Set a schedule

  • Set up continuous training or inference jobs on a schedule.
  • Visualize results with powerful built-in visualizers.
  • Compare runs to gain cross-training evaluation insights.
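Conceptually, a schedule is just a start time plus a repeat interval. The `Schedule` class below is a plain-Python sketch of that idea (not ZenML's scheduling API):

```python
from datetime import datetime, timedelta

class Schedule:
    """Sketch of a pipeline schedule: a start time plus a fixed interval
    at which a training or inference job should fire."""
    def __init__(self, start_time, interval):
        self.start_time = start_time
        self.interval = interval

    def next_runs(self, count):
        """Return the next `count` scheduled run times."""
        return [self.start_time + i * self.interval for i in range(count)]
```

For example, an hourly schedule starting at midnight yields runs at 00:00, 01:00, 02:00, and so on.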

4. Create a production MLOps stack

  • When ready, deploy pipelines to more complicated MLOps stacks.
  • Automate tasks with popular ML libraries (statistics calculation, drift detection, hyperparameter tuning, etc.).
  • Simple API to fetch pipeline runs and artifacts locally.
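The idea of "fetching pipeline runs and artifacts locally" can be sketched as a small registry that records runs and lets you query them afterwards. The `RunRegistry` class is hypothetical, chosen only to illustrate the access pattern:

```python
class RunRegistry:
    """Sketch of a local registry that records pipeline runs along with
    the artifacts each run produced, and supports lookup by pipeline."""
    def __init__(self):
        self._runs = []

    def record(self, pipeline_name, artifacts):
        """Store one completed run and its output artifacts."""
        self._runs.append({"pipeline": pipeline_name, "artifacts": artifacts})

    def get_runs(self, pipeline_name):
        """Fetch all recorded runs for a given pipeline."""
        return [r for r in self._runs if r["pipeline"] == pipeline_name]
```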

5. Deploy pipeline to the cloud

  • Cloud agnostic and extensible.
  • Separate configuration and code for robust pipelines.
  • Abstract infrastructure: Simply specify resources required per step.
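"Specify resources required per step" can be sketched as attaching a declarative resource request to a step function, leaving it to the orchestrator to map that request onto actual infrastructure. The names below (`ResourceConfig`, `step`) are illustrative, not ZenML's configuration API:

```python
from dataclasses import dataclass

@dataclass
class ResourceConfig:
    """Sketch: declare what a step needs; an orchestrator maps this
    onto whatever infrastructure backs the active stack."""
    cpus: int = 1
    memory_gb: int = 4
    gpus: int = 0

def step(resources=None):
    """Decorator sketch attaching a resource request to a step function."""
    def decorator(func):
        func.resources = resources or ResourceConfig()
        return func
    return decorator

@step(resources=ResourceConfig(cpus=8, memory_gb=32, gpus=1))
def train(data):
    """A training step that declares it needs a GPU machine."""
    return data
```

Keeping this request separate from the step's code is what makes the same pipeline portable across local and cloud environments.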

6. Attach powerful training backends WIP*

  • Attach GPUs to pipelines with simple commands.
  • Enable distributed training and hyper-parameter tuning.
  • Reproduce results across your team from any point in history.

7. Enable distributed processing WIP*

  • Apache Beam integration provides a simple syntax for writing distributable code.
  • Configure a backend of choice and let the framework handle the rest.
  • Scale up and down according to requirements.
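The core pattern a Beam-style backend automates is: partition the work across workers, apply the function, and collect the results. As a minimal local stand-in (using a thread pool for illustration, not Apache Beam itself):

```python
from multiprocessing.dummy import Pool  # thread-backed pool, for illustration

def distributed_map(func, items, workers=4):
    """Sketch of the pattern a distributed backend automates: fan work
    out across workers, apply the function, and gather the results in
    order. A real backend would scale this across machines."""
    with Pool(workers) as pool:
        return pool.map(func, items)
```

Writing code against this shape is what lets the framework swap in a real distributed backend without changes to the user's function.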

8. Test and validate model and data WIP*

  • Built-in report generation for model and data drift detection.
  • Set up tests in your pipeline to ensure model quality.
  • Trace each model with full lineage and provenance.
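A data drift test can be as simple as comparing a statistic of the incoming batch against the training reference. The check below is a deliberately minimal sketch (real drift-detection integrations use much richer statistics):

```python
def mean(values):
    return sum(values) / len(values)

def drift_detected(reference, current, threshold=0.1):
    """Sketch of a simple data drift test: flag drift when the mean of
    the incoming batch deviates from the training reference mean by
    more than a relative threshold."""
    ref_mean = mean(reference)
    return abs(mean(current) - ref_mean) / abs(ref_mean) > threshold
```

A pipeline could run a check like this as a gating step and fail the run before a degraded model reaches deployment.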

9. Automate model deployments WIP*

  • Simple integrations into deployment libraries to set up deployer steps.
  • Built-in materializers for common data science packages.
  • Schedule automated deployments based on validation tests.

10. Leverage artifact lineage with caching WIP*

  • Reuse pipeline states across users and pipelines.
  • Save on heavy preprocessing computing expense.
  • Clearly trace which pipeline steps lead to which results.
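Artifact caching can be sketched as keying each step invocation by its name and a hash of its inputs, then reusing the stored result on a repeat call. The `cached_step` decorator below is a plain-Python illustration of the idea, not ZenML's caching implementation:

```python
import hashlib
import json

_cache = {}

def cached_step(func):
    """Sketch of artifact caching: key each call by the step name and a
    hash of its inputs, and reuse the stored result on repeat calls so
    heavy preprocessing is not recomputed."""
    def wrapper(*args):
        key = (func.__name__,
               hashlib.sha256(json.dumps(args).encode()).hexdigest())
        if key not in _cache:
            _cache[key] = func(*args)
        return _cache[key]
    return wrapper

executions = []

@cached_step
def preprocess(rows):
    executions.append(1)  # count how often the step actually runs
    return [r * 2 for r in rows]
```

Because the cache key captures the inputs, lineage falls out naturally: each stored result is traceable to the exact step and inputs that produced it.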

Benefits

»  Reproduce pipeline results via metadata tracking
»  Standardized data science workflows
»  Quickly switch between local and cloud environments
»  Tooling and infrastructure agnostic: no vendor lock-in
»  Pre-built helpers to visualize parameters and results
»  Cached pipeline states for quick experiment iterations

Frequently asked questions:

Q: Why did you build ZenML?

We built it because we scratched our own itch while deploying multiple ML models in production over the last three years. Our team struggled to find a simple yet production-ready solution while developing large-scale ML pipelines, so we built one that we are now proud to share with all of you!

Q: Does X integrate with ZenML?

We are constantly working to include more tools and integrations with ZenML (check the roadmap for more details). You can upvote the features you'd like or build your own custom integrations!

Q: I would like to contribute to the repo.

Great to hear and we welcome your contribution! Please check out our contribution guidelines to get started.

Q: My question is not answered yet!

Then connect with us using Slack - simply join us via this invite.

Join us - on Slack!

Become part of our growing community of domain experts, developers - and our own team members. We're happy to hear your questions and ideas, and look forward to connecting!

Join our Slack

Or follow our journey - on SubStack!

We're building ZenML out in public - and we're using our SubStack newsletter to share the journey as we progress. Sign up now!

Join our SubStack