Reproducible ML pipelines for production

Key benefits of ZenML

»  Guaranteed reproducibility of training experiments via:
  • Versioned data, code and models
  • Automatically tracked experiments
  • Declarative pipeline configs
»  Guaranteed comparability between experiments
»  Ability to quickly switch between local and cloud environments (e.g. orchestrate pipelines on Kubernetes)
»  Built-in and extensible abstractions for:
  • Distributed pre-processing on large datasets
  • Cloud-based training jobs
  • Model serving
»  Pre-built helpers to compare and visualize parameters and results:
  • Automated evaluation of each pipeline run with TensorBoard + TFMA
  • Automated statistics visualization of each pipeline run with TFDV
»  Cached pipeline states for faster experiment iterations
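The cached pipeline states mentioned above can be pictured as content-addressed caching: a step's output is keyed by a fingerprint of its name, parameters, and input data, so an unchanged step is skipped on the next run. The sketch below is purely illustrative - it is not ZenML's actual implementation.

```python
import hashlib
import json

# Illustrative sketch of step caching (not ZenML's real internals):
# a cache key is derived from the step name, its parameters, and a
# fingerprint of its input data.
_cache = {}

def fingerprint(step_name, params, input_data):
    payload = json.dumps(
        {"step": step_name, "params": params, "input": input_data},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(step_name, params, input_data, fn):
    key = fingerprint(step_name, params, input_data)
    if key in _cache:
        return _cache[key], True   # cache hit: step is skipped
    result = fn(input_data, **params)
    _cache[key] = result
    return result, False           # cache miss: step was executed

# First run computes; an identical second run is served from cache.
out1, hit1 = run_step("scale", {"factor": 2}, [1, 2, 3],
                      lambda d, factor: [x * factor for x in d])
out2, hit2 = run_step("scale", {"factor": 2}, [1, 2, 3],
                      lambda d, factor: [x * factor for x in d])
```

Because the key covers parameters and inputs, changing either one invalidates the cache automatically, which is what makes consecutive experiment iterations fast without risking stale results.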

1. Connect your data

  • Choose a prebuilt connector for common sources (S3, Google Cloud Storage, BigQuery, SQL)
  • Write your own connector to your data or feature store
  • Automatic versioning and caching of your data for faster pipeline starts - built-in!
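A custom connector (second bullet) can be as small as an object exposing a uniform read interface. The class and method names below are hypothetical, chosen for illustration only - ZenML's real datasource classes differ.

```python
import csv
import io
from abc import ABC, abstractmethod

# Hypothetical connector interface for illustration; it only shows
# the shape of the idea, not ZenML's actual datasource API.
class Datasource(ABC):
    @abstractmethod
    def read(self):
        """Return rows as a list of dictionaries."""

class CSVDatasource(Datasource):
    """Reads any file-like object containing CSV with a header row."""
    def __init__(self, file_obj):
        self.file_obj = file_obj

    def read(self):
        return list(csv.DictReader(self.file_obj))

raw = io.StringIO("age,label\n34,1\n29,0\n")
rows = CSVDatasource(raw).read()
```

The same interface could wrap a feature store or a warehouse query; the rest of the pipeline only ever sees rows.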

2. Splitting

  • All common splitting methods supported.
  • Natively distributable.
  • Support for a multitude of custom data splits.
  • All common data types supported, including time-series.
  • Auto-format data to TFRecords for 7x faster training.
  • Automatic caching of split results for faster starts of consecutive pipelines.
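The time-series support called out above matters because a chronological split must never shuffle, while an i.i.d. split should. A minimal sketch of both, independent of any ZenML API:

```python
import random

def random_split(rows, train_frac=0.8, seed=42):
    # Shuffled split for i.i.d. data; seeded for reproducibility.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def time_series_split(rows, train_frac=0.8):
    # Chronological split: earlier records train, later records
    # evaluate, so the model never peeks into the future.
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

data = list(range(10))          # already in time order
train, evaluation = time_series_split(data)
```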

3. Transform

  • Distributable data preprocessing for lightning-fast pipelines.
  • All common preprocessing methods supported - including time-series.
  • Support for custom tf.functions for custom preprocessing.
  • All transforms are embedded in the training graph for seamless serving.
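Embedding transforms in the training graph (last bullet) is what guarantees training/serving consistency: the exact fitted parameters travel with the model. A framework-agnostic sketch of the idea, with illustrative class names:

```python
class StandardScaler:
    """Fit-once transform whose fitted parameters travel with the model."""
    def fit(self, values):
        self.mean = sum(values) / len(values)
        var = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.std = var ** 0.5 or 1.0   # guard against zero variance
        return self

    def __call__(self, value):
        return (value - self.mean) / self.std

class ServableModel:
    """Bundles preprocessing and prediction so serving cannot drift."""
    def __init__(self, transform, weight):
        self.transform = transform
        self.weight = weight

    def predict(self, raw_value):
        # Raw input goes in; the embedded transform is applied first.
        return self.weight * self.transform(raw_value)

scaler = StandardScaler().fit([0.0, 10.0])   # mean=5.0, std=5.0
model = ServableModel(scaler, weight=2.0)
pred = model.predict(10.0)                   # (10-5)/5 * 2 = 2.0
```

In TensorFlow terms, this is what happens when preprocessing ops are part of the exported graph: the serving system receives raw inputs, never pre-transformed ones.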

4. Train

  • Training on pre-configured GPU containers.
  • Hyperparameter tuning natively baked in.
  • Distributable training for large model architectures.
  • Automated resource provisioning.
  • Leverage cloud resources on GCP, AWS, Azure.
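The baked-in hyperparameter tuning can be pictured as a search over declared parameter grids, where each combination becomes a training job. This grid-search sketch is illustrative only and uses no ZenML API; `fake_train` is a toy stand-in for a real training run.

```python
import itertools

def grid_search(train_fn, grid):
    # Try every combination in the grid; keep the best score.
    best_score, best_params = float("-inf"), None
    keys = sorted(grid)
    for combo in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = train_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy objective standing in for a real training job:
# it peaks at lr=0.1, layers=2.
def fake_train(lr, layers):
    return -((lr - 0.1) ** 2) - ((layers - 2) ** 2)

best, score = grid_search(
    fake_train, {"lr": [0.01, 0.1, 1.0], "layers": [1, 2, 3]}
)
```

In a managed setup, each `train_fn` call would instead be dispatched to a GPU container, which is why provisioning and tuning pair naturally.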

5. Evaluate

  • Automated evaluation for every pipeline.
  • Clearly trace which pipeline steps lead to which results.
  • Absolute freedom - access raw results from Jupyter Notebooks.
  • Bring-your-own-tooling: Evaluate with your own metrics and tools.
  • Compare between pipelines to gain cross-training evaluation insights.
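Cross-pipeline comparison (last bullet) boils down to lining up the same metric across runs. A minimal sketch of such a helper - the run names and metric shapes are illustrative, not ZenML's actual evaluation output:

```python
def compare_runs(runs, metric):
    """Return (run_name, value) pairs sorted best-first for a metric.

    `runs` maps run names to metric dictionaries; runs missing the
    metric are simply left out of the comparison.
    """
    scored = [
        (name, metrics[metric])
        for name, metrics in runs.items()
        if metric in metrics
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

runs = {
    "run-a": {"auc": 0.91, "loss": 0.40},
    "run-b": {"auc": 0.94, "loss": 0.35},
    "run-c": {"auc": 0.89, "loss": 0.45},
}
leaderboard = compare_runs(runs, "auc")
```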

6. Serve

  • Every pipeline yields a servable model - guaranteed.
  • Preprocessing, including custom functions, is embedded in the graph automatically.
  • Trace each model with full lineage.

7. Integrations

  • Powerful, out-of-the-box integrations to various backends like Kubernetes, Dataflow, Cortex, SageMaker, Google AI Platform, and more.
  • Support for remote and local artifact stores
  • Easy integration of centralized Metadata Stores (MySQL).
  • Extensible Interfaces to build your own custom integrations.

8. Collaborate across your organization

  • Execute distributed data pipelines with a simple configuration.
  • Separate configuration from code for robust pipelines.
  • Reuse pipeline states across users and pipelines.
  • Clearly trace which pipeline steps lead to which results.
  • Compare results of training pipelines over time and across pipelines.
  • Share pipeline layouts with your team.
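Separating configuration from code means a pipeline layout is just data that can be shared, versioned, and re-run by anyone on the team. In the illustrative sketch below, a plain dictionary stands in for a versioned YAML config, and the step registry is hypothetical:

```python
# The "code" half: a registry of step implementations.
STEPS = {
    "load":  lambda data, **kw: data,
    "scale": lambda data, factor=1.0, **kw: [x * factor for x in data],
    "clip":  lambda data, limit=10.0, **kw: [min(x, limit) for x in data],
}

# The "configuration" half: a declarative, shareable pipeline layout.
# In practice this would live in a versioned YAML file.
config = {
    "steps": [
        {"name": "load"},
        {"name": "scale", "params": {"factor": 3.0}},
        {"name": "clip",  "params": {"limit": 7.0}},
    ]
}

def run_pipeline(config, data):
    # Interpret the config: look up each step by name and apply it.
    for step in config["steps"]:
        fn = STEPS[step["name"]]
        data = fn(data, **step.get("params", {}))
    return data

result = run_pipeline(config, [1.0, 2.0, 3.0])
```

Two teammates running the same config against the same data get the same result, which is the collaboration guarantee the bullets above describe.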

Frequently asked questions:

Q: Why did you build ZenML?

We built ZenML to scratch our own itch: after deploying multiple ML models in production over the last three years, our team struggled to find a simple yet production-ready solution for developing large-scale ML pipelines, so we built one - and we are now proud to share it with all of you!

Q: Can I integrate my own, custom processing backend?

Absolutely. We have a clever design for our integration interfaces, so you can simply add your own!

Q: I would like a more convenient way to collaborate with my team!

Fear not, we’re building a ZenML Cloud offering. Workloads will still be running under your control, and we don’t get access to your actual data, but we’ll be your centralized pipeline registry and metadata store. And we’ll throw in a nice UI, too. Sign up for our newsletter to stay in the loop!

Q: I would like to contribute to the repo.

Great to hear! Please check out our contribution guidelines, or simply hop on over to our Slack and chat us up :).

Q: My question is not answered yet!

Then connect with us using Slack - simply join us via this invite.

Join us - on Slack!

Become part of our growing community of domain experts, developers - and our own team members. We're happy to hear your questions and ideas, and we look forward to connecting!

Join our Slack