Unleashing More Power and Flexibility with ZenML's New Pipeline and Step Syntax

The 0.40.0 release introduces a completely reworked interface for developing your ZenML steps and pipelines.

In our continuous efforts to simplify and enhance your experience with ZenML, we’re thrilled to roll out a significant update to our pipeline and step definition syntax. This change, the culmination of user feedback and internal testing, is designed to make working with ZenML much more natural, intuitive, and enjoyable.

At the core of this change is more flexibility in how you work with two of ZenML’s core building blocks: pipelines and steps. We’ve overhauled the syntax with the primary aim of getting out of your way, so you can focus on what really matters: building efficient, reproducible, and robust machine learning pipelines.

We believe these improvements will make a considerable difference in your ZenML experience. Let’s dive into the new features that you will love using! First things first: new <code>Pipeline</code> goodies:

Pipeline Definitions

Use External Artifacts

External artifacts let you pass values to steps that are not produced by an upstream step. This covers a common use case and gives you more flexibility when working with external data or models:

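from sklearn import svm
from zenml import pipeline
from zenml.steps.external_artifact import ExternalArtifact

@pipeline
def my_pipeline(lr: float):
    # `process_data` and `trainer` are steps defined elsewhere
    data = process_data()
    # Wrap an object created outside the pipeline so a step can consume it
    trainer(data=data, start_model=ExternalArtifact(svm.SVC(...)))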

Instead of having to define an initial dataloader step, you can now just use these <code>ExternalArtifact</code> objects directly within your pipeline definition.

Pipelines take Input Parameters

Pipelines now support input parameters, making it easier to pass values to your steps. You can call a step directly in the pipeline function and pass it either pipeline parameters or raw values:

from zenml import pipeline

@pipeline
def my_pipeline(lr: float):
    data = process_data()
    # `lr` comes from the pipeline's input; `gamma` is a raw value
    trainer(data=data, lr=lr, gamma=0.0002)

This lets you configure (and run) your pipelines with flexible hyperparameters. These input parameters become supercharged with our next feature: pipelines within pipelines!

Step parameters have also been improved. In previous versions, you had to define a separate class for step parameters using <code>BaseParameters</code>. This is no longer necessary, although it is still supported for backward compatibility. You can now declare parameters directly in the step function’s signature:

from typing import Optional
import pandas as pd
from zenml import step

@step
def trainer(data: pd.DataFrame, lr: float = 0.1, gamma: Optional[float] = 0.02) -> ...:
    print(lr)
    print(gamma)

Pipeline-ception!

You can now call pipelines within other pipelines. This does not execute the inner pipeline but instead adds its steps to the parent pipeline, allowing you to create modular and reusable workflows:

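from zenml import pipeline

@pipeline(enable_cache=False)
def my_pipeline(a: int = 1):
    # Adds the steps of `subpipeline` to this pipeline instead of running it separately
    p1_output = subpipeline(pipeline_param=22)
    step_2(a=a, b=p1_output)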

Many users have asked for this feature, and it combines neatly with the ability to pass input parameters to your pipeline. As always, you can go overboard with layers of abstraction, but now you have the power to tackle some of those more complicated workflows.

Defining Inputs and Outputs for Pipelines

Pipelines can now define inputs and outputs, providing a clearer interface for working with data and dependencies between pipelines.

@pipeline(enable_cache=False)
def subpipeline(pipeline_param: int):
    out = step_1(k=None)
    step_2(a=3, b=pipeline_param)
    # The returned value is what a calling pipeline receives (e.g. `p1_output` in `my_pipeline`)
    return 17

This is useful, for example, when an embedded pipeline needs to pass a value to a step or to another pipeline. Really, the sky’s the limit with these new flexible features!

Calling Steps Multiple Times Inside a Pipeline

You can now call steps multiple times inside a pipeline, allowing you to create more complex workflows and reuse steps with different parameters:

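from zenml import pipeline

@pipeline
def my_pipeline(step_count: int) -> None:
    data = load_data_step()
    after = []
    for i in range(step_count):
        # Each call needs a unique invocation ID so the steps can be told apart
        train_step(data, learning_rate=i * 0.0001, id=f"train_step_{i}")
        after.append(f"train_step_{i}")
    # Run the selection step only after all training steps have finished
    model = select_model_step(..., after=after)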
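The loop above relies on the invocation IDs given to each `train_step` call matching the strings collected in `after`. Note also that `i * 0.0001` gives the first training step (i = 0) a learning rate of 0. A minimal sketch of a helper that derives both IDs and learning rates from one place, so they cannot drift apart; `sweep_plan` and `base_lr` are hypothetical names for illustration, not part of ZenML:

```python
from typing import List, Tuple


def sweep_plan(step_count: int, base_lr: float = 0.0001) -> List[Tuple[str, float]]:
    """Return (invocation_id, learning_rate) pairs for a fan-out of training steps.

    Hypothetical helper: each ID is meant to be passed to one step call and
    then reused in the `after=` list of the downstream selection step.
    """
    if step_count < 1:
        raise ValueError("step_count must be at least 1")
    # Start the multiplier at 1 so that no training step gets a learning rate of 0
    return [(f"train_step_{i}", (i + 1) * base_lr) for i in range(step_count)]


plan = sweep_plan(3)
after = [invocation_id for invocation_id, _ in plan]
print(after)  # ['train_step_0', 'train_step_1', 'train_step_2']
```

Inside the pipeline, you would loop over `plan`, pass each ID and learning rate to its `train_step` call, and hand the `after` list to the selection step.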

This was another much-requested feature from our users and community members, and the new release now makes it possible.

Beyond context-specific hyperparameters, you’ll also want to configure infrastructure-specific settings for your pipelines, and there is a new way to do that:

Configuring Pipelines with <code>.with_options()</code>

To configure a pipeline, use its <code>.with_options()</code> method, which returns a configured copy of the pipeline that you can then run:

if __name__ == "__main__":
    # `my_pipeline` itself is unchanged; only the copy has caching disabled
    pipeline_copy = my_pipeline.with_options(
        enable_cache=False,
    )
    pipeline_copy()

Using your Pipelines

We added some quality-of-life improvements to how you can work with pipelines and steps:

Simplified Pipeline Execution

You no longer need to create a pipeline instance and then run it separately. Instead, call the pipeline function with its parameters to create and execute the pipeline in a single step:

my_pipeline(lr=0.000001)

This makes ZenML a little more Pythonic and easier to use, because your pipelines and steps run just the way you would expect Python functions to. To that end, you can now also call steps directly outside of a pipeline, which makes it easier to test and debug your code.

Note that this simply runs the underlying function. If you want your artifacts tracked and your code and runs versioned (i.e., all the benefits that ZenML brings), run these steps as part of a pipeline.
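Since a step called outside a pipeline simply runs its function, one way to keep that code easy to test is to put the logic in a plain function and only decorate it with `@step` where you build your pipeline. A minimal sketch, assuming a hypothetical `normalize_columns` step that works on a pandas DataFrame; the ZenML decorator is left out so the snippet runs without ZenML installed:

```python
import pandas as pd


def normalize_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale every numeric column to [0, 1]; other columns are kept as-is.

    Hypothetical step logic: decorate it with ZenML's @step to use it in a
    pipeline, or call it directly like this in a unit test.
    """
    numeric = data.select_dtypes(include="number")
    col_min = numeric.min()
    # Replace a zero range with 1 so constant columns map to 0 instead of NaN
    col_range = (numeric.max() - col_min).replace(0, 1)
    result = data.copy()  # leave the caller's DataFrame untouched
    result[numeric.columns] = (numeric - col_min) / col_range
    return result


df = pd.DataFrame({"x": [0.0, 5.0, 10.0], "label": ["a", "b", "c"]})
print(normalize_columns(df)["x"].tolist())  # [0.0, 0.5, 1.0]
```

A test like this checks the step’s logic only; artifact tracking and versioning still require running the step inside a pipeline.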

General Improvements

We also thought about how to make working with pipelines and steps cleaner and easier, so here are two other small improvements:

Cleaner Imports

We have made the imports cleaner: you no longer need <code>BaseParameters</code>, and you no longer need to import <code>step</code> and <code>pipeline</code> from separate submodules. Now you can simply import both from <code>zenml</code>:

from zenml import step, pipeline

Enhanced Type Annotations for Step Inputs/Outputs

Steps can now have <code>Optional</code>, <code>Union</code>, and <code>Any</code> type annotations for their inputs and outputs. This allows you to pass different types of values at runtime, choose not to pass a value at all, or pass <code>None</code>. You can also return any type and specify a materializer for it, or use the default <code>cloudpickle</code> materializer:

@step
def trainer(data: pd.DataFrame, start_model: Union[svm.SVC, svm.SVR], coef0: Optional[int] = None) -> Any:
    # ...your code goes here...
    ...
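Inputs annotated with `Union` or `Optional` mean the step body has to handle each case at runtime. A minimal sketch of that handling, written as a plain function so it runs without ZenML; `apply_coef0` is a hypothetical name, and in a pipeline this logic would live inside the `@step`-decorated `trainer`:

```python
from typing import Optional, Union

from sklearn import svm


def apply_coef0(
    start_model: Union[svm.SVC, svm.SVR], coef0: Optional[int] = None
) -> Union[svm.SVC, svm.SVR]:
    """Return the model, overriding its `coef0` only when a value is given."""
    # Union input: reject anything that is neither of the annotated types
    if not isinstance(start_model, (svm.SVC, svm.SVR)):
        raise TypeError(f"unsupported model type: {type(start_model).__name__}")
    # Optional input: None means "keep the model's current value"
    if coef0 is not None:
        # set_params updates the estimator in place and returns it
        start_model.set_params(coef0=coef0)
    return start_model


print(apply_coef0(svm.SVR(), coef0=1).coef0)  # 1
print(apply_coef0(svm.SVC()).coef0)  # 0.0
```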

Migrating to the New Interface

The new interface is backwards-compatible, so you don’t need to worry about it breaking your existing code. However, we recommend switching to the new way of doing things for a more streamlined experience, and we consider the old way deprecated: it will be removed in a future release.

To migrate, simply update your imports, remove the <code>BaseParameters</code> class, declare parameters directly in the step function’s signature, and update your pipeline definition and execution as shown in the examples above.

To get started, simply import the new <code>@step</code> and <code>@pipeline</code> decorators and check out our new starter guide for more information.

from zenml import step, pipeline

@step
def my_step(...):
    ...

@pipeline
def my_pipeline(...):
    ...

We hope you enjoy the improvements in ZenML 0.40.0 and find it easier to create and manage your pipelines. As always, we welcome your feedback and suggestions for future updates.

If you run into any issues or want to discuss a specific use case, please reach out to us on Slack.