Orchestrate production ZenML pipelines with Amazon SageMaker
Streamline your machine learning workflows by running ZenML pipelines as Amazon SageMaker Pipelines, a serverless ML orchestrator from AWS. This integration enables you to leverage SageMaker's scalability, robustness, and built-in features to manage your ML pipelines efficiently in production environments.
Features with ZenML
- Seamlessly execute ZenML pipelines as SageMaker Pipelines
- Effortlessly scale pipeline execution with SageMaker's serverless infrastructure
- Monitor and track pipeline runs using SageMaker's UI
- Customize instance types and resources for the entire pipeline or for individual steps
- Seamlessly leverage other Stack Components (S3, ECR, etc.)
Main Features
- Serverless and scalable orchestration of ML workflows
- Built-in data processing and model training capabilities
- Visual interface for monitoring pipeline executions
- Integration with other AWS services for end-to-end ML solutions
How to use ZenML with SageMaker Pipelines
# Step 1: Register a new SageMaker orchestrator
>>> zenml orchestrator register <ORCHESTRATOR_NAME> \
--flavor=sagemaker \
--execution_role=<YOUR_IAM_ROLE_ARN>
# Step 2: Authenticate the SageMaker orchestrator
# Option 1 (recommended): Service Connector
>>> zenml orchestrator connect <ORCHESTRATOR_NAME> --connector <CONNECTOR_NAME>
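# If no connector exists yet, one can be registered for AWS first. This is
# a minimal sketch that auto-discovers credentials from your local
# environment; adjust the connector name and auth method to your setup.
>>> zenml service-connector register <CONNECTOR_NAME> --type aws --auto-configure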
# Option 2 (not recommended): Explicit authentication
>>> zenml orchestrator register <ORCHESTRATOR_NAME> \
--flavor=sagemaker \
--execution_role=<YOUR_IAM_ROLE_ARN> \
--aws_access_key_id=... \
--aws_secret_access_key=... \
--region=...
# Option 3 (strongly discouraged): Implicit authentication
# Nothing needed; credentials are picked up implicitly from the
# running environment
# Step 3: Update your stack to use the SageMaker orchestrator
>>> zenml stack update -o <ORCHESTRATOR_NAME>
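If you are assembling a full AWS stack from scratch rather than updating an existing one, the stack can also be registered in one go. A sketch, assuming the S3 artifact store and ECR container registry components have already been registered under hypothetical names:
# Register a complete AWS stack and activate it (component names are
# placeholders; register each component before running this)
>>> zenml stack register aws_stack \
    -o <ORCHESTRATOR_NAME> \
    -a <S3_ARTIFACT_STORE_NAME> \
    -c <ECR_CONTAINER_REGISTRY_NAME> \
    --set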
With the stack in place, you can configure the orchestrator for a pipeline through its settings:
from zenml import step, pipeline
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
    SagemakerOrchestratorSettings,
)


@step
def preprocess_data() -> int:
    return 1


@step
def train_model(data: int) -> str:
    return str(data)


# Pipeline-level settings: every step runs on an ml.m5.large instance
# with a 30 GB EBS volume.
@pipeline(
    settings={
        "orchestrator.sagemaker": SagemakerOrchestratorSettings(
            instance_type="ml.m5.large",
            volume_size_in_gb=30,
        ),
    }
)
def ml_pipeline():
    input_data = preprocess_data()
    train_model(input_data)


if __name__ == "__main__":
    ml_pipeline()
This code snippet demonstrates how to run a ZenML pipeline on the SageMaker Pipelines orchestrator. Here, SagemakerOrchestratorSettings is applied at the pipeline level, so both preprocess_data and train_model execute on ml.m5.large instances with 30 GB of storage.
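The same settings object can also be attached to an individual step, which is how you would give a single step a GPU instance or point it at data stored in Amazon S3 while the rest of the pipeline keeps the defaults. A minimal sketch; the instance type and S3 URI below are placeholders to adjust for your account:
from zenml import step
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
    SagemakerOrchestratorSettings,
)

# Per-step override (sketch): this step alone requests a GPU instance and
# reads its input from a placeholder S3 location; all other steps keep the
# pipeline-level defaults.
gpu_settings = SagemakerOrchestratorSettings(
    instance_type="ml.p3.2xlarge",
    input_data_s3_uri="s3://<YOUR_BUCKET>/training-data",
)


@step(settings={"orchestrator.sagemaker": gpu_settings})
def train_model(data: int) -> str:
    return str(data)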
Additional Resources
Read the full SageMaker Pipelines integration documentation
Learn more about Amazon SageMaker Pipelines