Infrastructure as Code (IaC) for MLOps with Terraform & ZenML

🧑‍💻 What is Infrastructure-as-code?

Infrastructure-as-code (IaC) means provisioning and managing infrastructure through a codebase rather than through manual processes. Before IaC became a mainstay of the DevOps world, there was no single standard for automating the management of infrastructure resources, especially on hyperscalers like AWS, GCP, and Azure.

Popular tools for IaC

There are now many tools in the IaC space, but only a few remain the most widely used.

For this blog post, I will be focusing on Terraform as it is by far the most popular tool. However, ZenML does have a 1-click deployment feature in the browser that leverages native cloud services like AWS CloudFormation.

🥘 Infrastructure as Code in MLOps

In many ways, MLOps is an extension of DevOps, so it is only natural that IaC plays as critical a role in MLOps as it does in DevOps. Like any other computing paradigm, MLOps requires users to provision and manage resources in the cloud. ML practitioners can therefore reuse the tried and tested IaC practices of DevOps and reap the same rewards. Specifically, for machine learning teams, adopting IaC can lead to the following benefits:

📦 MLOps Stacks: Modularized configuration for your infrastructure

ZenML is an MLOps framework that acts as a bridge between machine learning teams and production infrastructure. A ZenML stack is the configuration of tools and infrastructure that your pipelines run on: a machine learning pipeline written in ZenML runs on whatever the stack defines.

Out of the box, ZenML’s API assumes that the infrastructure is already provisioned and expects you to register its components in the ZenML database. This can be a bit clunky, as it involves granting the right permissions, running the correct commands, and making sure everything works as expected across environments. All in all, automating this process can be cumbersome!

However, if you look closely, there is a natural link between the infrastructure deployed via IaC and the ZenML stack. What if there were a way to provision the infrastructure and register it with ZenML in one go? This is where the Terraform modules for ZenML stacks come in.

🛫 Terraform Modules for ZenML Stacks

Recently, we published new modules to the HashiCorp Terraform Registry for provisioning an MLOps stack on each of the popular cloud providers. These Terraform modules set up the infrastructure a ZenML stack needs and register the corresponding configuration with a ZenML server. This lets you integrate MLOps into your existing cloud infrastructure without reinventing the wheel!

🛠 Prerequisites

🏗 Resources Created

Each Terraform module creates the following resources in the corresponding cloud account (AWS, GCP, or Azure) and registers the matching stack components in ZenML:

AWS

Resources provisioned:
- an S3 bucket
- an ECR repository
- an IAM user and an access key for it
- an IAM role with the minimum necessary permissions to access the S3 bucket, the ECR repository and the SageMaker service to build and push container images, store artifacts, and run pipelines

Stack components created:
- an S3 Artifact Store linked to the S3 bucket
- an ECR Container Registry linked to the ECR repository
- a SageMaker Orchestrator linked to the AWS account
- an AWS Service Connector configured with the IAM role credentials and used to authenticate all ZenML components with the AWS account

GCP

Resources provisioned:
- a GCS bucket
- a Google Artifact Registry
- a Service Account with a Service Account Key and the minimum necessary permissions to access the GCS bucket, the Google Artifact Registry and the GCP project to build and push container images with Google Cloud Build, store artifacts and run pipelines with Vertex AI

Stack components created:
- a GCP Artifact Store linked to the GCS bucket
- a GCP Container Registry linked to the Google Artifact Registry
- a Vertex AI Orchestrator linked to the GCP project
- a Google Cloud Build Image Builder linked to the GCP project
- a GCP Service Connector configured with the GCP service account credentials and used to authenticate all ZenML components with the GCP resources

Azure

Resources provisioned:
- an Azure Resource Group with the following child resources:
  - an Azure Storage Account and a Blob Container
  - an Azure Container Registry
- an Azure Service Principal with a Service Principal Password and the minimum necessary permissions to access the Blob Container, the ACR container registry and the Azure subscription to build and push container images, store artifacts and run pipelines with Skypilot

Stack components created:
- an Azure Artifact Store linked to the Azure Storage Account and Blob Container
- an ACR Container Registry linked to the Azure Container Registry
- an Azure Skypilot Orchestrator linked to the Azure subscription
- an Azure Service Connector configured with the Azure Service Principal credentials and used to authenticate all ZenML components with the Azure resources

🚀 How to use the Terraform modules

Aside from the prerequisites mentioned above, you also need to create a ZenML Service Account API key for your ZenML Server. You can do this by running the following command in a terminal where you have the ZenML CLI installed:

zenml service-account create <service-account-name>

After that, copy the following snippet into a Terraform file (for example, main.tf) and fill in your own server URL and API key. Here is an example configuration for AWS (GCP and Azure are similar):

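module "zenml_stack" {
  source = "zenml-io/zenml-stack/aws"

  # AWS region in which the stack resources are created
  region = "us-west-2"
  # URL of your ZenML server and the service account API key created above
  zenml_server_url = "https://your-zenml-server-url.com"
  zenml_api_key    = "ZENKEY_1234567890..."
}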

Then, from the directory containing that file, run:

terraform init
terraform apply

Wait a few minutes, and you will see a fully configured stack in your ZenML dashboard!
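The example above hardcodes the API key for brevity. As a minimal sketch, assuming the same module inputs shown above, you could instead pass the server URL and key in through Terraform input variables so the secret stays out of the file you commit. The variable names here are illustrative choices, not names the module requires:

```hcl
# Sketch only: variable names are illustrative, not mandated by the module.
variable "zenml_server_url" {
  description = "URL of the ZenML server to register the stack with"
  type        = string
}

variable "zenml_api_key" {
  description = "ZenML service account API key"
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}

module "zenml_stack" {
  source = "zenml-io/zenml-stack/aws"

  region           = "us-west-2"
  zenml_server_url = var.zenml_server_url
  zenml_api_key    = var.zenml_api_key
}
```

Terraform can then pick up the values from the environment variables TF_VAR_zenml_server_url and TF_VAR_zenml_api_key when you run terraform apply. Note that marking a variable as sensitive only hides it from CLI output; the value can still end up in the state file, so the state should be stored somewhere access-controlled.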

💨 Try it yourself

You can try it yourself by following the instructions above or the guides in the HashiCorp Terraform Registry for AWS (Source), GCP (Source), or Azure (Source). The corresponding open-source GitHub repositories are also available.

If you need assistance, join our Slack Community and drop a message in the #general channel. We’re known for our quick and helpful responses!

⭐️ Show Your Support

If you find this project helpful, please consider giving ZenML a star on GitHub. Your support helps promote the project and lets others know it's worth checking out.
