🧑‍💻 What is Infrastructure-as-code?
Infrastructure-as-code (IaC) refers to provisioning and managing infrastructure through code, typically configuration files kept under version control, rather than through manual processes. Before IaC became a mainstay of the DevOps world, there was no single standard for automating the management of infrastructure resources, especially on hyperscalers like AWS, GCP, and Azure.
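The core idea behind most IaC tools, Terraform included, is declarative: you describe the infrastructure you want, and the tool works out which changes bring the current state in line with it. The toy Python model below illustrates only that diffing step. It is a simplified sketch with made-up resource names, not how Terraform's planner is actually implemented.

```python
from typing import Dict, List, Tuple

# Toy model: each resource has an address (e.g. "bucket.artifacts") and a flat
# dict of attributes. Addresses and attributes here are made up for illustration.
State = Dict[str, Dict[str, str]]


def plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """List the (action, address) pairs that turn `current` into `desired`."""
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired):
        if address not in current:
            actions.append(("create", address))
        elif desired[address] != current[address]:
            # A real tool would also decide whether the change can be made
            # in place or requires destroying and recreating the resource.
            actions.append(("update", address))
    for address in sorted(current):
        if address not in desired:
            actions.append(("delete", address))
    return actions


if __name__ == "__main__":
    desired = {
        "bucket.artifacts": {"region": "eu-central-1"},
        "registry.images": {"scan_on_push": "true"},
    }
    current = {
        "bucket.artifacts": {"region": "us-east-1"},
        "bucket.old": {"region": "us-east-1"},
    }
    for action, address in plan(desired, current):
        print(action, address)
    # Once the plan has been applied, planning again finds nothing to do.
    print(plan(desired, desired))
```

Running it prints `update bucket.artifacts`, `create registry.images`, and `delete bucket.old`, followed by `[]`. Applying the same desired state a second time is a no-op, which is where much of the reproducibility of IaC comes from.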
Popular tools for IaC
There are now many tools in the IaC space, but a few remain the most widely used:
- Terraform: Arguably the tool that popularized IaC. It remains one of the standards in the space and has its own configuration language, HCL (HashiCorp Configuration Language).
- Pulumi: A competitor to Terraform, with the key difference that infrastructure is defined in general-purpose programming languages such as Python, TypeScript, or Go rather than in a dedicated configuration language like HCL.
- AWS CloudFormation, Azure Resource Manager, and Google Cloud Deployment Manager: These are services offered by the cloud providers themselves, each with its own syntax and methodology. They have the advantage of being native to the underlying cloud provider, but the disadvantage that your code is then usually tied to that one provider.
For this blog post, I will focus on Terraform, as it is the most widely used of these tools. That said, ZenML also has a 1-click deployment feature in the browser that uses native cloud services such as AWS CloudFormation.
🥘 Infrastructure as Code in MLOps
In many ways, MLOps is an extension of DevOps, so it is only natural that IaC plays as critical a role in MLOps as it does in DevOps. Like any other computing workload, MLOps requires users to provision and manage resources on the cloud, so ML practitioners can reuse the tried-and-tested IaC practices from DevOps and reap the same rewards. Specifically, for machine learning teams, adopting IaC can lead to the following benefits:
- Consistent and reproducible rollouts of experiments, training, and deployments
- A strong security standard for machine learning
- Visibility into spending and potential for cost optimization
- A link between ML and the rest of the organization's infrastructure deployments
- Easy onboarding of new teams/projects
📦 MLOps Stacks: Modularized configuration for your infrastructure
ZenML is an MLOps framework that acts as a bridge between machine learning teams and production infrastructure. In ZenML, a stack is the configuration of tools and infrastructure that your pipelines run on: a machine learning pipeline written in ZenML runs on whatever the stack defines.
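As a rough mental model, you can picture a stack as a named set of components keyed by type. The classes below are hypothetical and are not ZenML's actual Python API. The one rule they encode that does come from ZenML is that every stack needs at least an orchestrator and an artifact store; the flavor names in the usage example are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict

# ZenML requires every stack to have at least these two components.
REQUIRED_TYPES = ("orchestrator", "artifact_store")


@dataclass
class Component:
    """A configured piece of infrastructure (hypothetical model, not ZenML's classes)."""
    name: str
    flavor: str  # which implementation backs the component, e.g. "s3"
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class Stack:
    """A named set of components, keyed by component type."""
    name: str
    components: Dict[str, Component]

    def validate(self) -> None:
        missing = [t for t in REQUIRED_TYPES if t not in self.components]
        if missing:
            raise ValueError(f"stack {self.name!r} is missing: {', '.join(missing)}")

    def describe(self) -> str:
        return "\n".join(
            f"{ctype}: {comp.name} ({comp.flavor})"
            for ctype, comp in sorted(self.components.items())
        )


if __name__ == "__main__":
    # Flavor names and config keys below are illustrative only.
    stack = Stack(
        name="aws-stack",
        components={
            "orchestrator": Component("sagemaker-orchestrator", "sagemaker"),
            "artifact_store": Component("s3-store", "s3", {"path": "s3://my-zenml-bucket"}),
            "container_registry": Component("ecr-registry", "aws"),
            "image_builder": Component("codebuild-builder", "aws"),
        },
    )
    stack.validate()
    print(stack.describe())
```

Keeping the stack as plain data is what makes it a natural target for IaC: the same tool that creates the bucket or registry can also produce the entry that points the stack at it.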
Out of the box, ZenML’s API assumes that the infrastructure is already provisioned and expects you to register the corresponding components in its database. This can be clunky: it involves granting the right permissions, executing the correct commands, and making sure everything works as expected across environments. All in all, this process is cumbersome to automate!
However, if you look closely, there is a natural link between the infrastructure deployed via IaC and the ZenML stack. What if you could provision the infrastructure and register it with ZenML in one go? This is where the Terraform modules for ZenML stacks come in.
🛫 Terraform Modules for ZenML Stacks
Recently, we published new modules to the HashiCorp Terraform Registry for provisioning an MLOps stack on each of the popular cloud providers. These Terraform modules set up the necessary infrastructure for a ZenML stack and register the corresponding configuration with a ZenML server. This lets you integrate MLOps into your existing cloud infrastructure without reinventing the wheel!
Here is how to do it:
🛠 Prerequisites
- Terraform installed (version >= 1.9)
- AWS, GCP, or Azure account set up
- AWS, GCP, or Azure CLI installed and authenticated
- ZenML (version >= 0.62.0) installed and configured. You'll need a ZenML server deployed remotely, where it can be reached from the cloud provider. You can either self-host a ZenML server or sign up for a free ZenML Pro account.
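If you want to check the version requirements above from a script, for example in CI, a small Python helper is enough. This is a sketch: it relies on `terraform version -json` (which reports a `terraform_version` field) and on `zenml.__version__`, and it compares only the numeric part of each version.

```python
import json
import re
import subprocess
from typing import Tuple

MIN_TERRAFORM = "1.9"
MIN_ZENML = "0.62.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v1.9.3' or '0.62.0rc1' into a tuple of ints (suffixes are ignored)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, padding with zeros so that '1.9' equals '1.9.0'."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    return have + (0,) * (width - len(have)) >= need + (0,) * (width - len(need))


def terraform_version() -> str:
    # `terraform version -json` prints an object with a "terraform_version" key.
    result = subprocess.run(
        ["terraform", "version", "-json"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)["terraform_version"]


if __name__ == "__main__":
    try:
        tf = terraform_version()
        print(f"Terraform {tf}: {'ok' if meets_minimum(tf, MIN_TERRAFORM) else 'too old'}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("Terraform not found or not working")
    try:
        import zenml

        zv = zenml.__version__
        print(f"ZenML {zv}: {'ok' if meets_minimum(zv, MIN_ZENML) else 'too old'}")
    except ImportError:
        print("ZenML is not installed")
```

Pre-release suffixes such as `rc1` are ignored, so `0.62.0rc1` counts as `0.62.0`. If that distinction matters, `packaging.version.Version` from the `packaging` library handles it properly.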
🏗 Resources Created
Each Terraform module creates the cloud resources needed for a ZenML stack in your cloud account (AWS, GCP, or Azure) and registers that stack with your ZenML server.
🚀 How to use the Terraform modules
Aside from the prerequisites mentioned above, you also need to create a ZenML Service Account API key for your ZenML Server. You can do this by running the following command in a terminal where you have the ZenML CLI installed:
After that, it’s a matter of copying this code snippet into a file. Here is an example configuration for AWS (GCP and Azure are similar):
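Terraform also accepts configuration written in JSON syntax (files ending in `.tf.json`), which is convenient if you want to generate the configuration from Python instead of hand-writing HCL. The sketch below builds a minimal AWS configuration along those lines. The module address `zenml-io/zenml-stack/aws`, the provider address `zenml-io/zenml`, and the `zenml_stack_name` input are assumptions based on the registry's naming scheme, so check them against the module's registry page; for GCP or Azure you would swap in that cloud's module and provider.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def stack_config(region: str, module_inputs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a minimal Terraform JSON-syntax config for the AWS ZenML stack module.

    The module and provider addresses are assumptions; check the registry pages.
    """
    module: Dict[str, Any] = {"source": "zenml-io/zenml-stack/aws"}
    module.update(module_inputs or {})
    return {
        "terraform": {
            "required_providers": {
                "aws": {"source": "hashicorp/aws"},
                "zenml": {"source": "zenml-io/zenml"},
            }
        },
        "provider": {
            # Credentials come from the standard AWS credential chain
            # (for example, the profile set up by `aws configure`).
            "aws": {"region": region},
            # Left empty: the server URL and API key are assumed to come
            # from environment variables rather than being committed to a file.
            "zenml": {},
        },
        "module": {"zenml_stack": module},
    }


def write_config(directory: Path, config: Dict[str, Any]) -> Path:
    """Write `config` to `<directory>/main.tf.json` and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "main.tf.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


if __name__ == "__main__":
    # "zenml_stack_name" is an assumed optional module input.
    config = stack_config("eu-central-1", {"zenml_stack_name": "my-aws-stack"})
    print(json.dumps(config, indent=2))
```

Passing the result to `write_config(Path("zenml-stack"), config)` gives you a directory to run Terraform in. In a real setup you would probably also pin the module by adding a `version` key through `module_inputs`, which Terraform supports for registry modules.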
You can then execute:
If you would like to destroy the resources and delete the stack, you can run:
Wait a few minutes, and you will see a fully configured stack in your ZenML dashboard!
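Creating and tearing down the stack follows the standard Terraform workflow: `terraform init` downloads the providers and modules, `terraform apply` creates the resources, and `terraform destroy` removes them. If you want to script this (say, in CI), here is a hypothetical Python wrapper. It assumes the ZenML Terraform provider reads the server URL and API key from the `ZENML_SERVER_URL` and `ZENML_API_KEY` environment variables, which keeps the key out of your `.tf` files; verify those names against the provider's documentation. With `dry_run=True` it only returns the commands it would run.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping, Optional
from urllib.parse import urlparse

# Standard Terraform CLI steps. -auto-approve skips the interactive
# confirmation; drop it if you want to review the plan before applying.
STEPS: Dict[str, List[List[str]]] = {
    "apply": [
        ["terraform", "init", "-input=false"],
        ["terraform", "apply", "-input=false", "-auto-approve"],
    ],
    "destroy": [["terraform", "destroy", "-input=false", "-auto-approve"]],
}


def provider_env(server_url: str, api_key: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and add the ZenML provider credentials.

    ZENML_SERVER_URL / ZENML_API_KEY are assumed variable names; verify them
    against the ZenML provider documentation.
    """
    parsed = urlparse(server_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid server URL: {server_url!r}")
    if not api_key:
        raise ValueError("an API key is required")
    env = dict(os.environ if base is None else base)
    env["ZENML_SERVER_URL"] = server_url
    env["ZENML_API_KEY"] = api_key
    return env


def run_terraform(
    action: str, workdir: Path, env: Optional[Mapping[str, str]] = None, dry_run: bool = False
) -> List[List[str]]:
    """Run the commands for `action` in `workdir`; with dry_run, only return them."""
    if action not in STEPS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(STEPS)}")
    commands = [list(cmd) for cmd in STEPS[action]]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, cwd=workdir, env=env, check=True)
    return commands


if __name__ == "__main__":
    env = provider_env("https://zenml.example.com", "<your-api-key>")
    for cmd in run_terraform("apply", Path("zenml-stack"), env=env, dry_run=True):
        print(" ".join(cmd))
```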
📢 Note that the Terraform modules currently deploy only a basic cloud stack with an orchestrator, container registry, image builder, and artifact store. To add more components, please read the docs.
💨 Try it yourself
You can try it yourself by following the instructions above or the guides in the HashiCorp Terraform Registry for AWS, GCP, or Azure. The source code for each module is available in its corresponding open-source GitHub repository.
If you need assistance, join our Slack Community and drop a message in the #general channel. We’re known for our quick and helpful responses!
⭐️ Show Your Support
If you find this project helpful, please consider giving ZenML a star on GitHub. Your support helps promote the project and lets others know it's worth checking out.
Thank you for your support! 🌟