Software Engineering

Automating Lightning Studio ML Pipelines For Fine-Tuning LLMs

Hamza Tahir
Oct 9, 2024
06 mins

In the fast-paced world of AI, the ability to efficiently fine-tune Large Language Models (LLMs) for specific tasks is becoming a critical competitive advantage. This post dives into how combining Lightning AI Studios with ZenML can streamline and automate your LLM fine-tuning process, enabling rapid iteration and deployment of task-specific models.

The LLM Fine Tuning Challenge

As LLMs like GPT-4o, Llama 3.1, Mistral, etc. become more accessible, companies are increasingly looking to adapt these models for specialized tasks. This could range from customer service chatbots and content generation to specialized data analysis and decision support systems.

One of the most exciting developments in LLM fine-tuning is the ability to create and serve multiple fine-tuned variants of a model efficiently. With LoRA (Low-Rank Adaptation) and its many variants [1][2], you can generate small adapter weights that modify the behavior of the base model without changing its core parameters. This allows you to serve each use case with its own fine-tuned model variant.

This approach has tremendous benefits:

  1. Fine-tune models with minimal computational resources
  2. Store and distribute only the adapter weights (typically a few MB) instead of entire models
  3. Serve thousands of fine-tuned variants from a single base model (see LoRAX)
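
To make this concrete, here's a minimal sketch of attaching a LoRA adapter to a base model with the peft library. The model id, target modules, and hyperparameters below are illustrative choices, not values from the pipeline later in this post:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("distilgpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the adapter output
    target_modules=["c_attn"],  # attention projection in GPT-2-style models
    fan_in_fan_out=True,        # GPT-2 uses Conv1D layers, so weights are transposed
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# After training, only the small adapter needs to be stored and distributed
model.save_pretrained("adapters/my-task")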

Generating an LLM Fine-Tuning Flywheel

However, as organizations seek to leverage these models for specific use cases, a new challenge has emerged: the need to efficiently fine-tune and manage numerous LLM variants. The challenge lies in scaling this process. Fine-tuning a single LLM is manageable (in fact it’s easier than ever with tools like axolotl), but what happens when you need to maintain and update dozens or even hundreds of fine-tuned models? Some key issues include:

  1. Resource Management: LLM fine-tuning is computationally intensive, often requiring specialized hardware like GPUs or TPUs.
  2. Data Preparation: Each fine-tuning task may require its own dataset, which needs to be collected, cleaned, and formatted appropriately.
  3. Hyperparameter Optimization: Finding the right hyperparameters for each fine-tuning task can be a time-consuming process.
  4. Version Control: Keeping track of different model versions, their training data, and performance metrics becomes increasingly complex.
  5. Deployment and Serving: Getting fine-tuned models into production efficiently and managing their lifecycle presents another set of challenges.
  6. Cost Management: With the computational resources required, costs can quickly spiral if not managed carefully.

Setting up infrastructure is hard

As organizations scale up their LLM usage, these challenges compound. What's needed is not just a way to fine-tune models, but a comprehensive, automated pipeline for managing the entire process. On top of that, these projects face the challenge of setting up and managing infrastructure, which can be time-consuming and complex, especially when dealing with distributed training or scaling workloads. Data scientists working on the models shouldn't have to worry about spinning up and maintaining compute instances or managing credentials for them. This is where ZenML and Lightning AI can help! The rest of this post explores the growing complexity of LLM fine-tuning at scale and introduces a solution that combines the flexibility of Lightning Studios with the automation capabilities of ZenML.

Lightning Studios: Your New ML Playground

A screenshot of a Lightning AI studio workspace. It showcases a VS Code instance. On the right you see the ability to change the Machine type to many different GPU configurations.

Lightning AI, the brainchild of the PyTorch Lightning crew, has dropped a bomb on the ML world with their Studios concept. Think of it as your personal, scalable ML lab in the cloud. Need to switch from CPU to multi-GPU setups on the fly? Lightning's got you covered.

Studios provide modular, cloud-based environments pre-configured with essential ML libraries. The key advantage is flexibility - you can easily scale from CPU-only setups to multi-GPU, multi-node configurations as your needs evolve. For LLM fine-tuning, this means you can:

  1. Prototype and debug your data preparation scripts on CPU instances
  2. Seamlessly switch to GPU instances for the actual training
  3. Scale up to multi-GPU setups for larger models or datasets

ZenML: Structuring and Automating ML Workflows

While Lightning Studios provide the environment, ZenML brings structure and automation to your ML pipelines. As an open-source MLOps framework, ZenML integrates seamlessly with Lightning AI, allowing you to define reproducible, scalable workflows for your fine-tuning tasks.

ZenML offers several key benefits when combined with Lightning AI:

  1. Faster Execution: Automatic packaging and upload of code to Lightning AI Studio.
  2. Reproducible Training: Consistent results by encapsulating Lightning AI configurations within ZenML pipelines.
  3. Quick Experimentation: Run experiments with different parameters and on different machines quickly using ZenML's configurable pipelines.
  4. Seamless Tracking: Track and compare model metrics, hyperparameters, and artifacts using ZenML's experiment tracking features.
  5. Managed Infrastructure: Access to Lightning AI's infrastructure, including GPUs, for running your pipelines.
  6. Built-in Distributed Training: Leverage Lightning AI's support for distributed training out of the box.

To prepare your environment for LLM fine-tuning with ZenML and Lightning Studios:

  1. Install ZenML and the Lightning integration:
pip install zenml
zenml integration install lightning s3 aws -y
  2. Clone the sample project:
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/llm-finetuning-simple
pip install -r requirements.txt
  3. Initialize and connect to a deployed ZenML server:
zenml init
zenml connect --url <MYZENMLSERVERURL>

These steps install necessary tools, set up your project, and prepare ZenML for use with Lightning Studios. This foundation enables you to create reproducible ML pipelines, easily switch between local and cloud environments, and effectively track your experiments.

With this setup, you're ready to define your LLM fine-tuning pipeline and leverage the scalability of Lightning Studios for your training tasks.

Fine-tuning LLMs in 60 lines of code

While LLM fine-tuning can seem daunting, we can distill the core process into a concise, yet powerful script. Here's a condensed ZenML pipeline that captures the essence of LLM fine-tuning in just about 60 lines of code:

import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling
from zenml import pipeline, step, log_model_metadata
from typing_extensions import Annotated
import argparse
from zenml.integrations.huggingface.materializers.huggingface_datasets_materializer import HFDatasetMaterializer

@step(output_materializers=HFDatasetMaterializer)
def prepare_data(base_model_id: str, dataset_name: str, dataset_size: int, max_length: int) -> Annotated[Dataset, "tokenized_dataset"]:
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    tokenizer.pad_token = tokenizer.eos_token
    dataset = load_dataset(dataset_name, split=f"train[:{dataset_size}]")

    def tokenize_function(example):
        prompt = f"Question: {example['question']}\nAnswer: {example['answers']['text'][0]}"
        return tokenizer(prompt, truncation=True, padding="max_length", max_length=max_length)

    tokenized_data = dataset.map(tokenize_function, remove_columns=dataset.column_names)
    log_model_metadata(metadata={"dataset_size": len(tokenized_data), "max_length": max_length})
    return tokenized_data

@step
def finetune(base_model_id: str, tokenized_dataset: Dataset, num_train_epochs: int, per_device_train_batch_size: int) -> None:
    torch.cuda.empty_cache()
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",
        torch_dtype=torch.float32,  # Changed from float16 to float32
        low_cpu_mem_usage=True
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

    training_args = TrainingArguments(
        output_dir="./results",
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=8,
        logging_steps=10,
        save_strategy="epoch",
        learning_rate=2e-5,
        weight_decay=0.01,
        optim="adamw_torch",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )

    train_result = trainer.train()
    log_model_metadata(metadata={"metrics": {"train_loss": train_result.metrics.get("train_loss")}})
    trainer.save_model("finetuned_model")

@pipeline
def llm_finetune_pipeline(base_model_id: str):
    tokenized_dataset = prepare_data(base_model_id)
    finetune(base_model_id, tokenized_dataset)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, required=True, help='Path to the YAML config file')
    args = parser.parse_args()
    llm_finetune_pipeline.with_options(config_path=args.config)()

<aside>You can see all the code from this blog in this GitHub repository.</aside>

This script encapsulates the core components of LLM fine-tuning:

  1. Data Preparation: The prepare_data step loads and tokenizes the dataset, preparing it for training.
  2. Fine-tuning: The finetune step sets up the model, training arguments, and executes the fine-tuning process.
  3. Pipeline Definition: The llm_finetune_pipeline ties these steps together into a cohesive workflow.

While this script provides a solid foundation, in practice, you'll likely want to add more sophisticated error handling, logging, and potentially additional steps for evaluation and model deployment.

Leveraging PyTorch Lightning Checkpoints

ZenML's flexibility extends to integrating with popular deep learning frameworks like PyTorch Lightning. One powerful feature is the ability to link externally produced data as ZenML artifacts. This is particularly useful for managing model checkpoints produced during training.

Here's how we can modify our existing fine-tuning step to incorporate this feature:

from zenml import step, log_model_metadata, link_folder_as_artifact
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling

@step
def finetune(base_model_id: str, tokenized_dataset: Dataset, num_train_epochs: int, per_device_train_batch_size: int) -> None:
    # ... (existing model and tokenizer setup)

    training_args = TrainingArguments(
        output_dir="./results",
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=per_device_train_batch_size,
        # ... (other training arguments)
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )

    try:
        train_result = trainer.train()
        trainer.save_model("finetuned_model")
    finally:
        # Link the saved model and checkpoints as ZenML artifacts
        link_folder_as_artifact(
            folder_uri="./results",
            name="llm_fine_tuning_artifacts"
        )

    log_model_metadata(metadata={"metrics": {"train_loss": train_result.metrics.get("train_loss")}})

This approach allows you to leverage the Hugging Face Trainer's checkpoint saving capabilities while seamlessly integrating with ZenML's artifact management system. By using link_folder_as_artifact, you can treat the saved model and checkpoints as ZenML artifacts, making them easily accessible for future use, versioning, and tracking.
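
Later on, you can pull those linked artifacts back out programmatically. A minimal sketch, assuming a connected ZenML client and the artifact name used above:

from zenml.client import Client

# Fetch the most recent version of the linked checkpoint folder
artifact = Client().get_artifact_version("llm_fine_tuning_artifacts")
checkpoint_dir = artifact.load()  # materializes the folder locally
print(f"Checkpoints available at: {checkpoint_dir}")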

Streamlining Parameter Management with YAML

Rather than cluttering our command line with numerous parameters, we can leverage a YAML configuration file to manage our fine-tuning settings. Here's an example, configs/config_small_cpu.yaml:

model:
  name: llm-finetuning-distilgpt2-small
  description: "Fine-tune DistilGPT-2 on smaller computer."
  tags:
    - llm
    - finetuning
    - distilgpt2

parameters:
  base_model_id: distilgpt2

steps:
  prepare_data:
    parameters:
      dataset_name: squad
      dataset_size: 100
      max_length: 128

  finetune:
    parameters:
      num_train_epochs: 1
      per_device_train_batch_size: 4

This YAML file allows us to easily adjust parameters like the base model, dataset size, and training epochs without modifying our core script. To run the fine-tuning process with these parameters, we simply execute:

python run.py --config configs/config_small_cpu.yaml

Visualizing Results with ZenML

One of the key advantages of using ZenML is the ability to track and visualize your experiments. After running your fine-tuning pipeline, you can view the results in the ZenML dashboard:

A screenshot of the ZenML directed acyclic graph visualizer, showcasing a LLM finetuning pipeline run

This dashboard provides an overview of your pipeline runs, allowing you to compare different experiments and track your progress over time.

For ZenML Pro users, the Model section offers even more detailed metrics:

A screenshot of the ZenML model control plane, with some metrics and metadata tracked for a finetuned LLM model.

These visualizations can be invaluable for understanding the performance of your fine-tuned models and making data-driven decisions about further improvements.
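
If you prefer to work outside the dashboard, the same information can be pulled programmatically. A minimal sketch, assuming a connected ZenML client and the pipeline name from the script above:

from zenml.client import Client

# Inspect the latest run of the fine-tuning pipeline
run = Client().get_pipeline("llm_finetune_pipeline").last_run
print(run.name, run.status)
for step_name, step_run in run.steps.items():
    print(f"{step_name}: {step_run.status}")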

Leveraging Lightning AI Studios + ZenML

To take full advantage of Lightning AI Studios' scalable compute resources, we can configure our ZenML stack to use Lightning as our orchestrator. Here's how to set it up:

  1. First, register a Lightning orchestrator:
zenml integration install lightning s3
zenml orchestrator register lightning_orchestrator -f lightning \
    --machine_type=CPU \
    --user_id=<YOUR_LIGHTNING_USER_ID> \
    --api_key=<YOUR_LIGHTNING_USER_KEY> \
    --username=<YOUR_LIGHTNING_USERNAME> \
    --teamspace=<YOUR_LIGHTNING_TEAMSPACE>
(Use --organization instead of --username if your Studios live in an organization.)
  2. Next, set up a remote artifact store (in this case, using AWS S3):
zenml artifact-store register s3_store -f s3 --path=s3://yourpath
  3. Finally, create and set your ZenML stack:
zenml stack register lightning_stack -o lightning_orchestrator -a s3_store
zenml stack set lightning_stack
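
Before launching anything, it's worth confirming that the new stack is active and has the components you expect:

zenml stack describe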

With this configuration, you can now leverage Lightning AI Studios' resources for your fine-tuning tasks. To make the most of this setup, you can create a more detailed YAML configuration file that specifies different compute resources for different steps of your pipeline:

settings:
  docker:
    requirements: requirements.txt
    python_package_installer: uv
    apt_packages:
      - git
    environment:
      PJRT_DEVICE: CUDA
      USE_TORCH_XLA: "false"
      MKL_SERVICE_FORCE_INTEL: "1"
      PYTORCH_CUDA_ALLOC_CONF: "expandable_segments"

model:
  name: llm-finetuning-gpt2-large
  description: "Fine-tune GPT-2 on larger GPU."
  tags:
    - llm
    - finetuning
    - gpt2-large

parameters:
  base_model_id: gpt2-large

steps:
  prepare_data:
    parameters:
      dataset_name: squad
      dataset_size: 1000
      max_length: 512

  finetune:
    parameters:
      num_train_epochs: 3
      per_device_train_batch_size: 8

    settings:
      orchestrator.lightning:
        machine_type: A10G

This configuration allows you to use CPU instances for data preparation and potentially evaluation steps, while leveraging powerful GPU instances (in this case, an A10G) for the compute-intensive fine-tuning step.
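
If you'd rather keep resource selection in code than in YAML, the same per-step override can be expressed on the step decorator. A minimal sketch, assuming the Lightning orchestrator registered above (the settings dictionary mirrors the orchestrator.lightning block from the config):

from datasets import Dataset
from zenml import step

@step(settings={"orchestrator.lightning": {"machine_type": "A10G"}})
def finetune(base_model_id: str, tokenized_dataset: Dataset, num_train_epochs: int, per_device_train_batch_size: int) -> None:
    ...  # same training logic as in the finetune step shown earlier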

We can run it with:

python run.py --config configs/config_large_gpu.yaml

This will kick off a run in the ZenML Dashboard using Lightning AI Studio as the orchestrator.

You will now see a new Studio spin up in your Lightning AI account; it will execute the pipeline and then shut down when the run finishes.

Why This Matters: The Future of AI is Task-Specific

While general-purpose models like GPT-4 and Claude Opus are undoubtedly impressive, they often represent overkill for many specific tasks. These models come with significant computational and financial costs, making them impractical for many organizations to deploy at scale.

The future of AI lies not just in these massive, general-purpose models, but in the ability to rapidly create and deploy task-specific models that are more efficient and cost-effective. As highlighted in a recent TechCrunch article, ZenML is betting on a future where companies build their own AI stacks using smaller, more efficient models tailored to their specific needs.

This approach offers several key advantages:

  1. Cost-Effectiveness: Smaller, task-specific models require less computational resources to run, reducing operational costs.
  2. Improved Performance: Models fine-tuned on domain-specific data often outperform general-purpose models on specialized tasks.
  3. Faster Iteration: Smaller models allow for quicker experimentation and iteration cycles, speeding up development.
  4. Data Privacy: By fine-tuning your own models, you maintain control over your training data, which is crucial for many industries with strict privacy requirements.

The combination of Lightning Studios and ZenML provides a powerful toolkit for automating LLM fine-tuning pipelines, positioning you to ride this wave of task-specific AI. This approach enables teams to:

  1. Rapidly prototype and experiment with different fine-tuning strategies
  2. Efficiently allocate computational resources across the pipeline
  3. Maintain reproducibility and scalability in ML workflows
  4. Easily manage and deploy multiple fine-tuned model variants

As we move towards more specialized AI applications, the ability to quickly fine-tune and deploy task-specific models becomes increasingly valuable. Whether you're building a specialized chatbot, a domain-specific text analyzer, or exploring novel AI applications, this automated pipeline approach provides the flexibility and efficiency needed to stay competitive in the rapidly evolving world of AI.

The future of AI isn't just about having the biggest model - it's about having the right model for the job. With Lightning AI Studios and ZenML, you can build and deploy those models faster and more efficiently than ever.

Looking to Get Ahead in MLOps & LLMOps?

Subscribe to the ZenML newsletter and receive regular product updates, tutorials, examples, and more articles like this one.