Hello there! Today, we're back to LLM land (not too far from La La Land). Not only do we have a new LoRA + Accelerate-powered finetuning pipeline for you, we're also hosting a RAG-themed webinar. Enough content to fill the entire month and satisfy your LLMOps curiosities. Let's dive in.
Learnings from Building with LLMs
Back to LLMs for this edition: Over the past year, significant insights have emerged from building with large language models (LLMs) in production, and I wanted to synthesize my thoughts here:
Understanding fundamental prompting techniques is crucial, including n-shot prompts and in-context learning, which require a sufficient number of detailed examples to guide the LLM effectively. Chain-of-Thought (CoT) prompting has been shown to reduce hallucinations by helping models outline their reasoning before generating the final response. Structuring inputs and outputs to match the model's training data format, such as using XML for Claude and JSON for GPT, also significantly improves performance. Additionally, decomposing complex prompts into multiple, simpler parts allows for more precise and manageable workflows.
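To make the first two ideas concrete, here's a minimal sketch of assembling an n-shot prompt with JSON-formatted examples plus a CoT instruction. It's pure Python with no LLM calls; the `build_prompt` helper, the sentiment task, and the example data are all hypothetical, not from any particular library:

```python
import json

# Hypothetical few-shot examples; in practice these come from curated data.
EXAMPLES = [
    {"review": "Great battery life!", "sentiment": "positive"},
    {"review": "Screen cracked after a week.", "sentiment": "negative"},
]

def build_prompt(review: str, examples: list) -> str:
    """Assemble an n-shot prompt: JSON-structured examples first,
    then a chain-of-thought instruction, then the new input."""
    shots = "\n".join(
        f"Review: {ex['review']}\nAnswer: {json.dumps({'sentiment': ex['sentiment']})}"
        for ex in examples
    )
    return (
        "Classify the sentiment of a product review.\n"
        "Think step by step, then answer as JSON on the last line.\n\n"
        f"{shots}\n\n"
        f"Review: {review}\nAnswer:"
    )

prompt = build_prompt("Fast shipping, works as advertised.", EXAMPLES)
print(prompt)
```

The same pattern scales to decomposed prompts: each simpler sub-task gets its own small template like this one, rather than one monolithic prompt.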
When moving from development to production, regularly reviewing LLM inputs and outputs is essential to understand data distribution and detect anomalies or drift. As elsewhere in MLOps, it’s necessary to manage the differences between development and production data to maintain model accuracy. Structured outputs, such as JSON or YAML, ensure smoother downstream integration. Despite the allure of sophisticated models, exploring smaller, less resource-intensive models can often yield comparable or better results while optimizing for latency and cost. Design involvement early in the product development process is vital to ensure intuitive user experiences, particularly when incorporating Human-in-the-Loop (HITL) systems, which facilitate continuous improvement through user feedback.
Here are five good practices to consider when building with LLMs:
- Regular Data Review: Periodically check LLM inputs and outputs to maintain accuracy.
- Structured Outputs: Use machine-readable formats to ease integration with downstream systems. Use tools like Instructor or Outlines to coerce outputs into these formats.
- Model Pinning: Pin specific model versions to avoid unpredictable changes.
- Design Involvement: Engage design teams early for better user experience.
- Human-in-the-Loop Systems: Integrate feedback mechanisms to enhance data quality and model refinement.
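On the structured-outputs practice: tools like Instructor or Outlines handle schema enforcement (and retries) for you, but the core idea can be sketched with the standard library alone. The `parse_llm_output` helper and the example responses below are hypothetical:

```python
import json

def parse_llm_output(raw: str, required_keys: set) -> dict:
    """Validate that an LLM response is JSON containing the expected keys,
    so malformed output fails loudly here instead of somewhere downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"LLM output is not valid JSON: {err}") from None
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"LLM output is missing keys: {sorted(missing)}")
    return data

# A well-formed (hypothetical) model response passes validation...
result = parse_llm_output(
    '{"sentiment": "positive", "score": 0.9}', {"sentiment", "score"}
)

# ...while free-text chatter raises a clear error.
try:
    parse_llm_output("Sure! The sentiment is positive.", {"sentiment"})
except ValueError as err:
    print(err)
```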
A hands-on take on many of these core principles is explored in our LLMOps guide, which is nearly complete! Check it out today and reply to this email to share your feedback!
Finetuning LLM Project
We have a new LLM template + project for the community! Andrei created this from scratch and finetuned Mistral in a distributed setting to make sure it works. All you need to do is follow the README instructions to finetune it on your data!
ZenML Product Corner
Fresh new docs
We've heard many times that old docs indexed by search engines and hard-to-search docs make the ZenML experience slightly tougher. We've taken your feedback into account and made the docs easier to understand and search through.
+3 New Annotator Integrations
Following the computer vision webinar, we realized we missed a few integrations in that space. The 0.58.0 release brings in three new integrations for the annotator stack component: Prodigy, Argilla, and Pigeon. Check it out by updating your ZenML!
Retry configuration for Steps
The 0.58.0 release also includes a new `retry` configuration for steps. The following parameters can be set: `StepRetryConfig(max_retries=3, delay=10, backoff=2)`
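The three parameters map onto a standard exponential-backoff loop: up to `max_retries` re-runs after the initial attempt, waiting `delay` seconds before the first retry and multiplying the wait by `backoff` after each failure. Here's a stdlib sketch of those semantics (not ZenML's internal implementation; the `run_with_retry` helper is hypothetical):

```python
import time

def run_with_retry(fn, max_retries=3, delay=10, backoff=2):
    """Re-run `fn` up to `max_retries` times after the initial attempt,
    sleeping `delay` seconds before the first retry and multiplying the
    wait by `backoff` after each subsequent failure."""
    wait = delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            time.sleep(wait)
            wait *= backoff

# With delay=10 and backoff=2, the waits are 10s, 20s, 40s before retries 1-3.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retry(flaky, max_retries=3, delay=0, backoff=2)
```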
Fresh from the community
Reusing pipelines across projects
New community member John W Oliver started a fantastic thread on how to share a data loading step in a pipeline between different projects. Michael replied with a clever solution using the `log_artifact_metadata` method.
The advantage of adopting ZenML
Julien Beaulieu started an insightful discussion on how a team with a mature software engineering setup can leverage ZenML to adopt better MLOps practices while keeping the benefits of their legacy infrastructure.
Register free for the LLM Webinar
An hour-long, practical hands-on session about RAG pipelines, covering:
- The process of ingesting and preprocessing data for your RAG pipeline
- The critical role of embeddings in a RAG retrieval workflow
- How ZenML simplifies the tracking and management of RAG-associated artifacts
- Strategies for assessing the performance of your RAG pipelines
- The use of rerankers to enhance the overall retrieval process in your RAG pipeline
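To ground the retrieval step ahead of the webinar, here's a toy sketch of similarity-based retrieval. The hand-made three-dimensional vectors stand in for real embeddings, and the tiny in-memory corpus stands in for a vector store; everything here is illustrative, not a ZenML API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: (document, hand-made "embedding"). A real pipeline would
# embed chunks with a model and store them in a vector database.
CORPUS = [
    ("ZenML tracks pipeline artifacts", [0.9, 0.1, 0.0]),
    ("Rerankers reorder retrieved chunks", [0.1, 0.9, 0.1]),
    ("Embeddings map text to vectors", [0.2, 0.1, 0.9]),
]

def retrieve(query_vec, corpus, k=2):
    """Return the top-k documents ranked by cosine similarity to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

top = retrieve([0.85, 0.15, 0.05], CORPUS)
```

A reranker would slot in after `retrieve`, re-scoring the top-k candidates with a stronger (usually cross-encoder) model before they reach the generator.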
---
If you have any questions or need assistance, feel free to join our Slack community.
Happy Developing!
Hamza Tahir