Building and Optimizing RAG Pipelines: Data Preprocessing, Embeddings, and Evaluation with ZenML

ZenML Team
Jun 14, 2024

In the latest ZenML webinar, we took a close look at Retrieval-Augmented Generation (RAG) pipelines and how ZenML can streamline RAG workflows. In this hands-on workshop, Alex walks through the essential components of building and optimizing a RAG pipeline.

We cover:

  • The fundamentals of RAG: why it exists and which problems it solves in natural language processing.
  • Ingesting and preprocessing data for a RAG pipeline, with a focus on best practices.
  • The role of embeddings in retrieval, including how to generate them and store them in a vector database for efficient retrieval of relevant information.
  • How ZenML tracks and manages RAG-associated artifacts, which supports reproducibility and collaboration.
  • Strategies for evaluating a RAG pipeline, measuring the impact of any modifications you make, and interpreting the results.
  • Using rerankers to improve the relevance and quality of retrieved information, with practical examples and implementation guidance.
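
To make the stages in the list above concrete, here is a minimal plain-Python sketch of chunking, embedding, retrieval, reranking, and evaluation. It is an illustration under stated assumptions, not the code from the workshop: every function name is hypothetical, a bag-of-words `Counter` stands in for a real embedding model, an in-memory list stands in for a vector database, and a query-term coverage score stands in for a learned reranker.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=20, overlap=5):
    """Preprocessing: split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding (assumption): token counts instead of a trained model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """In-memory stand-in for a vector database: (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc)]


def retrieve(query, index, k=3):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def rerank(query, candidates):
    """Toy reranker (assumption): order by share of query terms covered."""
    q_terms = set(embed(query))

    def coverage(chunk):
        if not q_terms:
            return 0.0
        return len(q_terms & set(embed(chunk))) / len(q_terms)

    return sorted(candidates, key=coverage, reverse=True)


def hit_rate(eval_set, index, k=1, candidates=3):
    """Evaluation: fraction of questions whose top-k chunks contain the answer."""
    hits = 0
    for question, expected in eval_set:
        top = rerank(question, retrieve(question, index, k=candidates))[:k]
        hits += any(expected in chunk for chunk in top)
    return hits / len(eval_set)


docs = [
    "ZenML pipelines are built from steps. Each step output is stored as a versioned artifact.",
    "A reranker rescores retrieved chunks so that the most relevant chunk comes first.",
    "Embeddings map text to vectors. A vector database stores them for similarity search.",
]
eval_set = [
    ("what does a reranker do", "reranker"),
    ("which database stores embeddings", "vector database"),
    ("how are step outputs versioned", "versioned artifact"),
]

index = build_index(docs)
score = hit_rate(eval_set, index, k=1)
print(f"hit rate@1: {score:.2f}")
```

Because each stage is an ordinary function, any one of them can be replaced with a real component (an embedding model, a vector store, a cross-encoder reranker) and `hit_rate` re-run on the same evaluation set to measure the effect of the change.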

This workshop is designed for both newcomers to RAG and those with some experience who want to use ZenML to streamline their workflows. It takes a tool-agnostic approach, using plain Python wherever possible to keep the material accessible and flexible.

Please note that this workshop is the first part of a two-part series. The second part, which will be hosted at a later date, will focus on fine-tuning embeddings and language models specifically for RAG pipelines.

Speaker: Alex Strick van Linschoten is an ML Engineer at ZenML. Based in Delft, Alex had a previous career as a historian and linguist before retraining in a technical domain. He's interested in the ways small fine-tuned language models can outperform the proprietary options as well as the uses of LLMs (and generative AI in general) for education.

Don't miss future webinars: sign up to the ZenML newsletter to stay up to date.
