This case study examines how Lyft evolved their machine learning platform to support generative AI infrastructure, providing valuable insights into the challenges and solutions of implementing LLMs in a production environment at scale.
Lyft has maintained a substantial ML presence, with over 50 model types spread across more than 100 GitHub repositories and more than 1,000 unique trained models served company-wide. Their approach differs from similarly sized companies in favoring breadth over depth: numerous teams each manage their own models. These models support functions including location suggestions, ETAs, pricing, routing, and fraud detection, with some handling over 10,000 requests per second.
The company's ML platform was initially designed around a comprehensive lifecycle approach, supporting models from ideation through deployment and ongoing maintenance. Their journey to supporting GenAI began with their existing ML infrastructure, which had already evolved to handle various model types and frameworks.
Lyft built a proxy system that routes all LLM traffic through their existing ML serving infrastructure. They wrapped vendor client libraries (such as OpenAI's) so that callers keep the familiar vendor interface while the platform controls the transport layer, an approach that preserved the developer experience while giving the platform team central control over how requests reach external vendors.
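The case study doesn't show code, but a minimal Python sketch of this wrapping pattern might look like the following; the proxy URL, environment variable names, and model are illustrative assumptions, not Lyft's actual values:

```python
import os

from openai import OpenAI  # vendor client library being wrapped


class ProxiedOpenAI(OpenAI):
    """OpenAI client that keeps the vendor's interface but sends all
    traffic through an internal LLM proxy (hypothetical endpoint)."""

    def __init__(self, **kwargs):
        super().__init__(
            # Point the transport at the internal proxy instead of
            # api.openai.com; the interface callers see is unchanged.
            base_url=os.environ.get(
                "LLM_PROXY_URL", "https://llm-proxy.internal.example.com/v1"
            ),
            # The proxy can inject real vendor credentials server-side,
            # so callers only carry an internal service token.
            api_key=os.environ.get("INTERNAL_SERVICE_TOKEN", "unset"),
            **kwargs,
        )


# Application code is identical to using the vendor SDK directly.
client = ProxiedOpenAI()
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Suggest a pickup spot near LAX."}],
)
print(reply.choices[0].message.content)
```

Because a wrapper like this subclasses the vendor client rather than reimplementing it, new SDK features keep working and only the transport-level behavior changes.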
To address the growing usage of LLMs, Lyft developed a comprehensive evaluation framework organized into three main categories of checks.
The framework allows for both automated and LLM-based evaluations of responses, including checks for harmful content and response completeness. This modular approach enables teams to implement specific guardrails and quality metrics for their use cases.
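The framework's internals aren't shown in the case study; a minimal sketch of such a modular harness, with the check names and the stubbed LLM judge as assumptions, could look like this:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    name: str
    passed: bool
    detail: str = ""


# An evaluation is any callable taking (prompt, response) to an EvalResult,
# whether it is a cheap automated check or an LLM-judged one.
Evaluation = Callable[[str, str], EvalResult]


def check_completeness(prompt: str, response: str) -> EvalResult:
    """Automated check: the response must not be empty (a stand-in
    for a real completeness heuristic)."""
    return EvalResult("completeness", bool(response.strip()))


def check_harmful_content(prompt: str, response: str) -> EvalResult:
    """Automated check: crude harmful-content screen via a blocklist."""
    blocklist = {"badword"}  # illustrative only
    hits = [term for term in blocklist if term in response.lower()]
    return EvalResult("harmful_content", not hits, detail=", ".join(hits))


def llm_judge(prompt: str, response: str) -> EvalResult:
    """LLM-based check: ask a judge model whether the response actually
    answers the prompt. Stubbed here; a real version would call the proxy."""
    verdict = "yes"  # placeholder for a judge-model call
    return EvalResult("llm_judge", verdict == "yes")


def run_evaluations(prompt: str, response: str,
                    evals: list[Evaluation]) -> list[EvalResult]:
    """Run whichever guardrails a team has registered for its use case."""
    return [evaluate(prompt, response) for evaluate in evals]


for result in run_evaluations(
    "Where is my driver?",
    "Your driver is two minutes away.",
    [check_completeness, check_harmful_content, llm_judge],
):
    print(result)
```

The modularity comes from treating every check, automated or LLM-based, as the same callable shape, so teams can compose their own lists per use case.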
Lyft is also developing a higher-level interface for AI applications that wraps core LLM functionality together with common supporting features.
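The case study doesn't specify what this interface looks like; one plausible shape, with every name here hypothetical, bundles a prompt template, an LLM client, and guardrail checks behind a single call:

```python
from typing import Callable, Sequence


class AIApplication:
    """Hypothetical high-level wrapper: a prompt template, an LLM client,
    and post-response guardrails bundled behind one call."""

    def __init__(
        self,
        client,                      # e.g. the proxied client sketched above
        template: str,               # prompt template with {placeholders}
        guardrails: Sequence[Callable[[str, str], bool]] = (),
        model: str = "gpt-4o-mini",
    ):
        self.client = client
        self.template = template
        self.guardrails = guardrails
        self.model = model

    def run(self, **variables) -> str:
        prompt = self.template.format(**variables)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.choices[0].message.content
        # Fail closed: reject the response if any guardrail returns False.
        if not all(check(prompt, text) for check in self.guardrails):
            raise ValueError("response rejected by a guardrail")
        return text


# Usage sketch:
# summarizer = AIApplication(ProxiedOpenAI(), "Summarize: {text}")
# print(summarizer.run(text="Rider reported a billing issue..."))
```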
Lyft has implemented several production use cases leveraging their GenAI infrastructure:
Their flagship implementation uses a RAG-based approach for initial customer support responses, combining LLMs with internal knowledge bases so that the first reply a customer receives is grounded in relevant support content.
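A bare-bones illustration of that RAG pattern (retrieve, build a grounded prompt, generate); the toy word-overlap retriever and the prompt wording are stand-ins, not Lyft's implementation:

```python
def retrieve(query: str, knowledge_base: dict[str, str], k: int = 2) -> list[str]:
    """Toy retriever: rank articles by word overlap with the query.
    A production system would use embeddings and a vector store."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base.values(),
        key=lambda article: len(query_words & set(article.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def draft_support_reply(client, ticket: str, knowledge_base: dict[str, str]) -> str:
    """RAG flow: retrieve relevant articles, then ask the LLM to answer
    using only that retrieved context."""
    context = "\n\n".join(retrieve(ticket, knowledge_base))
    prompt = (
        "Answer the customer's question using ONLY the support articles "
        f"below.\n\nArticles:\n{context}\n\nCustomer: {ticket}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

A production version would swap the toy retriever for embedding search over the knowledge base and route generation through the proxied client described earlier.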
The team faced several challenges in adapting their ML platform for GenAI:
LLM requests are significantly more complex than traditional ML model inputs: rather than a flat feature vector, a single request can carry multi-turn conversation structure, sampling parameters, and streaming behavior. Lyft adapted their request handling to accommodate these richer payloads, as the contrast sketched below illustrates.
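To make the contrast concrete, here is an illustrative comparison between a traditional inference payload and an LLM request schema; the field names are assumptions for the sketch:

```python
from dataclasses import dataclass, field

# A traditional ML inference request is often just a flat feature vector.
eta_request = {"features": [33.94, -118.40, 0.82, 17]}


@dataclass
class LLMRequest:
    """Illustrative schema showing why LLM requests need richer handling:
    structured conversations, sampling parameters, and streaming."""
    messages: list[dict]            # ordered, role-tagged conversation turns
    model: str = "gpt-4o-mini"
    temperature: float = 0.2
    max_tokens: int = 512
    stream: bool = False            # token-by-token streaming vs one response
    metadata: dict = field(default_factory=dict)  # caller, use case, trace id


chat_request = LLMRequest(
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "I was charged twice for my ride."},
    ],
    metadata={"use_case": "support_rag", "caller": "support-service"},
)
```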
They also implemented several security measures around access to and use of external LLMs.
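The specific measures aren't enumerated in the case study, but one control commonly applied at a proxy layer like this is redacting obvious PII before a request leaves the company's network; a toy version:

```python
import re

# Illustrative patterns only; production PII detection is far more thorough.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace likely PII with typed placeholders before the request
    is forwarded to an external LLM vendor."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(redact_pii("Rider jane@example.com called from 415-555-0123."))
# -> "Rider <email> called from <phone>."
```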
The platform maintains compatibility with existing ML infrastructure while layering GenAI-specific components on top.
The platform has enabled rapid adoption of GenAI across Lyft, with hundreds of internal users and multiple production applications. The standardized infrastructure has reduced implementation time for new use cases while maintaining security and quality standards.
Lyft continues to evolve the platform, with further expansion of its GenAI capabilities planned.
The case study demonstrates how a mature ML platform can be effectively adapted for GenAI while maintaining operational excellence and enabling rapid innovation across the organization.