Lyft evolved their ML platform to support GenAI infrastructure, adapting their existing ML serving infrastructure to handle LLMs and building new components for AI operations. The company transitioned from self-hosted models to vendor APIs, implemented comprehensive evaluation frameworks, and developed an AI assistants interface, all while maintaining their established ML lifecycle principles. This evolution enabled use cases including customer support automation and internal productivity tools.
This case study examines how Lyft evolved their machine learning platform to support generative AI infrastructure, providing valuable insights into the challenges and solutions of implementing LLMs in a production environment at scale.
## Company Background and Initial ML Infrastructure
Lyft has maintained a substantial ML presence: more than 50 teams own models spread across over 100 GitHub repositories, with over 1,000 unique models served company-wide. Their approach differs from other companies of similar size by favoring breadth over depth, with numerous teams each managing their own models. These models support functions including location suggestions, ETAs, pricing, routing, and fraud detection, with some handling over 10,000 requests per second.
## Evolution of ML Platform
The company's ML platform was initially designed around a comprehensive lifecycle approach, supporting models from ideation through deployment and ongoing maintenance. Their journey to supporting GenAI began with their existing ML infrastructure, which had already evolved to handle various model types and frameworks.
### Key Infrastructure Evolution Stages
* Started with simple model serving for regression models
* Introduced a common Lyft ML model interface to standardize deployment
* Implemented pre- and post-processing capabilities to increase flexibility
* Adapted the platform to support LLMs through both self-hosted models and vendor APIs
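The talk does not publish the interface itself, but the shape it describes, a standard prediction entry point with optional pre- and post-processing hooks, can be sketched roughly as below. The class and method names here are hypothetical, not Lyft's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any


class LyftMLModel(ABC):
    """Hypothetical sketch of a common model interface with pre/post hooks.

    Standardizing on one entry point lets the serving layer deploy and
    monitor regression models, deep learning models, and (later) LLM
    wrappers in the same way.
    """

    def preprocess(self, raw_input: dict) -> Any:
        # Default: pass through. Teams override this to build features.
        return raw_input

    @abstractmethod
    def predict(self, features: Any) -> Any:
        """The framework-specific inference call."""

    def postprocess(self, prediction: Any) -> dict:
        # Default: wrap the raw prediction in a response payload.
        return {"prediction": prediction}

    def handle_request(self, raw_input: dict) -> dict:
        return self.postprocess(self.predict(self.preprocess(raw_input)))
```

Under this pattern, swapping in an LLM is a matter of implementing `predict` as a call to a hosted model or a vendor API while the serving layer stays unchanged.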
## GenAI Infrastructure Components
### LLM Proxy and Client Architecture
Lyft built a sophisticated proxy system that routes all LLM traffic through their existing ML serving infrastructure. They wrapped vendor client libraries (like OpenAI's) to maintain the same interface while controlling the transport layer. This approach provided several benefits:
* Eliminated the need for individual API keys
* Enhanced observability and tracking of LLM usage
* Centralized security and infrastructure management
* Standardized deployment and monitoring
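Lyft has not published the wrapper code, but the general pattern of pointing a vendor SDK at an internal proxy is straightforward to sketch. In the example below, the proxy URL and the attribution header are hypothetical; `base_url` and `default_headers` are real parameters of OpenAI's Python client.

```python
from openai import OpenAI

# Hypothetical internal proxy endpoint; all LLM traffic flows through
# the company's ML serving infrastructure instead of going to the
# vendor directly.
LLM_PROXY_URL = "https://llm-proxy.internal.example.com/v1"


def get_llm_client(service_name: str) -> OpenAI:
    """Return an OpenAI-compatible client routed through the internal proxy.

    Callers never handle vendor API keys: the proxy injects credentials,
    logs requests and responses, and attributes usage to the caller.
    """
    return OpenAI(
        base_url=LLM_PROXY_URL,
        api_key="unused-placeholder",  # the real key lives in the proxy
        default_headers={"x-calling-service": service_name},  # hypothetical header
    )


client = get_llm_client("customer-support")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
)
print(response.choices[0].message.content)
```

Because the wrapper preserves the vendor interface, application code looks the same whether it talks to the vendor directly or through the proxy.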
### Evaluation Framework
To address the growing usage of LLMs, Lyft developed a comprehensive evaluation framework with three main categories:
* Online input evaluation (e.g., PII filtering)
* Online/offline output evaluation
* Offline input-output pair analysis
The framework allows for both automated and LLM-based evaluations of responses, including checks for harmful content and response completeness. This modular approach enables teams to implement specific guardrails and quality metrics for their use cases.
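As a minimal sketch of what such a modular framework might look like, the example below defines a shared evaluator interface, an online input check, and an online output check. All names and heuristics are illustrative, not Lyft's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass
class EvalResult:
    passed: bool
    reason: str = ""


class Evaluator(Protocol):
    def evaluate(self, text: str) -> EvalResult: ...


class PIIInputEvaluator:
    """Online input check: block requests containing obvious PII patterns."""

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

    def evaluate(self, text: str) -> EvalResult:
        if self.EMAIL.search(text) or self.PHONE.search(text):
            return EvalResult(False, "possible PII detected in input")
        return EvalResult(True)


class CompletenessOutputEvaluator:
    """Online output check: flag empty or truncated-looking responses."""

    def evaluate(self, text: str) -> EvalResult:
        if not text.strip():
            return EvalResult(False, "empty response")
        if not text.rstrip().endswith((".", "!", "?")):
            return EvalResult(False, "response may be truncated")
        return EvalResult(True)


def run_guardrails(text: str, evaluators: list[Evaluator]) -> list[EvalResult]:
    return [e.evaluate(text) for e in evaluators]


# Example: screen an input before it reaches the vendor API.
results = run_guardrails("My email is jane@example.com", [PIIInputEvaluator()])
print(results)  # [EvalResult(passed=False, reason='possible PII detected in input')]
```

An LLM-as-judge evaluator would implement the same interface, with `evaluate` issuing a grading prompt to a model instead of applying a heuristic.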
### AI Assistants Interface
Lyft is developing a higher-level interface for AI applications that wraps core LLM functionality and includes:
* Knowledge base integration
* Tool configurations
* Prompt augmentation capabilities
* Integration with existing evaluation and proxy systems
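As a rough illustration of this kind of higher-level wrapper, the sketch below declares an assistant with a knowledge base and shows prompt augmentation. The config fields and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AssistantConfig:
    """Hypothetical declarative config in the spirit of an assistants layer."""

    name: str
    system_prompt: str
    knowledge_bases: list[str] = field(default_factory=list)  # doc stores to search
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)


def build_prompt(
    config: AssistantConfig, user_query: str, retrieved_docs: list[str]
) -> list[dict]:
    """Prompt augmentation: fold retrieved knowledge into the system message."""
    context = "\n\n".join(retrieved_docs)
    system = f"{config.system_prompt}\n\nRelevant internal documents:\n{context}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]


support_assistant = AssistantConfig(
    name="support-triage",
    system_prompt="You draft first-response messages for support agents.",
    knowledge_bases=["help-center-articles"],
)
messages = build_prompt(
    support_assistant, "Rider was charged twice", ["Refund policy: ..."]
)
```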
## Production Use Cases
Lyft has implemented several production use cases leveraging their GenAI infrastructure:
### Customer Support
Their flagship implementation uses a RAG-based approach for initial customer support responses, combining LLMs with knowledge bases to:
* Provide faster initial responses
* Give human agents better context
* Improve response quality through document search
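A condensed sketch of that RAG flow follows, assuming the proxied OpenAI-compatible client from the earlier example. The toy lexical retriever stands in for whatever document index Lyft actually uses.

```python
def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Toy lexical retriever: rank docs by word overlap with the query.

    A production system would use an embedding index instead.
    """
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def draft_support_reply(client, ticket_text: str, kb_docs: list[str]) -> str:
    """Ground the LLM's first response in retrieved knowledge-base excerpts."""
    docs = retrieve(ticket_text, kb_docs)
    prompt = (
        "Using only the policy excerpts below, draft a first response "
        "to this support ticket.\n\n"
        + "\n---\n".join(docs)
        + f"\n\nTicket:\n{ticket_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```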
### Internal Tools
* Slack-based AI bot for company data search
* Incident report generation using few-shot learning (sketched after this list)
* Performance review assistance
* Translation services
* Fraud detection and prevention
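Of these, the incident-report generator is the most concrete to sketch: few-shot prompting shows the model past timeline-to-report pairs before the new timeline. The example data below is invented for illustration.

```python
# Invented example pair; in practice these would be curated past incidents.
FEW_SHOT_EXAMPLES = [
    {
        "timeline": "14:02 alerts fired; 14:10 bad deploy identified; 14:25 rollback",
        "report": "Summary: A bad deploy degraded service for 23 minutes...",
    },
]


def build_incident_prompt(timeline: str) -> list[dict]:
    """Assemble a few-shot chat prompt from past timeline/report pairs."""
    messages = [{"role": "system", "content": "You write concise incident reports."}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["timeline"]})
        messages.append({"role": "assistant", "content": ex["report"]})
    messages.append({"role": "user", "content": timeline})
    return messages
```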
## Technical Challenges and Solutions
The team faced several challenges in adapting their ML platform for GenAI:
### Request Complexity
LLM requests are significantly more complex than traditional ML model inputs. Lyft addressed this by:
* Building flexible client libraries
* Implementing comprehensive request/response logging
* Developing specialized monitoring for LLM interactions
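For instance, the request/response logging mentioned above could sit in the client wrapper, as in this sketch; the logger name and logged fields are illustrative.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("llm_audit")


def logged_completion(client, **kwargs):
    """Log the full request and response for each LLM call.

    Unlike a tabular model's fixed feature vector, an LLM request carries
    free-form messages and sampling parameters, so the whole payload is
    captured for later offline evaluation.
    """
    request_id = str(uuid.uuid4())
    start = time.monotonic()
    logger.info("llm_request %s %s", request_id, json.dumps(kwargs, default=str))
    response = client.chat.completions.create(**kwargs)
    latency_ms = (time.monotonic() - start) * 1000
    logger.info(
        "llm_response %s latency_ms=%.0f tokens=%s",
        request_id,
        latency_ms,
        response.usage.total_tokens if response.usage else "n/a",
    )
    return response
```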
### Security and Privacy
They implemented several security measures:
* Centralized API key management
* PII filtering before vendor interactions
* Custom evaluation pipelines for content safety
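The source does not detail the PII filter. A minimal redaction pass (as opposed to the blocking evaluator sketched earlier) could look like the following; the patterns are illustrative and far from exhaustive.

```python
import re

# Hypothetical redaction pass applied before any text leaves the network.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{13,19}\b"), "[CARD]"),
]


def scrub_pii(text: str) -> str:
    """Replace PII-like spans with placeholders before the vendor call."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


assert scrub_pii("Reach me at jane@example.com") == "Reach me at [EMAIL]"
```

Redaction keeps the request usable by the model, whereas the blocking evaluator rejects it outright; which behavior fits depends on the use case.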
### Integration with Existing Systems
The platform maintains compatibility with existing ML infrastructure while adding GenAI-specific components:
* Features/tabular data concepts translated to knowledge bases
* Traditional monitoring adapted for LLM evaluation
* Existing deployment pipelines modified for API-based models
## Results and Impact
The platform has enabled rapid adoption of GenAI across Lyft, with hundreds of internal users and multiple production applications. The standardized infrastructure has reduced implementation time for new use cases while maintaining security and quality standards.
## Future Directions
Lyft continues to evolve their platform with plans to:
* Expand the AI assistants interface
* Develop more user-facing products
* Enhance evaluation capabilities
* Build additional knowledge base integrations
The case study demonstrates how a mature ML platform can be effectively adapted for GenAI while maintaining operational excellence and enabling rapid innovation across the organization.