eBay: Multi-Track Approach to Developer Productivity Using LLMs

* Company: eBay
* Title: Multi-Track Approach to Developer Productivity Using LLMs
* Industry: E-commerce
* Link: https://innovation.ebayinc.com/tech/features/cutting-through-the-noise-three-things-weve-learned-about-generative-ai-and-developer-productivity/
* Year: 2024

Summary (short)

eBay implemented a three-track approach to enhance developer productivity using AI: deploying GitHub Copilot enterprise-wide, creating a custom-trained LLM called eBayCoder based on Code Llama, and developing an internal RAG-based knowledge base system. The Copilot implementation showed a 17% decrease in PR creation to merge time and 12% decrease in Lead Time for Change, while maintaining code quality. Their custom LLM helped with codebase-specific tasks and their internal knowledge base system leveraged RAG to make institutional knowledge more accessible.


eBay's journey into implementing LLMs for developer productivity represents a comprehensive and pragmatic approach to adopting AI technologies in a large-scale enterprise environment. The company explored three distinct but complementary tracks for improving developer productivity through AI, offering valuable insights into the real-world challenges and benefits of deploying LLMs in production.

The case study is particularly noteworthy for its measured approach to evaluation and deployment, using both quantitative and qualitative metrics to assess the impact of these technologies. Instead of relying on a single solution, eBay recognized that different aspects of developer productivity could be better served by different approaches to LLM deployment.

### Track 1: GitHub Copilot Implementation

The first track involved the enterprise-wide deployment of GitHub Copilot, preceded by a carefully designed A/B test with 300 developers. The evaluation methodology involved:

* A control group setup with similar assignments and abilities
* A two-week ramp-up period
* Multiple measurement metrics including code acceptance rate, accuracy, and PR metrics
* Code quality monitoring through Sonar

The reported results were:

* 27% code acceptance rate (via Copilot telemetry)
* 70% accuracy for generated documents
* 60% accuracy for generated code
* 17% decrease in pull request creation to merge time
* 12% decrease in Lead Time for Change

However, eBay was also transparent about the limitations, particularly noting Copilot's context window constraints when dealing with their massive codebase. This is an important consideration for large enterprises implementing similar solutions.

### Track 2: Custom LLM Development (eBayCoder)

The second track demonstrates a more specialized approach to handling company-specific code requirements. eBay created eBayCoder by fine-tuning Code Llama 13B on their internal codebase and documentation. This approach addressed several limitations of commercial solutions:

* Better handling of company-specific libraries and frameworks
* Improved context awareness for large-scale codebases
* Enhanced ability to handle software upkeep and migration tasks
* Reduced code duplication through better awareness of internal services

The implementation shows careful consideration of model selection (Code Llama 13B) and training strategy (post-training and fine-tuning on internal data). This implies a significant investment in MLOps infrastructure to support model training and deployment.

### Track 3: Internal Knowledge Base System

The third track focused on creating an intelligent knowledge retrieval system using RAG (Retrieval Augmented Generation). This system demonstrates several LLMOps practices:

* Automated, recurring content ingestion from multiple sources (GitHub Markdowns, Google Docs, Jira, Slack, Wikis)
* Vector embedding creation and storage in a vector database
* Similarity-based retrieval using cosine similarity
* Integration with both commercial and open-source LLMs
* Implementation of RLHF (Reinforcement Learning from Human Feedback) for continuous improvement

The system includes important production-ready features:

* Automated content updates
* User feedback collection interface
* Clear fallback mechanisms when answers aren't available
* Integration with multiple data sources

### MLOps and Production Considerations

The case study reveals several important MLOps considerations:

* Multi-model orchestration: Managing multiple LLM solutions in production
* Evaluation frameworks: Using both quantitative and qualitative metrics
* Feedback loops: Implementing RLHF for continuous improvement
* Data pipeline automation: Regular updates to knowledge bases
* Security and compliance: Handling sensitive internal documentation
* Scale considerations: Dealing with massive codebases and documentation

### Monitoring and Evaluation

eBay implemented comprehensive monitoring and evaluation strategies:

* Developer surveys for qualitative feedback
* Code quality metrics through Sonar
* PR and deployment metrics
* Usage tracking for internal tools
* Accuracy measurements for generated content
* User feedback collection and integration

### Future Considerations

eBay describes itself as being at the beginning of an exponential curve in terms of productivity gains. The company maintains a pragmatic view of the technology while recognizing its transformative potential. The implementation of RLHF and continuous improvement mechanisms suggests a long-term commitment to evolving these systems.

This case study provides valuable insights into how large enterprises can systematically approach LLM deployment, balancing commercial solutions with custom development while maintaining a focus on practical productivity improvements. The multi-track approach shows how different LLM implementations can complement each other in a production environment.
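
To make the Track 3 retrieval step more concrete, the following is a minimal sketch of similarity-based retrieval with a fallback when nothing relevant is found. It is not eBay's implementation: the names `embed`, `KnowledgeBase`, `min_score`, and `build_prompt` are hypothetical, the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KnowledgeBase:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self, min_score=0.2):
        # Assumed threshold below which a document counts as irrelevant.
        self.min_score = min_score
        self.entries = []

    def ingest(self, documents):
        """Embed and store each document; a real pipeline would rerun this on a schedule."""
        for doc in documents:
            self.entries.append((doc, embed(doc)))

    def retrieve(self, query, top_k=2):
        """Return up to top_k (score, document) pairs above the threshold, best first."""
        query_vec = embed(query)
        scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in self.entries]
        relevant = [pair for pair in scored if pair[0] >= self.min_score]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return relevant[:top_k]


def build_prompt(kb, query):
    """Build an LLM prompt from retrieved context, or return None as the fallback signal."""
    hits = kb.retrieve(query)
    if not hits:
        return None  # caller can answer "I don't know" instead of guessing
    context = "\n".join(f"- {doc}" for _, doc in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


kb = KnowledgeBase()
kb.ingest([
    "To deploy a service, open a release ticket in Jira.",
    "Unit tests run on every pull request.",
])
print(build_prompt(kb, "How do I deploy a service?"))
print(build_prompt(kb, "quantum banana"))
```

Returning `None` when no document clears the threshold is one simple way to implement the "clear fallback" behavior listed above; the threshold value itself is an assumption and would need tuning against real embeddings.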
