Uber's Developer Platform team explored three major initiatives using LLMs in production: a custom IDE coding assistant (which was later abandoned in favor of GitHub Copilot), an AI-powered test generation system called Auto Cover, and an automated Java-to-Kotlin code migration system. The team combined deterministic approaches with LLMs to achieve significant developer productivity gains while maintaining code quality and safety. They found that while pure LLM approaches could be risky, hybrid approaches combining traditional software engineering practices with AI showed promising results.
This case study from Uber details their journey in implementing LLMs across their developer platform, highlighting three major initiatives and the lessons learned from each. The context is particularly interesting as Uber manages over 100 million lines of code across six different monorepos, presenting unique challenges for LLM integration at scale.
The first initiative involved building their own in-house IDE coding assistant to compete with GitHub Copilot. The team hypothesized that fine-tuning models on their massive codebase would provide better results than generic models. They invested six months of engineering effort across multiple teams, working on IDE integrations, model hosting, fine-tuning, and service infrastructure. However, despite achieving some technical success, they ultimately decided to abandon the project in favor of GitHub Copilot for several reasons:
* The maintenance burden was too high compared to vendor solutions
* They couldn't compete with the rapid pace of improvement in commercial products
* The user experience requirements were more complex than initially anticipated
* Competing assistants cannibalized the same limited UI surface area inside the IDE
This experience led them to adopt what they call the "ecosystem principle" - building on top of existing platforms rather than competing with them. They redirected their efforts to enhance GitHub Copilot integration through internal support chatbots and developer education programs.
The second initiative, Auto Cover, focuses on automated test generation. The system uses an agentic design pattern combining deterministic steps with LLM reasoning:
* Preparation of mocks and test files (deterministic)
* Test generation (LLM-driven)
* Build and test execution (deterministic)
* Failure analysis and fixes (LLM-driven)
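As a rough sketch of how those four steps might be orchestrated in code (every helper name, the `llm` client, and the retry limit below are assumptions for illustration, not Uber's actual implementation):

```python
# Illustrative orchestration of an Auto Cover-style loop; every helper here
# (prepare_mocks_and_test_file, run_build_and_tests, etc.) is a hypothetical stand-in.

MAX_REPAIR_ROUNDS = 3  # assumed bound so the agent cannot loop forever

def generate_tests(target_file: str, llm) -> str | None:
    # Step 1 (deterministic): prepare mocks and an empty test file.
    scaffold = prepare_mocks_and_test_file(target_file)

    # Step 2 (LLM-driven): draft tests from the source and the scaffold.
    tests = llm.complete(
        f"Write unit tests for this code:\n{read_source(target_file)}\n"
        f"Using this scaffold:\n{scaffold}"
    )

    for _ in range(MAX_REPAIR_ROUNDS):
        # Step 3 (deterministic): build and execute the candidate tests.
        result = run_build_and_tests(target_file, tests)
        if result.passed:
            return tests
        # Step 4 (LLM-driven): analyze the failure output and patch the tests.
        tests = llm.complete(
            f"The tests failed with:\n{result.errors}\nRevise the tests:\n{tests}"
        )
    return None  # surface for human review if automated repair keeps failing
```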
To address concerns about test quality, they implemented additional validation steps including:
* Checking generated tests against function documentation
* Identifying and removing redundant tests
* Refactoring into table-driven tests
* Using mutation testing to verify test effectiveness
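The mutation-testing check is worth illustrating: a generated suite is only trusted if it detects deliberately injected bugs. The snippet below is a generic sketch of that idea rather than Uber's tooling; the mutant list, file path, and test command are placeholders.

```python
# Generic mutation-testing sketch: apply each pre-generated mutant (a copy of the
# source with a small deliberate bug), run the tests, and count how many mutants die.

import subprocess
from pathlib import Path

def mutation_score(source_path: str, mutants: list[str], test_cmd: list[str]) -> float:
    source = Path(source_path)
    original = source.read_text()
    killed = 0
    try:
        for mutant in mutants:
            source.write_text(mutant)                             # inject the bug
            run = subprocess.run(test_cmd, capture_output=True)   # execute the suite
            if run.returncode != 0:                                # failing tests = mutant killed
                killed += 1
    finally:
        source.write_text(original)                                # always restore the file
    return killed / len(mutants) if mutants else 0.0

# Usage sketch: reject a generated suite whose mutation score falls below a threshold.
# if mutation_score("calculator.go", mutants, ["go", "test", "./..."]) < 0.6:
#     discard_generated_tests()
```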
The third initiative tackles the challenge of migrating Java code to Kotlin in their Android codebase. Rather than using pure LLM-based conversion (which could introduce risks through hallucinations), they developed a hybrid approach:
* Using existing migration data from previous manual conversions as training examples
* Combining deterministic AST-based transformations with LLM-generated rules
* Implementing a feedback loop where LLMs help write AST transformation rules
* Maintaining human review in the process while automating the mechanical aspects
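A loose sketch of that hybrid pattern follows; the `Rule` structure, the parser, and the `llm` client are all assumptions for illustration, since the actual toolchain isn't shown in the case study. The key point is that the LLM proposes reusable AST rewrite rules rather than translating code directly.

```python
# Hybrid migration sketch: deterministic AST rules perform the Java-to-Kotlin
# rewrite; the LLM only drafts *new rules* for patterns no rule covers yet,
# and those drafts go to human review. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[object], bool]   # AST node -> does this rule apply?
    rewrite: Callable[[object], str]    # AST node -> Kotlin source fragment

def migrate_file(java_source: str, rules: list[Rule], llm) -> tuple[str, list[str]]:
    tree = parse_java(java_source)                        # assumed deterministic parser
    fragments, uncovered = [], []
    for node in tree.walk():
        rule = next((r for r in rules if r.matches(node)), None)
        if rule:
            fragments.append(rule.rewrite(node))           # safe, repeatable rewrite
        else:
            uncovered.append(node.source_text)             # no rule yet: defer
    if uncovered:
        # Feedback loop: the LLM drafts candidate rules, engineers review and
        # add the good ones to `rules`, and the next run handles more patterns.
        rule_proposals = llm.complete(
            "Draft AST rewrite rules for these Java constructs:\n" + "\n".join(uncovered)
        )
        queue_for_human_review(rule_proposals)             # assumed review step
    return "\n".join(fragments), uncovered
```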
The team's approach to measuring success is noteworthy. Rather than focusing solely on quantitative metrics, they use a combination of qualitative feedback and a unified "developer time saved" metric across different initiatives. This helps them compare impact across diverse projects while avoiding the pitfalls of metric-driven development.
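As an illustration only, a unified metric like this could be computed by mapping each initiative's events to an estimated time saving; the event names and per-event minutes below are invented placeholders, not figures from Uber.

```python
# Hypothetical "developer time saved" roll-up; every number here is made up
# purely to show the normalization idea across heterogeneous initiatives.

MINUTES_SAVED_PER_EVENT = {
    "copilot_suggestion_accepted": 2,     # placeholder estimate
    "generated_test_merged": 15,          # placeholder estimate
    "java_file_migrated_to_kotlin": 45,   # placeholder estimate
}

def developer_hours_saved(event_counts: dict[str, int]) -> float:
    minutes = sum(MINUTES_SAVED_PER_EVENT.get(e, 0) * n for e, n in event_counts.items())
    return minutes / 60

# e.g. developer_hours_saved({"generated_test_merged": 1200, "java_file_migrated_to_kotlin": 300})
```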
Key LLMOps lessons from their experience include:
* MVPs are deceptively easy in the LLM space - productionization is the real challenge
* Hybrid approaches combining deterministic systems with LLMs often work better than pure LLM solutions
* User experience and workflow integration are crucial for adoption
* Building on top of existing ecosystems can be more effective than building competing solutions
* Human oversight must be maintained even as the mechanical aspects are automated
* Careful rollout strategies are essential, especially for critical systems
The team has organized their LLMOps efforts through a horizontal AI Developer Experience team that brings together experts from different parts of the organization. They use LangChain and LangGraph for building agentic systems, with custom abstractions for deployment in their infrastructure.
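To make the LangGraph part concrete, here is a bare-bones example of wiring a generate/verify/repair loop as a state graph; the state fields and node bodies are placeholders rather than Uber's internal abstractions.

```python
# Minimal LangGraph wiring for a generate -> verify -> repair agent loop.
# Node bodies are stubs; a real system would call an LLM and a build/test runner.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    draft: str      # the artifact being produced (tests, code, rules, ...)
    errors: str     # output of the deterministic verification step
    attempts: int   # repair rounds used so far

def generate(state: AgentState) -> dict:
    return {"draft": "...LLM-generated artifact...", "attempts": 0}

def verify(state: AgentState) -> dict:
    return {"errors": ""}  # stub: empty string means the deterministic check passed

def repair(state: AgentState) -> dict:
    return {"draft": state["draft"] + " (revised)", "attempts": state["attempts"] + 1}

def route(state: AgentState) -> str:
    # Stop when verification passes or the repair budget is exhausted.
    return "done" if not state["errors"] or state["attempts"] >= 3 else "repair"

graph = StateGraph(AgentState)
graph.add_node("generate", generate)
graph.add_node("verify", verify)
graph.add_node("repair", repair)
graph.set_entry_point("generate")
graph.add_edge("generate", "verify")
graph.add_conditional_edges("verify", route, {"done": END, "repair": "repair"})
graph.add_edge("repair", "verify")

app = graph.compile()
print(app.invoke({"draft": "", "errors": "", "attempts": 0}))
```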
Their experiences highlight the evolution of LLMOps practices in a large organization, showing how initial experiments and failures led to more sophisticated and practical approaches. The case study demonstrates the importance of balancing automation with safety, and how combining traditional software engineering practices with LLM capabilities can lead to more robust solutions.