Company
Uber
Title
DragonCrawl: Uber's Journey to AI-Powered Mobile Testing Using Small Language Models
Industry
Automotive
Year
2024
Summary (short)
Uber developed DragonCrawl, an innovative AI-powered mobile testing system that uses a small language model (110M parameters) to automate app testing across multiple languages and cities. The system addressed critical challenges in mobile testing, including high maintenance costs and scalability issues across Uber's global operations. Using an MPNet-based architecture with a retriever-ranker approach, DragonCrawl achieved 99%+ stability in production, successfully operated in 85 out of 89 tested cities, and demonstrated remarkable adaptability to UI changes without requiring manual updates. The system proved particularly valuable by blocking ten high-priority bugs from reaching customers while significantly reducing developer maintenance time. Most notably, DragonCrawl exhibited human-like problem-solving behaviors, such as retrying failed operations and implementing creative solutions like app restarts to overcome temporary issues.

Notes on Uber's DragonCrawl Implementation

Company/Use Case Overview

  • Problem: Mobile testing at scale (3000+ simultaneous experiments, 50+ languages)
  • Challenge: 30-40% of engineer time spent on test maintenance
  • Solution: AI-powered testing system mimicking human behavior
  • Scale: Global deployment across Uber's mobile applications

Technical Implementation

Model Architecture

  • Base Model: MPNet (110M parameters)
  • Embedding Size: 768 dimensions
  • Evaluation Approach: framed as a retrieval task (retriever-ranker over candidate UI actions)
  • Performance Metrics: see Production Results below
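The retriever-ranker idea can be sketched as follows. This is a toy illustration, not Uber's code: the bag-of-words embedding and the sample UI elements stand in for the 768-dimension dense embeddings the 110M-parameter MPNet model actually produces.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; DragonCrawl instead uses a
    # 110M-parameter MPNet model producing 768-dim dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_actions(goal: str, candidates: list[str]) -> list[str]:
    # Retriever-ranker: score every visible UI element against the
    # test goal and act on the highest-scoring match.
    goal_vec = embed(goal)
    return sorted(candidates, key=lambda c: cosine(goal_vec, embed(c)), reverse=True)

goal = "request a ride to the airport"
ui_elements = ["Open settings", "Request ride", "View past trips"]
best = rank_actions(goal, ui_elements)[0]
```

Framing action selection as retrieval is what lets the system survive UI changes: a renamed or moved button still embeds close to the goal, so no test script needs updating.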

Key Features

  • Language Model Integration: MPNet-based retriever-ranker that embeds the test goal and on-screen elements in a shared space to pick the next action
  • Testing Capabilities: end-to-end app flows executed across 50+ languages, dozens of cities, and multiple device types

Production Results

Performance Metrics

  • 99%+ stability in production
  • Success in 85 out of 89 cities
  • Adapted to UI changes without manual test maintenance
  • Cross-device compatibility
  • Blocked 10 high-priority bugs

Key Achievements

  • Automated testing across different languages, cities, and devices
  • Significant reduction in maintenance costs
  • Improved bug detection

Technical Challenges & Solutions

Hallucination Management

  • Small model size (110M parameters) to limit complexity
  • Ground truth validation from emulator
  • Action verification system
  • Loop detection and prevention
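A minimal sketch of the loop-detection guardrail, assuming the agent's recent behavior can be summarized as (screen, action) pairs. The class name, window size, and repeat threshold are illustrative assumptions, not Uber's implementation.

```python
from collections import deque

class LoopDetector:
    """Hypothetical guardrail: flags when the agent keeps issuing the
    same action on the same screen, i.e. it is stuck in a loop."""

    def __init__(self, max_repeats: int = 3, window: int = 10):
        self.max_repeats = max_repeats
        self.history = deque(maxlen=window)  # recent (screen, action) steps

    def record(self, screen_hash: str, action: str) -> bool:
        step = (screen_hash, action)
        self.history.append(step)
        # True => loop detected; caller should steer away or backtrack
        return self.history.count(step) >= self.max_repeats
```

Paired with ground-truth validation from the emulator (did the tap actually change the screen?), a check like this keeps a small model from hallucinating progress it is not making.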

Adversarial Cases

  • Detection of non-optimal paths
  • Implementation of guardrails
  • Solution steering mechanisms
  • Backtracking capabilities

System Integration

  • GPS location handling
  • Payment system integration
  • UI change adaptation
  • Error recovery mechanisms

Notable Behaviors

Adaptive Problem Solving

  • Persistent retry mechanisms
  • App restart capability
  • Creative navigation solutions
  • Goal-oriented persistence

Error Handling

  • Automatic retry on failures
  • Context-aware decision making
  • Alternative path finding
  • System state awareness
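The retry-then-restart behavior described above might look like the following sketch. `perform` and `restart_app` are hypothetical callables standing in for real emulator operations; the retry count is an assumption.

```python
def run_step(perform, max_retries: int = 2, restart_app=None):
    """Retry a flaky UI step; if retries are exhausted, restart the app
    once and try again -- mirroring DragonCrawl's observed behavior of
    restarting the app to get past temporary issues."""
    for attempt in range(max_retries + 1):
        try:
            return perform()
        except RuntimeError:
            if attempt == max_retries:
                break  # retries exhausted; fall through to restart
    if restart_app is not None:
        restart_app()   # e.g. kill and relaunch the app on the emulator
        return perform()
    raise RuntimeError("step failed after retries and restart")
```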

Future Directions

Planned Improvements

  • RAG applications development
  • Dragon Foundational Model (DFM)
  • Developer toolkit expansion
  • Enhanced testing capabilities

Architectural Evolution

  • Smaller dataset utilization
  • Improved embedding quality
  • Enhanced reward modeling
  • Expanded use cases

Key Learnings

Success Factors

  • Small model advantages
  • Focus on specific use cases
  • Strong guardrails
  • Production-first approach

Implementation Insights

  • Value of small, focused models
  • Importance of real-world testing
  • Benefits of goal-oriented design
  • Balance of automation and control

Business Impact

  • Reduced maintenance costs
  • Improved test coverage
  • Enhanced bug detection
  • Accelerated development cycle
