Three practitioners share their experiences deploying LLM agents in production: Sam discusses building a personal assistant with real-time user feedback and router agents, Div presents a browser automation assistant called Milton that can control web applications, and Devin explores using LLMs to help engineers with non-coding tasks by navigating codebases. Each case study highlights different approaches to routing between agents, handling latency, testing strategies, and model selection for production deployment.
# Production Agent Systems: Three Case Studies
This case study examines three different approaches to deploying LLM agents in production, presented by practitioners Sam, Div, and Devin. Each tackles different challenges in making agents reliable and useful in real-world applications.
# Sam's Personal Assistant Case Study
## System Architecture
- Built a personal assistant with a conversational interface
- Uses APIs as primary tools rather than code generation
- Implements a ReAct-style format (Thought, Action, Action Input, Observation)
- Supports multi-action inputs/outputs
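
As a rough illustration of the ReAct-style format above, the sketch below parses a model reply that contains several Thought / Action / Action Input groups. The exact prompt format Sam used is not given in the source, so the field labels, the `Step` class, and `parse_steps` are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    action: str
    action_input: str


# Matches one Thought / Action / Action Input group. A group ends at the
# next "Thought:", at an "Observation:", or at the end of the text.
STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>.*?)\s*"
    r"Action Input:\s*(?P<input>.*?)\s*(?=Thought:|Observation:|\Z)",
    re.DOTALL,
)


def parse_steps(text: str) -> list[Step]:
    """Return every action the model requested, in order (hypothetical helper)."""
    return [
        Step(m["thought"], m["action"], m["input"])
        for m in STEP_RE.finditer(text)
    ]
```

Each parsed `Step` would then be dispatched to the matching API tool, and the tool's result appended to the prompt as an `Observation:` line before the next planning call.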
## Key Innovation: Real-time User Feedback
- Implemented a WebSocket-based feedback system that lets users course-correct agents during execution
- User feedback stored in Redis and checked before each planning stage
- Feedback incorporated into prompt context to influence agent decisions
- Helps prevent agents from going down unproductive paths
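
The feedback loop above can be sketched roughly as follows. To keep the example self-contained, an in-memory `FeedbackStore` stands in for Redis; the class, the `build_planning_prompt` helper, and the prompt wording are all assumptions for illustration, not the system's actual code.

```python
from collections import defaultdict, deque


class FeedbackStore:
    """In-memory stand-in for the Redis store (hypothetical interface)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def push(self, session_id, message):
        # In the described system this would be written by the WebSocket handler.
        self._queues[session_id].append(message)

    def drain(self, session_id):
        # Return all pending feedback for the session and clear it.
        queue = self._queues[session_id]
        items = list(queue)
        queue.clear()
        return items


def build_planning_prompt(task, history, store, session_id):
    """Check for user feedback before planning and fold it into the prompt."""
    lines = [f"Task: {task}", *history]
    feedback = store.drain(session_id)
    if feedback:
        lines.append("User feedback received during execution:")
        lines.extend(f"- {message}" for message in feedback)
    lines.append("Thought:")
    return "\n".join(lines)
```

Draining the queue on every planning stage means each piece of feedback influences the next decision once, rather than being repeated in every later prompt.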
## Router Agent Implementation
- Created template-based routing system to handle different product flows
- Templates contain specific instructions and constrained tool sets for common tasks
- Example templates:
- Router implemented as conversational agent with templates as tools
- Includes fallback dynamic agent for unexpected queries
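
A minimal sketch of template-based routing with a dynamic fallback is shown below. In the described system the choice is made by a conversational LLM agent; here a keyword matcher (`keyword_choose`) stands in for that call so the example runs on its own. The `Template` class, the template names, and the tool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    instructions: str
    tools: tuple  # constrained tool set for this flow


def route(query, templates, choose, all_tools):
    """Pick a template via `choose`, or fall back to an unconstrained agent."""
    chosen = choose(query, [t.name for t in templates])
    for template in templates:
        if template.name == chosen:
            return template
    # Fallback dynamic agent: generic instructions and every tool available.
    return Template("dynamic_fallback", "Plan freely with any available tool.", tuple(all_tools))


def keyword_choose(query, names):
    """Stand-in for the LLM router: match words of a template name in the query."""
    lowered = query.lower()
    for name in names:
        if any(word in lowered for word in name.split("_")):
            return name
    return None
```

Constraining each template to a small tool set narrows what the agent can do on common flows, while the fallback keeps unexpected queries from failing outright.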
# Div's Browser Automation Case Study
## Milton Browser Assistant
- General-purpose browser automation system
- Can control any website through DOM manipulation
- Supports mobile and chat interface integrations
- Implements multimodal capabilities (OCR for icons/images)
## Technical Implementation
- Custom DOM parser compresses HTML representation
- Handles about 90% of websites with a compressed representation of under 2K tokens
- Combines text-based parsing with OCR for comprehensive interface understanding
- Authorization system to control which sites agents can access
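
The custom DOM parser itself is not described in detail, so the following is only one plausible way to compress a page: keep interactive elements, drop scripts, styles, and layout markup, and number what remains so a model can refer to it. It uses Python's standard `html.parser`; the element and attribute whitelists and the output format are assumptions.

```python
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}
KEEP_ATTRS = {"id", "name", "type", "placeholder", "href"}
SKIP = {"script", "style"}


class CompactDOM(HTMLParser):
    def __init__(self):
        super().__init__()
        self.elements = []     # one dict per interactive element
        self._current = None   # element currently collecting inner text
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in INTERACTIVE:
            kept = {k: v for k, v in attrs if k in KEEP_ATTRS and v}
            element = {"tag": tag, "attrs": kept, "text": ""}
            self.elements.append(element)
            # <input> is a void element, so it never has inner text.
            self._current = None if tag == "input" else element

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in INTERACTIVE:
            self._current = None

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current["text"] += data.strip()


def compress(html):
    """Return one numbered line per interactive element."""
    parser = CompactDOM()
    parser.feed(html)
    parser.close()
    lines = []
    for i, el in enumerate(parser.elements):
        attrs = " ".join(f'{k}="{v}"' for k, v in el["attrs"].items())
        parts = (f"[{i}]", el["tag"], attrs, el["text"])
        lines.append(" ".join(p for p in parts if p))
    return "\n".join(lines)
```

The numeric prefix gives the model a short handle (for example "click [1]") instead of a full selector, which is one common way to keep action outputs small.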
## Performance Optimization
- Implements streaming for better user experience
- Uses caching to reduce LLM calls
- Combines GPT-4 and GPT-3.5 strategically
- Explores custom smaller models for faster response times
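
One simple form of the caching mentioned above is to memoize completions by model and prompt. The sketch below assumes a caller-supplied `call_model(model, prompt)` function, which is a placeholder for whatever client is actually used; the `CachedLLM` class is hypothetical and the source does not say how Div's cache is keyed.

```python
import hashlib


class CachedLLM:
    """Memoize completions so repeated prompts skip the model call."""

    def __init__(self, call_model):
        self._call_model = call_model  # placeholder for a real client call
        self._cache = {}
        self.calls = 0                 # number of real (uncached) calls made

    def complete(self, model, prompt):
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._call_model(model, prompt)
        return self._cache[key]
```

Exact-match caching only helps when the same prompt recurs, for example when a user repeats a task on a page whose compressed representation has not changed.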
# Devin's Code Understanding Agent
## Use Case
- Helps engineers with non-coding tasks requiring code understanding
- Focus areas: customer support, documentation, triage
- Specifically targets closed-ended feature requests
## Architecture
- Background indexing system for repository monitoring
- Webhook-based event processing
- Cloud function implementation for agent execution
- Implements "checkpointing" system for step-by-step progress
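
The source does not say how checkpoints are stored, so the sketch below assumes a JSON file and a hypothetical `run_with_checkpoints` helper. It shows the general idea: persist each step's result as soon as it finishes, so a later invocation (for example a fresh cloud function run) can resume without redoing completed steps.

```python
import json
from pathlib import Path


def run_with_checkpoints(steps, path):
    """Run (name, fn) steps in order, skipping any already checkpointed.

    Each fn receives the dict of results so far and must return a
    JSON-serializable value.
    """
    path = Path(path)
    done = json.loads(path.read_text()) if path.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run
        done[name] = fn(done)
        path.write_text(json.dumps(done))  # persist after every step
    return done
```

Writing after every step costs some I/O, but it bounds the work lost to a timeout or crash to a single step.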
## Search and Navigation
- Converts natural language requests to repository queries
- Uses synthetic data generation for better search coverage
- Builds knowledge graphs relating repository components
- Generates hypothetical questions to improve searchability
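
To illustrate indexing by hypothetical questions, the sketch below stores each code chunk under questions it could answer and matches a user query against those questions rather than against the code. Plain word overlap stands in for whatever retrieval method the real system uses, and `generate_questions` is a placeholder for an LLM call; all names here are assumptions.

```python
import re


def tokens(text):
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(chunks, generate_questions):
    """Index every chunk under the questions generated for it."""
    index = []
    for chunk_id, source in chunks.items():
        for question in generate_questions(source):
            index.append((tokens(question), chunk_id))
    return index


def search(index, query):
    """Return the chunk whose hypothetical question best overlaps the query."""
    query_tokens = tokens(query)
    best = max(index, key=lambda entry: len(entry[0] & query_tokens), default=None)
    if best is None or not (best[0] & query_tokens):
        return None
    return best[1]
```

The point of the technique is that a support-style question rarely shares vocabulary with source code, but it often shares vocabulary with a question written about that code.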
# Common Themes and Best Practices
## Model Selection
- GPT-4 preferred for complex planning and reasoning
- GPT-3.5 suitable for simpler, well-defined tasks
- Strategic use of both models to balance cost and performance
- Exploration of fine-tuning and smaller models for optimization
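
A split between the two models can be as simple as a lookup on the kind of step being run. The step kinds and the `pick_model` helper below are hypothetical, and the model identifiers are illustrative strings only.

```python
# Model identifiers are illustrative; use whatever your provider exposes.
STRONG_MODEL = "gpt-4"
FAST_MODEL = "gpt-3.5-turbo"

# Hypothetical step kinds: planning-style steps go to the stronger model.
COMPLEX_STEPS = {"plan", "replan", "reason"}


def pick_model(step_kind: str) -> str:
    """Send complex steps to the strong model and everything else to the fast one."""
    return STRONG_MODEL if step_kind in COMPLEX_STEPS else FAST_MODEL
```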
## Testing Strategies
- Combination of manual and automated testing
- Integration tests with mocked API responses
- Focus on testing side effects rather than exact outputs
- Challenge of cost-effective testing with expensive models
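
Testing side effects rather than exact outputs might look like the sketch below, which uses `unittest.mock.Mock` from the standard library. The `run_agent` function and the calendar API shape are invented for the example; the point is that the test asserts on the API call the agent made, not on the wording of its reply, and that no paid model call is needed.

```python
from unittest.mock import Mock


def run_agent(llm, calendar_api, request):
    """Toy agent step: ask the model for a plan, then perform its action."""
    plan = llm(request)
    if plan["action"] == "create_event":
        calendar_api.create_event(**plan["args"])
    return plan.get("reply", "")


def test_agent_creates_event():
    # The mocked model returns a fixed plan, so the test is cheap and repeatable.
    llm = Mock(return_value={
        "action": "create_event",
        "args": {"title": "Standup", "when": "09:00"},
        "reply": "Booked!",
    })
    calendar_api = Mock()

    run_agent(llm, calendar_api, "book standup at 9")

    # Assert on the side effect, not on the reply text.
    calendar_api.create_event.assert_called_once_with(title="Standup", when="09:00")
```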
## Performance Optimization
- Streaming implementations for better UX
- Caching strategies to reduce latency
- Parallel execution of actions where possible
- Background processing for non-time-critical tasks
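
Where actions do not depend on each other, they can be issued concurrently. The sketch below uses `asyncio.gather`; the `run_actions` helper and the `(function, argument)` pair format are assumptions for illustration.

```python
import asyncio


async def run_actions(actions):
    """Run independent (async_fn, arg) actions concurrently.

    Results come back in the same order as the input actions.
    """
    return await asyncio.gather(*(fn(arg) for fn, arg in actions))
```

With this shape, total latency is roughly that of the slowest action rather than the sum of all of them, which matters when each action is a network call.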
## Routing Architectures
- Language model-based routing decisions
- Template-based approaches for common workflows
- Importance of clear instruction sets
- Balance between flexibility and reliability
# Key Learnings
- Production agents require careful consideration of:
- Success depends on:
- Common challenges include: