Replit tackled the challenge of automating code repair in their IDE by developing a specialized 7B-parameter LLM that integrates directly with Language Server Protocol (LSP) diagnostics. The production-ready system automatically fixes Python code errors by processing real-time IDE events, operational transformations, and project snapshots. Using DeepSeek-Coder-Instruct-v1.5 as the base model, the team implemented a comprehensive data pipeline with serverless verification, structured input/output formats, and GPU-accelerated inference. The fine-tuned 7B model matched or exceeded much larger models such as GPT-4 and Claude 3 on both academic benchmarks and real-world error fixes. The production system features low-latency inference, load balancing, and real-time code application, demonstrating a successful LLM deployment in a high-stakes development environment where speed and accuracy are crucial.
# LLMOps Case Study Notes: Replit Code Repair System
## Overview
- Replit is developing AI-native development tools by integrating LLMs directly into their IDE
- Primary use case: Automated code repair using LSP (Language Server Protocol) diagnostics
- Goal: Create an LLM that can understand IDE events and provide contextually appropriate fixes
- Target: Fix Python code errors identified by LSP that don't have deterministic solutions
## System Architecture & Data Pipeline
### Data Sources
- User events from IDE sessions
- Operational Transformations (OTs) representing code changes
- LSP diagnostic messages
- Project snapshots for verification
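The pipeline reconstructs the file state at the moment an error appeared by replaying the recorded OTs over a base snapshot. A minimal sketch of that replay, using a simplified retain/insert/delete op format (hypothetical; real Replit OTs are richer):

```python
def apply_ot(doc: str, ops: list[dict]) -> str:
    """Replay one operational transformation over a document.

    Simplified op format (illustrative, not Replit's actual schema):
      {"retain": n}   -- copy the next n characters unchanged
      {"insert": s}   -- insert string s at the cursor
      {"delete": n}   -- skip (delete) n characters at the cursor
    """
    out, cursor = [], 0
    for op in ops:
        if "retain" in op:
            out.append(doc[cursor:cursor + op["retain"]])
            cursor += op["retain"]
        elif "insert" in op:
            out.append(op["insert"])
        elif "delete" in op:
            cursor += op["delete"]
    out.append(doc[cursor:])  # keep any trailing text past the last op
    return "".join(out)

# Reconstruct a file state by replaying a history of OTs over a snapshot:
state = "print()"
history = [
    [{"retain": 6}, {"insert": "world"}],
    [{"insert": "# fix\n"}],
]
for ops in history:
    state = apply_ot(state, ops)
# state is now "# fix\nprint(world)"
```

Replaying against a verified snapshot (rather than trusting client state) is what makes the serverless verification step below meaningful.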
### Pipeline Components
- Serverless verification of reconstructed states
- Training data store for processed examples
- GPU cluster for model training
- Deployment infrastructure with load balancing
### Data Processing
- Reconstruction of filesystem state at time of errors
- Filtering of deterministic cases (where LSP provides fixes)
- Removal of stylistic rules
- Exclusion of private/non-Python projects
- Verification against GCS stored copies
- Integration with Ruff and Pyright for diagnostic validation
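The filtering steps above keep only diagnostics worth training on: anything the LSP can already fix deterministically, or that is purely stylistic, is dropped. A sketch of that filter over simplified diagnostic records (the record shape and the stylistic-prefix list are assumptions for illustration):

```python
# Rule prefixes treated as stylistic (an assumption for illustration;
# the actual exclusion list is not published): e.g. line length, whitespace.
STYLISTIC_PREFIXES = ("E5", "W2", "W3")

def keep_for_training(diag: dict) -> bool:
    """Keep a diagnostic only if the model has something non-trivial to learn.

    diag is a simplified record: {"code": "F821", "fix": None or {...}}
    """
    if diag.get("fix") is not None:
        return False  # LSP already supplies a deterministic fix
    if diag["code"].startswith(STYLISTIC_PREFIXES):
        return False  # stylistic rule, not a real error
    return True

diags = [
    {"code": "F821", "fix": None},           # undefined name -> keep
    {"code": "F401", "fix": {"edits": []}},  # unused import, auto-fixable -> drop
    {"code": "E501", "fix": None},           # line too long -> drop (stylistic)
]
kept = [d for d in diags if keep_for_training(d)]  # only F821 survives
```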
## Model Development
### Base Model Selection
- Chose DeepSeek-Coder-Instruct-v1.5 (7B parameters)
- Selected for its balance of code capability and inference cost at the 7B scale
- Used Flash Attention v2 Triton kernel optimization
### Training Infrastructure
- Platform: MosaicML
- Hardware: Single node with 8 H100s
- Framework: LLM Foundry (v0.5.0) with Composer
- Distribution: FSDP with Full Shard strategy
- Activation checkpointing enabled
### Training Configuration
- Optimizer: Decoupled AdamW
- Learning rate: 1e-5 with Cosine Annealing
- Warmup: 100 batches
- Training duration: 4 epochs
- Batch size: 16
- Mixed precision: BF16
- Gradient clipping threshold: 1.0
- Packing ratio: 6.0 for sequence binning
## Production Considerations
### Input/Output Format
- Schema using angle-bracketed sentinel tokens
- Structured format for IDE integration
- Consistent template for parsing/generation
- Support for future IDE event types
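A schema built from angle-bracketed sentinel tokens might assemble prompts like the sketch below. The token names here are hypothetical; Replit has not published the exact schema:

```python
def build_prompt(filename: str, code: str, diagnostic: str) -> str:
    """Assemble a structured repair prompt from IDE state.

    Sentinel token names are illustrative, not Replit's actual schema.
    Each sentinel maps to a small, fixed number of tokens, so the
    template overhead stays constant across examples.
    """
    return (
        f"<filename>{filename}</filename>\n"
        f"<code>{code}</code>\n"
        f"<diagnostic>{diagnostic}</diagnostic>\n"
        f"<fix>"  # the model generates everything after this token
    )

prompt = build_prompt(
    "main.py",
    "print(undefined_var)\n",
    'Line 1: "undefined_var" is not defined',
)
```

Ending the prompt at the opening fix sentinel gives the parser an unambiguous boundary between context and generation.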
### Performance Optimizations
- Line numbering for unambiguous fixes
- Maintained code formatting close to training distribution
- Efficient tokenizer mapping (3-5 tokens per sentinel)
- Flexible output space for various edit types
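Line numbering lets a fix be anchored to an exact line instead of relying on fuzzy string matching against the original source. A sketch of numbering the input and applying a hypothetical `(line, replacement)` fix:

```python
def number_lines(code: str) -> str:
    """Prefix each line with its 1-based number so fixes are unambiguous."""
    return "\n".join(f"{i} {line}" for i, line in enumerate(code.splitlines(), 1))

def apply_fix(code: str, line_no: int, replacement: str) -> str:
    """Replace one line of the original (unnumbered) source."""
    lines = code.splitlines()
    lines[line_no - 1] = replacement
    return "\n".join(lines)

src = "x = 1\nprint(y)"
numbered = number_lines(src)          # "1 x = 1\n2 print(y)"
fixed = apply_fix(src, 2, "print(x)") # "x = 1\nprint(x)"
```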
### Production Pipeline
- Integration with Replit workspace
- Load balancer for GPU inference
- Real-time code application
- Model serving infrastructure
## Evaluation Framework
### Metrics
- Functional correctness (for LeetCode problems)
- AST matching
- String representation matching
- Pass@1 performance
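AST matching treats two fixes as equivalent when they parse to the same syntax tree, so formatting differences don't count as failures; string matching is the stricter criterion. A minimal sketch using Python's standard `ast` module:

```python
import ast

def ast_match(a: str, b: str) -> bool:
    """True if two snippets parse to the same AST (formatting ignored)."""
    try:
        return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))
    except SyntaxError:
        return False  # unparseable output never matches

# Same program, different formatting -> AST match (string match would fail)
ast_match("x=1\nprint(x)", "x = 1\nprint( x )")
# Semantically different programs -> no match
ast_match("print(x)", "print(y)")
```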
### Evaluation Sets
- LeetCode repair benchmark (360 samples)
- Replit repair benchmark (389 samples)
- Zero-shot and few-shot testing
- Cross-language transfer evaluation
## Scaling & Future Work
### Current Scaling Results
- Data scaling shows consistent improvement
- Parameter scaling demonstrates benefits up to 33B
- Competitive with larger models (GPT-4, Claude 3)
### Future Development Plans
- Expand to cross-file edits
- Improve multi-line edit performance
- Support additional programming languages
- Implement DPO based on user feedback
- Scale training dataset
- Enhance evaluation coverage
## Production Integration Notes
### Deployment Strategy
- Client-side integration with Replit IDE
- Workspace LSP diagnostic handling
- Real-time diff application
- Model versioning and updates
### Monitoring & Feedback
- User acceptance tracking
- Error rate monitoring
- Performance metrics collection
- Feedback loop for model improvements
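A minimal sketch of the acceptance-tracking side of this loop, which could later supply preference pairs for the planned DPO training; the record shape and class are assumptions, not Replit's actual telemetry:

```python
from collections import Counter

class FixFeedback:
    """Track accept/reject outcomes of suggested fixes per diagnostic code."""

    def __init__(self):
        self.counts = Counter()

    def record(self, rule: str, accepted: bool) -> None:
        self.counts[(rule, accepted)] += 1

    def acceptance_rate(self, rule: str) -> float:
        acc = self.counts[(rule, True)]
        rej = self.counts[(rule, False)]
        return acc / (acc + rej) if acc + rej else 0.0

fb = FixFeedback()
fb.record("F821", True)
fb.record("F821", True)
fb.record("F821", False)
# fb.acceptance_rate("F821") is 2/3
```

Per-rule acceptance rates make it easy to spot diagnostic classes where the model's fixes are systematically rejected.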
### System Requirements
- Low latency requirements
- High throughput capability
- Reliable error handling
- Scalable infrastructure