Cursor, an AI-assisted coding platform, scaled their infrastructure from handling basic code completion to processing 100 million model calls per day across a global deployment. They faced and overcame significant challenges in database management, model inference scaling, and indexing systems. The case study details their journey through major incidents, including a database crisis that led to a complete infrastructure refactor, and their innovative solutions for handling high-scale AI model inference across multiple providers while maintaining service reliability.
This case study provides an in-depth look at Cursor's journey scaling their AI-assisted coding platform, offering valuable insights into the challenges and solutions involved in deploying LLMs at scale. The discussion is led by Cursor's CTO and co-founder, who provides detailed technical insight into their infrastructure and operations.
**Infrastructure Scale and Components**
Cursor's infrastructure has grown dramatically, scaling by a factor of 100 or more in the last year. Their system processes approximately 100 million model calls daily for their custom models alone, on top of substantial traffic to third-party frontier models. Their infrastructure consists of three core components:
* Indexing Systems: Multiple systems processing billions of documents daily, having processed hundreds of billions of documents over the company's lifetime. These systems are crucial for understanding and analyzing user repositories.
* Model Infrastructure: The autocomplete system runs on every keystroke, handling around 20,000 model calls per second across a fleet of approximately 2,000 H100 GPUs. This infrastructure is globally distributed across locations including the US East Coast, West Coast, London, and Tokyo to ensure low latency for users worldwide (see the routing sketch after this list).
* Product Infrastructure: This covers the optimizations and techniques that enhance the user experience, such as specialized apply models and streaming infrastructure for continuous service improvement.
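To make the global distribution concrete, here is a minimal Python sketch of latency-based region routing, assuming each region reports health status and a rolling latency measurement from probes. The region names, latency figures, and `pick_region` helper are illustrative assumptions, not Cursor's actual topology or code.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    p50_latency_ms: float  # rolling measurement from health probes
    healthy: bool

# Illustrative regions matching the locations mentioned above
REGIONS = [
    Region("us-east", 18.0, True),
    Region("us-west", 55.0, True),
    Region("london", 90.0, True),
    Region("tokyo", 140.0, True),
]

def pick_region(regions: list[Region]) -> Region:
    """Choose the healthy region with the lowest observed latency."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy inference regions available")
    return min(candidates, key=lambda r: r.p50_latency_ms)

print(pick_region(REGIONS).name)  # -> us-east (for a US East Coast user)
```

A production router would likely also weigh GPU capacity and per-region load, not latency alone.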
**Challenges and Solutions in Model Inference**
The company faces several critical challenges in managing model inference at scale:
* Cold Start Problems: When scaling up after an outage, they must carefully manage the cold start problem, where a small number of recovered nodes can be overwhelmed by incoming requests before the full fleet comes back online (one mitigation is sketched after this list).
* Provider Relations: Cursor works with multiple model providers and has become one of the largest consumers of tokens globally. This requires careful negotiation of rate limits and managing relationships with multiple providers to ensure service reliability.
* Load Balancing: They've implemented sophisticated systems to balance workloads across different providers and regions, with careful monitoring of token usage and rate limits.
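A common mitigation for the cold-start problem described above is to admit traffic only in proportion to recovered capacity, shedding or queueing the rest until more nodes warm up. The sketch below illustrates the idea; the function names, capacity figures, and shedding policy are assumptions, not Cursor's implementation.

```python
import random

def admitted_rate(request_rate_qps: float,
                  recovered_nodes: int,
                  per_node_capacity_qps: float) -> float:
    """Rate to admit now; the remainder is shed or queued until
    more of the fleet has recovered."""
    return min(request_rate_qps, recovered_nodes * per_node_capacity_qps)

def should_admit(recovered_nodes: int, total_nodes: int) -> bool:
    """Per-request probabilistic shedding, proportional to the
    fraction of the fleet that is back online."""
    return random.random() < recovered_nodes / total_nodes

# With 10% of a 2,000-node fleet recovered, only ~10% of the ~20,000
# calls/sec of keystroke traffic is admitted, protecting warm nodes.
print(admitted_rate(20_000, 200, 10.0))  # -> 2000.0 qps admitted
print(should_admit(200, 2_000))          # True ~10% of the time
```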
**Critical Incident Management**
The case study details two major incidents that highlight the challenges of operating at scale:
1. Indexing System Crisis (September 2023):
* Initial implementation used Merkle trees for efficient file change detection (sketched after this list)
* Encountered issues with YugabyteDB, prompting a migration to PostgreSQL
* Faced challenges with cache misses and race conditions
* Solution involved rewriting significant portions of the system while maintaining service
2. Database Crisis:
* The RDS instance reached 22TB, approaching the 64TB instance size limit
* PostgreSQL's VACUUM process could not keep pace with the dead tuples generated by their update-heavy workload
* Required complete system refactor during the incident
* Successfully migrated to object storage-based solution
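For context on the first incident, here is a from-scratch illustration of Merkle-tree-style change detection for repository syncing: if root hashes match, nothing needs re-indexing; otherwise only the differing files do. This is a flattened two-level version for brevity (a production tree hashes directories hierarchically so unchanged subtrees can be skipped entirely), and it is not Cursor's implementation.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_tree(files: dict[str, bytes]) -> dict:
    """Hash every file, then hash the concatenated leaf hashes
    into a single root that summarizes the whole repository."""
    leaves = {path: h(content) for path, content in sorted(files.items())}
    root = h("".join(leaves.values()).encode())
    return {"root": root, "leaves": leaves}

def changed_paths(local: dict, remote: dict) -> list[str]:
    """If roots match, skip all work; otherwise diff the leaves.
    (Catches additions/modifications; deletions diff the other way.)"""
    if local["root"] == remote["root"]:
        return []
    return [p for p, digest in local["leaves"].items()
            if remote["leaves"].get(p) != digest]

old = build_tree({"a.py": b"print(1)", "b.py": b"print(2)"})
new = build_tree({"a.py": b"print(1)", "b.py": b"print(3)"})
print(changed_paths(new, old))  # -> ['b.py']
```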
**Infrastructure Evolution and Learning**
Key learnings from their experience include:
* Database Selection: They learned to prefer simpler, proven solutions (like RDS) over more complex distributed databases from the start
* Storage Architecture: Moving towards object storage-based solutions for better scalability and reliability (see the sketch after this list)
* Infrastructure Distribution: Implementing global distribution for lower latency while managing the complexity of distributed systems
* Security Considerations: Implementing encryption for stored vectors and careful management of proprietary code
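The object-storage direction pairs naturally with content addressing: index chunks are written under a key derived from their hash, so writes become idempotent puts rather than row updates, sidestepping the VACUUM churn that an update-heavy relational workload creates. The sketch below uses an in-memory dict as a stand-in for S3/GCS and is an illustrative assumption, not Cursor's design.

```python
import hashlib

class ObjectStore:
    """Content-addressed blob store; a dict stands in for S3/GCS."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, value: bytes) -> str:
        # Key is derived from content, so re-writing the same chunk
        # is a no-op rather than an UPDATE that leaves dead tuples.
        key = hashlib.sha256(value).hexdigest()
        self._blobs[key] = value
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ObjectStore()
key = store.put(b"embedding-chunk-bytes")
assert store.get(key) == b"embedding-chunk-bytes"
assert store.put(b"embedding-chunk-bytes") == key  # idempotent
```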
**Rate Limit Management**
Cursor has developed sophisticated systems for managing rate limits across different providers:
* Multiple provider relationships to ensure redundancy
* Active negotiation with providers for increased limits
* Load balancing across providers to optimize usage (a routing sketch follows this list)
* Custom infrastructure to handle rate limiting and token management
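One plausible shape for such a system is to route each request to the provider with the most remaining rate-limit headroom. The provider names, limits, and `route` helper below are illustrative assumptions; actual limits are negotiated per provider and the real system is more sophisticated.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    tokens_per_min_limit: int
    tokens_used_this_min: int = 0

    @property
    def headroom(self) -> int:
        return self.tokens_per_min_limit - self.tokens_used_this_min

def route(providers: list[Provider], tokens_needed: int) -> Provider:
    """Send the request to the provider with the most spare capacity;
    if every provider is at its limit, the caller queues or retries."""
    viable = [p for p in providers if p.headroom >= tokens_needed]
    if not viable:
        raise RuntimeError("all providers rate-limited; queue or retry")
    chosen = max(viable, key=lambda p: p.headroom)
    chosen.tokens_used_this_min += tokens_needed
    return chosen

providers = [Provider("provider-a", 2_000_000, 1_900_000),
             Provider("provider-b", 1_000_000, 200_000)]
print(route(providers, 50_000).name)  # -> provider-b (more headroom)
```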
**Security and Privacy**
The platform implements several security measures:
* Encryption of code embedding vectors (see the sketch after this list)
* Client-side key management
* Secure storage and processing of proprietary code
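As a rough illustration of client-side key management over embedding vectors, the sketch below uses the `cryptography` package's Fernet: the key never leaves the client, and the server stores only ciphertext. The specific cipher and packing scheme are assumptions; the source does not detail Cursor's actual scheme.

```python
import struct
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # generated and kept on the client
f = Fernet(key)

vector = [0.12, -0.87, 0.33]  # embedding of a code chunk
plaintext = struct.pack(f"{len(vector)}f", *vector)

ciphertext = f.encrypt(plaintext)  # the only thing the server stores

# Later, the client decrypts with its local key to run a search
recovered = struct.unpack(f"{len(vector)}f", f.decrypt(ciphertext))
print([round(x, 2) for x in recovered])  # -> [0.12, -0.87, 0.33]
```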
**Future Directions**
The case study suggests several emerging trends:
* Moving towards object storage-based solutions
* Increasing use of AI for code review and bug detection
* Continued evolution of infrastructure to handle growing scale
This case study provides valuable insights into the real-world challenges of scaling AI-assisted development tools and the practical solutions developed to address them. It highlights the importance of flexible architecture, robust incident response procedures, and the balance between rapid scaling and maintaining service reliability.