AngelList transformed their investment document processing from manual classification to an automated system using LLMs. They initially used AWS Comprehend for news article classification but transitioned to OpenAI's models, which proved more accurate and cost-effective. They built Relay, a product that automatically extracts and organizes investment terms and company updates from documents, achieving 99% accuracy in term extraction while significantly reducing operational costs compared to manual processing.
# AngelList's Journey with LLMs in Investment Document Processing
## Company Background and Initial ML Implementation
AngelList started their machine learning journey with basic use cases like news article classification. Initially, they had no dedicated machine learning team or infrastructure. Their first ML engineer implemented a classification system using AWS Comprehend, which took about two months to deploy. This system was used to route news articles to investor dashboards for companies they had invested in.
## Transition to LLM-Based Solutions
### Limitations of Traditional ML Approach
- AWS Comprehend had scaling limitations
- Deeper document analysis capabilities were needed
### OpenAI Implementation Success
- Rewrote entire system in one day using OpenAI models
- Achieved better results with simpler implementation
- Gained additional capabilities without extra development
- Cost benefits through pay-per-request model
- Automatic improvement with model updates (GPT-3 to 3.5 to 4)
## Relay Product Development
### Product Features
- Automatic processing of investment documents
- Extraction of key terms and investment details
- Organization of company updates
- Dashboard creation for investment tracking
- Email integration for direct updates
### Technical Architecture
- Document infrastructure for secure storage and analysis
- LangChain for prompt orchestration
- Cascading prompt system
- Integration with both Azure OpenAI and OpenAI direct APIs
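The cascading prompt system described above can be sketched as a two-stage pipeline: a first prompt classifies the document, and its output selects a more specialized extraction prompt. This is a minimal illustration under assumptions, not AngelList's actual code; `call_llm`, the prompt templates, and the document types are hypothetical stand-ins for real OpenAI / Azure OpenAI calls (which LangChain would orchestrate in their stack).

```python
# Minimal sketch of a cascading prompt pipeline (hypothetical, not AngelList's code).
# `call_llm` stands in for a real OpenAI / Azure OpenAI chat completion request;
# here it returns canned answers so the flow is runnable offline.

CLASSIFY_PROMPT = "Classify this investment document as SAFE or NOTE:\n{doc}"

EXTRACT_PROMPTS = {
    "SAFE": "Extract the valuation cap and discount as JSON from this SAFE:\n{doc}",
    "NOTE": "Extract the interest rate and maturity date as JSON from this note:\n{doc}",
}

def call_llm(prompt: str) -> str:
    """Stub LLM call; replace with an OpenAI or Azure OpenAI client in practice."""
    if prompt.startswith("Classify"):
        return "SAFE"
    return '{"valuation_cap": "$10M", "discount": "20%"}'

def process_document(doc: str) -> str:
    """Stage 1: classify the document; stage 2: run the matching extraction prompt."""
    doc_type = call_llm(CLASSIFY_PROMPT.format(doc=doc)).strip()
    return call_llm(EXTRACT_PROMPTS[doc_type].format(doc=doc))
```

The cascade keeps each prompt narrow: the classifier only names a type, and each extraction prompt only targets the fields relevant to that type.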
### Quality Assurance and Testing
- 99% accuracy in term extraction
- Verification system using document source text
- Extensive testing against historical data
- Human-in-the-loop validation process
- Plans for automated regression testing
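The source-text verification step above can be sketched as a simple check: an extracted value that cannot be found verbatim in the original document is flagged rather than trusted. This is an assumed simplification of the approach (exact substring match; a real system would likely normalize whitespace and formatting), and `verify_extraction` is a hypothetical helper name.

```python
# Hypothetical sketch of source-text verification: every extracted value must
# appear verbatim in the original document, or the field is flagged for review.

def verify_extraction(extracted: dict, source_text: str) -> list:
    """Return the fields whose values cannot be found in the source document."""
    return [field for field, value in extracted.items() if value not in source_text]
```

Fields returned by this check would be routed to the human-in-the-loop validation process rather than written to a dashboard automatically.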
## Infrastructure and Scaling Challenges
### API Management
- Dealing with OpenAI rate limits
- Access to GPT-4 and GPT-4 32k
- Load balancing between Azure OpenAI and OpenAI direct
- Implementation of retry mechanisms for API downtime
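The load-balancing and retry pattern above can be sketched as a failover loop: try each provider in turn, and back off exponentially before retrying the whole round if every provider fails. The provider names, `call_provider` callback, and backoff parameters are illustrative assumptions, not AngelList's implementation.

```python
import time

# Hypothetical sketch of failover between Azure OpenAI and OpenAI direct,
# with exponential backoff when both are unavailable (rate limits, downtime).
PROVIDERS = ["azure_openai", "openai"]

def complete_with_failover(prompt, call_provider, max_rounds=3, backoff_s=1.0):
    """Try each provider in order; sleep and retry if every provider fails."""
    last_error = None
    for attempt in range(max_rounds):
        for provider in PROVIDERS:
            try:
                return call_provider(provider, prompt)
            except Exception as err:  # e.g. rate limit hit or API downtime
                last_error = err
        time.sleep(backoff_s * (2 ** attempt))  # back off between full rounds
    raise last_error
```

Keeping the provider list as data makes it easy to reweight or reorder providers as rate limits and quotas change.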
### Azure OpenAI Benefits
- More flexible scaling options
- Familiar cloud environment
- Better usage tracking
- More stable and structured approach
## Development Philosophy and Practices
### Prompt Engineering Approach
- Domain experts (lawyers, operations team) involved in prompt creation
- Iterative prompt improvement process
- Focus on natural language accessibility
- Balance between automation and human oversight
### Strategic Decisions
- Prioritization of breadth over optimization
- Focus on proven use cases before exploring new models
- Cost-effectiveness compared to manual processing
- Strategic planning for potential in-house model development
## Future Directions
### Planned Improvements
- Development of automated regression testing
- Exploration of custom model training
- Potential implementation of customer-specific models
- Investigation of on-premise model deployment
### Risk Management
- Document source verification for accuracy
- Multiple API provider strategy for redundancy
- Focus on non-critical financial document processing
- Maintenance of human oversight for critical decisions
## Technical Implementation Details
### Tools and Technologies
- LangChain for prompt management
- Azure OpenAI and OpenAI APIs
- Custom document processing infrastructure
- Email integration systems
### Process Flow
- Document intake through email or direct upload
- Initial classification and routing
- Term extraction and validation
- Dashboard integration and update
- Verification against source documents
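The process flow above can be tied together in one illustrative function: intake feeds classification, classification feeds extraction, and extraction is verified against the source before any dashboard update. All function bodies here are hypothetical stubs standing in for LLM calls and product logic.

```python
# Illustrative end-to-end flow (every stage body is a hypothetical stub).

def classify(doc: str) -> str:
    return "investment_document" if "SAFE" in doc else "company_update"

def extract_terms(doc: str) -> dict:
    return {"valuation_cap": "$10M"} if "SAFE" in doc else {}

def verify(terms: dict, doc: str) -> dict:
    return {field: (value in doc) for field, value in terms.items()}

def handle_inbound(doc: str) -> dict:
    """Intake -> classify -> extract -> verify; only verified terms update dashboards."""
    doc_type = classify(doc)
    terms = extract_terms(doc)
    checks = verify(terms, doc)
    return {"type": doc_type, "terms": terms, "verified": all(checks.values())}
```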