Grab faced challenges with data discovery across the 200,000+ tables in its data lake. The company developed HubbleIQ, an LLM-powered chatbot integrated with its data discovery platform, to improve search capabilities and automate documentation generation. The solution included enhancing Elasticsearch, using GPT-4 for automated documentation generation, and building a Slack-integrated chatbot. As a result, documentation coverage for frequently queried tables increased from 20% to 90%, and 73% of users reported an improved data discovery experience.
# LLM Implementation for Data Discovery at Grab
## Company Overview and Problem Statement
Grab, a leading superapp platform in Southeast Asia, faced significant challenges in data discovery across a massive data infrastructure containing over 200,000 tables. Its existing data discovery tool, Hubble, built on Datahub, struggled to support true data discovery and had poor documentation coverage: only 20% of frequently accessed tables were documented.
## Technical Implementation Details
### Initial System Assessment
- Identified four distinct categories of user search queries
- Found that the existing Elasticsearch implementation was limited, with a search click-through rate of only 82%
- Users were heavily reliant on tribal knowledge through Slack channels
### Enhancement Strategy
### Phase 1: Elasticsearch Optimization
- Implemented multiple query and indexing optimization techniques (a query-boosting sketch follows this list)
- Click-through rate improved to 94% as a result
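The write-up does not detail which Elasticsearch optimizations Grab applied, so the following is only a minimal sketch of the kind of tuning involved: boosting exact table-name matches over descriptions and tolerating typos. The index and field names (`hubble_tables`, `table_name`, `column_names`, `description`) are illustrative assumptions, not Grab's actual schema.

```python
# Sketch: weighted multi-field search over table metadata with fuzzy matching.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_tables(query: str, size: int = 10):
    """Search table metadata, boosting table-name matches over descriptions."""
    return es.search(
        index="hubble_tables",  # assumed index name
        query={
            "multi_match": {
                "query": query,
                "fields": ["table_name^3", "column_names^2", "description"],
                "fuzziness": "AUTO",  # tolerate small typos in table names
            }
        },
        size=size,
    )

# Example usage: rank candidate tables for a free-text query.
for hit in search_tables("driver earnings")["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["table_name"])
```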
### Phase 2: Documentation Generation System
- Built a documentation generation engine using GPT-4 (a minimal generation sketch follows this list)
- Implemented key features for generating and reviewing table documentation
- Achieved 90% documentation coverage for P80 tables (the most frequently queried tables)
- 95% of users found the generated documentation useful
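As a rough illustration of such an engine, the sketch below prompts GPT-4 with a table's schema and frequently run queries and asks for a concise description. The prompt wording, metadata fields, and function names are assumptions; Grab's production prompts are not public at this level of detail.

```python
# Sketch: generate table documentation from schema + query context with GPT-4.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DOC_PROMPT = (
    "You are a data documentation assistant. Given a table's schema and "
    "sample queries, write a concise description of the table and a one-line "
    "description for each column. Be factual; if a column's purpose is "
    "unclear, say so rather than guessing."
)

def generate_table_doc(table_name: str, schema: str, sample_queries: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": DOC_PROMPT},
            {"role": "user", "content": (
                f"Table: {table_name}\n\nSchema:\n{schema}\n\n"
                f"Frequently run queries:\n{sample_queries}"
            )},
        ],
        temperature=0.2,  # keep generated documentation relatively stable
    )
    return response.choices[0].message.content
```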
### Phase 3: HubbleIQ Development
- Leveraged Glean enterprise search tool for faster deployment
- Implementation components included a Slack-integrated chatbot interface backed by enterprise search (a minimal Slack-side sketch follows this list)
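The Slack side of such a chatbot can be small; the retrieval-and-answering backend (Glean-based in Grab's case) does the heavy lifting. The sketch below uses Slack's Bolt framework, and `answer_data_question` is a hypothetical stand-in for that backend, not Grab's actual code.

```python
# Sketch: Slack bot that forwards @mentions to a data-discovery answering backend.
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

def answer_data_question(question: str) -> str:
    # Placeholder for the enterprise-search + LLM answering pipeline.
    return f"(stub) Looking up datasets relevant to: {question}"

@app.event("app_mention")
def handle_mention(event, say):
    """Reply in-thread whenever the bot is @mentioned with a question."""
    question = event.get("text", "")
    say(text=answer_data_question(question), thread_ts=event.get("ts"))

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```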
### LLMOps-Specific Implementations
### Prompt Engineering
- Developed and refined prompts for documentation generation
- Implemented system prompts for HubbleIQ chatbot
- Created context-aware prompt systems for semantic search (a prompt-assembly sketch follows this list)
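One common way to make a chatbot prompt "context-aware" is to inject retrieved table metadata into the system prompt before answering. The sketch below shows that pattern; the prompt wording and metadata fields are assumptions for illustration, not the production HubbleIQ prompts.

```python
# Sketch: assemble a context-aware prompt from retrieved table metadata.
SYSTEM_PROMPT = (
    "You are HubbleIQ, a data discovery assistant. Answer questions about "
    "which datasets to use, based only on the table metadata provided below. "
    "If no listed table fits, say you are not sure and suggest refining the search."
)

def build_messages(question: str, retrieved_tables: list[dict]) -> list[dict]:
    """Return chat messages with retrieved metadata injected into the system prompt."""
    context = "\n\n".join(
        f"Table: {t['name']}\nDescription: {t['description']}"
        for t in retrieved_tables
    )
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\n{context}"},
        {"role": "user", "content": question},
    ]
```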
### Quality Assurance
- Implemented AI-generated content tagging
- Built a review workflow for data producers (a data-model sketch follows this list)
- Monitored documentation quality through user feedback
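To make the tagging and review workflow concrete, here is a minimal sketch of one possible data model: generated docs are tagged as AI-generated until a data producer approves or edits them. The status values and field names are assumptions, not Grab's schema.

```python
# Sketch: tag AI-generated docs and track producer review status.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ReviewStatus(Enum):
    AI_GENERATED_PENDING_REVIEW = "ai_generated_pending_review"
    APPROVED = "approved"
    EDITED_BY_PRODUCER = "edited_by_producer"

@dataclass
class TableDoc:
    table_name: str
    content: str
    status: ReviewStatus = ReviewStatus.AI_GENERATED_PENDING_REVIEW
    generated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def review(doc: TableDoc, edited_content: str | None = None) -> TableDoc:
    """Data producer signs off, optionally correcting the generated text."""
    if edited_content is not None:
        doc.content = edited_content
        doc.status = ReviewStatus.EDITED_BY_PRODUCER
    else:
        doc.status = ReviewStatus.APPROVED
    return doc
```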
### Integration Architecture
- Built seamless integration between Hubble, the documentation generation engine, and the Slack-based HubbleIQ chatbot
## Results and Impact
- Documentation coverage improved from 20% to 90%
- User satisfaction with data discovery increased by 17 percentage points
- Achieved record high monthly active users
- Reduced data discovery time from days to seconds
- 73% of users reported improved ease in dataset discovery
## Future LLMOps Initiatives
### Documentation Enhancement
- Planned improvements to documentation coverage and quality
### HubbleIQ Evolution
- Additional chatbot capabilities planned for future releases
### Quality Control
- Development of an evaluator model for documentation quality
- Implementation of a Reflexion workflow (a minimal evaluate-and-retry sketch follows this list)
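A Reflexion-style workflow pairs a generator with an evaluator: the evaluator critiques the output, and the critique is fed back into the next generation attempt. The sketch below shows that loop under stated assumptions; the evaluation prompt, score threshold, and the `generate_fn` callable are all hypothetical, since Grab describes this only as a planned initiative.

```python
# Sketch: LLM evaluator plus Reflexion-style regeneration loop for table docs.
from openai import OpenAI

client = OpenAI()

EVAL_PROMPT = (
    "Rate the following table documentation from 1 to 5 for clarity and "
    "completeness. Reply with the score on the first line, then a short "
    "critique of what is missing."
)

def evaluate_doc(doc: str) -> tuple[int, str]:
    """Return (score, critique) from the evaluator model."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": EVAL_PROMPT},
            {"role": "user", "content": doc},
        ],
    )
    text = resp.choices[0].message.content
    score_line, _, critique = text.partition("\n")
    return int(score_line.strip()[0]), critique.strip()

def reflexion_generate(generate_fn, max_rounds: int = 3, threshold: int = 4) -> str:
    """Regenerate with the evaluator's critique until quality is acceptable."""
    doc = generate_fn("")  # first attempt, no feedback yet
    for _ in range(max_rounds):
        score, critique = evaluate_doc(doc)
        if score >= threshold:
            break
        doc = generate_fn(critique)  # generator sees the prior critique (Reflexion)
    return doc
```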
## Technical Architecture Considerations
### System Integration
- Seamless integration between the platform's multiple components
### Scalability
- Built to scale across a data lake of more than 200,000 tables
### Security and Governance
- Security and governance measures applied across the integrated systems
## Lessons Learned and Best Practices
### Implementation Strategy
- Phased approach to deployment
- Focus on user feedback and iteration
- Balance between automation and human oversight
- Importance of documentation quality
### Technical Considerations
- LLM prompt engineering significance
- Integration complexity management
- Performance optimization requirements
- Quality control automation
### User Adoption
- Importance of user interface simplicity
- Value of integrated workflow
- Need for continuous improvement based on usage patterns
- Significance of response accuracy