This case study explores Prosus's development and deployment of the "Token Data Analyst" agent, a system designed to democratize data access across portfolio companies such as iFood and OLX.
The core problem was the bottleneck created when data analysts had to field a steady stream of routine data queries, leaving little time for more complex analytical work. With roughly 30,000 employees across the portfolio's technology companies, fast data access was critical for decision-making, customer service, and operations.
**System Architecture and Implementation**
The Token Data Analyst agent was built with several key architectural decisions:
* Core Architecture: The system uses a standard agent framework with an LLM as the brain, specialized tools for execution, and history management for maintaining context.
* Database Integration: Instead of direct database connections, the agent uses a generalized SQL execution tool that interfaces with different database types (Snowflake, Databricks, BigQuery) through specific connectors. Each instance is configured for a particular database and SQL dialect.
* Context Management: The system uses dedicated Slack channels for different data domains (marketing, logistics, etc.), which helps manage context and access control. This channel-based approach simplifies the agent's context management by limiting it to specific database contexts.
* Query Validation: A crucial innovation was a pre-check step that validates whether the agent has sufficient information to answer a question before attempting to generate SQL. This helps prevent hallucinated or otherwise improper responses (see the sketch after this list).
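To make the architecture concrete, here is a minimal Python sketch of the pieces described above. All class, method, and client names (`SQLExecutionTool`, `DataAnalystAgent`, an `llm.complete()` chat client) are illustrative assumptions, not Prosus's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class SQLExecutionTool:
    """Generalized SQL executor: one instance per database and dialect."""
    dialect: str       # e.g. "snowflake", "databricks", "bigquery"
    connector: object  # a DB-API-style connection for that warehouse

    def run(self, sql: str) -> list[tuple]:
        cursor = self.connector.cursor()
        cursor.execute(sql)
        return cursor.fetchall()


@dataclass
class DataAnalystAgent:
    llm: object             # hypothetical chat client: .complete(prompt) -> str
    tool: SQLExecutionTool  # fixed per Slack channel / data domain
    schema_context: str     # curated, LLM-friendly schema documentation
    history: list[str] = field(default_factory=list)

    def answer(self, question: str) -> str:
        # Pre-check ("Clarity Check"): decide whether the schema context is
        # sufficient *before* generating SQL, to reduce hallucinated answers.
        verdict = self.llm.complete(
            f"Schema:\n{self.schema_context}\n\nQuestion: {question}\n"
            "Reply YES only if the schema clearly supports this question."
        )
        if "YES" not in verdict.upper():
            return "I don't have enough context to answer that reliably."

        sql = self.llm.complete(
            f"Schema:\n{self.schema_context}\n\n"
            f"Write one {self.tool.dialect} SQL query that answers: {question}"
        )
        rows = self.tool.run(sql)
        self.history.append(question)  # keep context for follow-up questions
        return f"Ran:\n{sql}\n\nResult: {rows}"
```

Because each agent instance is bound to one tool and one schema context, the Slack channel a question arrives in fully determines which database, dialect, and business domain are in play.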
**Technical Challenges and Solutions**
The team encountered several significant challenges during development:
* Model Confidence Control: Early versions of the agent would attempt to answer questions even without sufficient context. This was addressed by implementing a separate "Clarity Check" step before query generation.
* Context Management: Business context and terminology required careful curation. The team found that standard data documentation, while human-readable, wasn't suitable for LLMs and required specific formatting to remove ambiguity and implicit knowledge.
* Query Complexity Ceiling: The agent tends to default to simpler solutions, creating a natural ceiling on query complexity. The team embraced this limitation by positioning the agent as a complement to, rather than a replacement for, human analysts.
* Model Updates: The team discovered that prompts could be "overfitted" to specific model versions, causing significant behavior changes when updating to newer models. This highlighted the need for robust testing and evaluation procedures.
* Performance and Safety: The implementation included guards against expensive queries (such as SELECT *) and appropriate access controls to prevent database overload and security issues (a simplified guard is sketched after this list).
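As one illustration of the query guards mentioned above, a pre-execution check might look like the following. This is a simplified heuristic sketch: the `MAX_ROWS` cap is an assumed value, and a production guard would rely on a real SQL parser and warehouse-level resource limits rather than regular expressions.

```python
import re

MAX_ROWS = 10_000  # illustrative cap, not a value from the case study


def guard_query(sql: str) -> str:
    """Validate and bound a generated query before it reaches the warehouse."""
    normalized = " ".join(sql.strip().rstrip(";").split())
    lowered = normalized.lower()
    if not lowered.startswith("select"):
        raise ValueError("only read-only SELECT statements are allowed")
    if re.search(r"\bselect\s+\*", lowered):
        raise ValueError("SELECT * is blocked; request explicit columns")
    if not re.search(r"\blimit\s+\d+\b", lowered):
        normalized += f" LIMIT {MAX_ROWS}"  # cap result size defensively
    return normalized
```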
**Evaluation and Testing**
The evaluation process was multifaceted:
* Accuracy Testing: Because SQL is flexible (there are many valid ways to write the same query) and the underlying data is time-dependent, traditional exact-match accuracy metrics were hard to apply; one common workaround is sketched after this list.
* Use Case Classification: The team developed a system of categorizing use cases by complexity and readiness for production, with separate development and production instances for testing.
* Analyst Integration: Close collaboration with data analysts was crucial for validation and improvement of the system.
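The string-matching problem above has a standard workaround: compare what queries *return* rather than how they are written. The sketch below (with hypothetical `generate_sql` and `reference_sql` names) runs the agent's query and an analyst-written reference query against the same database at the same time, so time-dependent data affects both sides equally. This is a common evaluation pattern, not necessarily the exact procedure Prosus used.

```python
from collections import Counter


def results_match(rows_a: list[tuple], rows_b: list[tuple]) -> bool:
    """Two queries count as equivalent if they return the same multiset
    of rows, regardless of how the SQL itself is written."""
    return Counter(rows_a) == Counter(rows_b)


def evaluate(agent, tool, test_cases: list[dict]) -> float:
    """test_cases: [{"question": ..., "reference_sql": ...}, ...].
    Returns the fraction of questions where the agent's query reproduced
    the analyst-written reference result."""
    passed = 0
    for case in test_cases:
        expected = tool.run(case["reference_sql"])
        actual = tool.run(agent.generate_sql(case["question"]))  # hypothetical method
        passed += results_match(expected, actual)
    return passed / len(test_cases)
```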
**Impact and Results**
The implementation showed significant positive outcomes:
* 74% reduction in query response time
* Substantial increase in total data insights generated
* Successful adoption across multiple portfolio companies
* Democratized data access for non-technical users
**Lessons Learned and Best Practices**
Several key insights emerged from the project:
* Co-development with end users (data analysts) was crucial for success
* Avoiding over-engineering: The team deliberately chose not to implement RAG or embeddings, focusing instead on simpler, more maintainable solutions
* Clear separation between development and production environments
* Importance of explicit context and removal of ambiguity in documentation (an illustrative before/after example follows this list)
* Need for robust testing when updating underlying models
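To illustrate what removing ambiguity and implicit knowledge can mean in practice, here is an invented before/after contrast. The table, columns, and business rules are made up for the example; the point is that every convention an analyst carries in their head gets spelled out for the model.

```python
# Typical human-oriented documentation: readable, but full of implicit rules.
human_doc = "orders table -- the usual caveats about test accounts apply"

# LLM-oriented documentation: every convention made explicit.
llm_doc = {
    "orders.created_at": "UTC timestamp when the order was placed; never null; "
                         "use this column for any 'orders per day' question.",
    "orders.status": "one of {'placed', 'delivered', 'cancelled'}; "
                     "'completed orders' always means status = 'delivered'.",
    "orders.user_id": "user_id < 1000 denotes internal test accounts, which "
                      "must be filtered out of all business metrics.",
}
```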
**Future Directions**
The team is exploring several improvements:
* Integration with newer reasoning-focused models to improve query complexity handling
* Better schema change management
* Enhanced evaluation of context sufficiency
* Automated schema analysis for ambiguity detection
This case study represents a successful implementation of LLMs in production, demonstrating how careful architectural decisions, close user collaboration, and pragmatic engineering choices can lead to significant business value while maintaining system reliability and safety.