
SQL Query Agent for Data Democratization

Prosus 2024

Prosus developed a SQL-generating agent called "Token Data Analyst" to help democratize data access across their portfolio companies. The agent serves as first-line support for data queries, allowing non-technical users to get insights from databases by asking natural language questions in Slack. The system achieved a 74% reduction in query response time and significantly increased the total number of data insights generated, while maintaining high accuracy through careful prompt engineering and context management.

Industry

Tech


This case study explores Prosus's development and deployment of the "Token Data Analyst" agent, a system designed to democratize data access across their portfolio companies, including iFood, OLX, and other technology businesses.

The core problem was the bottleneck created when data analysts had to handle large volumes of routine data queries, which kept them from more complex analytical work. With around 30,000 employees across its portfolio of technology companies, Prosus needed quick data access for decision-making, customer service, and operations.

System Architecture and Implementation

The Token Data Analyst agent was built with several key architectural decisions:
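The public summary highlights careful prompt engineering and context management as the levers for accuracy. As a rough illustration of what that flow can look like, the sketch below injects a table schema into the prompt and asks a model to return a single read-only SQL query. The schema, the OpenAI client and gpt-4o model choice, the prompt wording, and the generate_sql helper are illustrative assumptions, not Prosus's actual implementation.

# Hypothetical schema-aware text-to-SQL sketch; details are assumptions, not Prosus's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; provider/model choice is an assumption

# Example schema context injected into the prompt ("context management").
SCHEMA_CONTEXT = """\
orders(order_id BIGINT, customer_id BIGINT, status TEXT, created_at TIMESTAMP, total NUMERIC)
customers(customer_id BIGINT, country TEXT, signup_date DATE)
"""

SYSTEM_PROMPT = (
    "You are a data analyst assistant. Using only the tables below, answer the "
    "user's question with a single read-only SQL query. Return only the SQL.\n\n"
    "Schema:\n" + SCHEMA_CONTEXT
)

def generate_sql(question: str) -> str:
    """Turn a natural-language question into a candidate SQL query."""
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model choice
        temperature=0,    # deterministic output is usually preferred for SQL generation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(generate_sql("How many orders were placed last week, by country?"))

In a real deployment the schema context would typically be pulled from the warehouse's metadata rather than hard-coded, and the returned SQL would be validated before execution, as sketched in the next section.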

Technical Challenges and Solutions

The team encountered several significant challenges during development:
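One challenge common to SQL-generating agents in production is making sure model output is safe to run against live databases. The snippet below is a minimal, hypothetical read-only guard built with the sqlparse library; the ensure_read_only helper and its allow-list are assumptions for illustration rather than a description of Prosus's actual safeguards.

# Hypothetical read-only guard applied to generated SQL before execution.
import sqlparse

ALLOWED_TYPES = {"SELECT"}

def ensure_read_only(sql: str) -> str:
    """Raise if the generated SQL contains anything other than SELECT statements."""
    statements = [s for s in sqlparse.parse(sql) if s.token_first() is not None]
    if not statements:
        raise ValueError("no SQL statement found in model output")
    for stmt in statements:
        stmt_type = stmt.get_type()  # e.g. 'SELECT', 'DROP', 'UNKNOWN'
        if stmt_type not in ALLOWED_TYPES:
            raise ValueError(f"blocked non-SELECT statement: {stmt_type}")
    return sql

# Example: ensure_read_only("DROP TABLE orders") raises before anything reaches the database.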

Evaluation and Testing

The evaluation process was multifaceted:
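A widely used approach for evaluating text-to-SQL systems is execution matching: run the generated query and a hand-written reference query against a test database and compare the result sets. The sketch below illustrates that idea with an in-memory SQLite fixture; the execution_match helper, the fixture data, and the metric itself are assumptions offered for illustration, not Prosus's documented evaluation setup.

# Hypothetical execution-match check for generated SQL against a reference query.
import sqlite3
from collections import Counter

def execution_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True when both queries return the same multiset of rows."""
    generated_rows = conn.execute(generated_sql).fetchall()
    reference_rows = conn.execute(reference_sql).fetchall()
    return Counter(map(tuple, generated_rows)) == Counter(map(tuple, reference_rows))

if __name__ == "__main__":
    # Tiny in-memory fixture standing in for a test database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE orders(order_id INTEGER, country TEXT);"
        "INSERT INTO orders VALUES (1, 'BR'), (2, 'BR'), (3, 'NL');"
    )
    generated = "SELECT country, COUNT(*) FROM orders GROUP BY country"
    reference = "SELECT country, COUNT(order_id) FROM orders GROUP BY country"
    print(execution_match(conn, generated, reference))  # True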

Impact and Results

The implementation showed significant positive outcomes:

Lessons Learned and Best Practices

Several key insights emerged from the project:

Future Directions

The team is exploring several improvements:

This case study represents a successful implementation of LLMs in production, demonstrating how careful architectural decisions, close user collaboration, and pragmatic engineering choices can lead to significant business value while maintaining system reliability and safety.
