Volvo implemented a Retrieval Augmented Generation (RAG) system that allows non-technical users to query business intelligence data through a Slack interface using natural language. The system translates natural language questions into SQL queries for BigQuery, executes them, and returns results - effectively automating what was previously manual work done by data analysts. The system leverages DBT metadata and schema information to provide accurate responses while maintaining control over data access.
This case study examines how Volvo implemented a RAG-based natural language interface to their business intelligence systems, specifically focusing on their car-sharing business unit. The system represents an interesting example of pragmatic LLMOps implementation that adds real business value while working within existing enterprise constraints and tools.
The core problem being solved was the inefficient use of data team resources to answer relatively simple business intelligence questions. Data team members would regularly receive requests through Slack channels asking for basic metrics like "how many journeys did we have yesterday" or "how many users signed up." These requests, while valuable to the business, were interrupting more strategic work.
The technical implementation combines several key components:
* A Slack bot interface that users are already familiar with
* ChatGPT's API for natural language understanding and SQL generation
* BigQuery as the underlying data warehouse
* DBT for data transformation and metadata management
A key architectural decision was how to provide the language model with context about the database schema and business logic. The system leverages DBT's metadata layer, which includes not just the raw schema information but also:
* Enum values for categorical fields (like B2B vs B2C)
* Field descriptions and documentation
* Business logic embedded in DBT transformations
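The metadata-to-prompt step can be sketched as a small rendering function. The dictionary shape below only loosely mirrors what dbt emits in its `manifest.json` artifact; the structure and the function name `render_schema_context` are illustrative assumptions, not the team's actual code.

```python
# Sketch: turning dbt-style model metadata into prompt context for the LLM.
# The input shape is a simplified stand-in for dbt's manifest, not its real schema.

def render_schema_context(models):
    """Render table/column metadata into a plain-text block for the prompt."""
    lines = []
    for model in models:
        lines.append(f"Table {model['name']}: {model.get('description', '')}")
        for col in model.get("columns", []):
            enums = col.get("accepted_values")
            # Enum values (e.g. B2B vs B2C) help the model filter correctly
            enum_note = f" (one of: {', '.join(enums)})" if enums else ""
            lines.append(f"  - {col['name']}{enum_note}: {col.get('description', '')}")
    return "\n".join(lines)
```

Rendering descriptions and accepted values inline like this is what lets the model map a vague business term ("business customers") onto a concrete filter (`segment = 'B2B'`).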
Initially, the team experimented with semantic search to find relevant schema information based on the user's question. However, with the release of models supporting larger context windows (up to 128k tokens), they found they could simply include the entire schema context in the prompt. While this worked technically, they discovered there were practical limits well below the theoretical maximum - typically using no more than 40-50k tokens for reliable results.
The production system flow works as follows:
1. User asks a question in natural language through Slack
2. The system provides relevant schema and metadata context to ChatGPT
3. ChatGPT generates appropriate SQL for BigQuery
4. The query is executed against BigQuery
5. Results are returned to the user through Slack
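The five steps above can be sketched as a single function. The LLM call and the query execution are injected as plain callables here so the flow is testable without network access; the prompt wording, and names like `ask_llm` and `run_query`, are placeholders rather than the team's actual implementation.

```python
# Sketch of the question -> SQL -> results loop (steps 2-5 above).

SYSTEM_PROMPT = (
    "You are a BigQuery SQL assistant. Given the schema below, answer the "
    "user's question with a single SELECT statement only.\n\n{schema}"
)

def answer_question(question, schema, ask_llm, run_query):
    """ask_llm(system, user) -> SQL text; run_query(sql) -> result rows."""
    prompt = SYSTEM_PROMPT.format(schema=schema)
    # Strip stray markdown fences the model sometimes wraps around SQL
    sql = ask_llm(prompt, question).strip().strip("`")
    rows = run_query(sql)  # e.g. bigquery.Client().query(sql).result()
    return sql, rows
```

In production, `ask_llm` would wrap the OpenAI chat API, `run_query` the BigQuery client, and the returned rows would be formatted into a Slack message.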
Some key learnings from the implementation:
* The system benefits from DBT's structured approach to data transformation and documentation
* Larger context windows don't necessarily mean you should use all available tokens
* Simple integration with existing tools (Slack) increases adoption
* Having schema information and business logic documentation is crucial for accurate query generation
The team also discovered interesting limitations and areas for future improvement:
* The current implementation returns raw table results without visualization
* There may be opportunities to add more sophisticated error handling
* The system could potentially benefit from more interactive clarification of user intent
* Additional controls may be needed around query complexity and resource usage
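One candidate for the more sophisticated error handling mentioned above is a repair loop: when BigQuery rejects a query, feed the error message back to the model and ask for a corrected version. This is a suggested pattern under assumed interfaces (`run_query` raises on failure, `fix_sql` asks the LLM for a repaired query), not something the case study describes in detail.

```python
# Sketch: retry failed queries by asking the LLM to repair the SQL.

def run_with_retries(sql, run_query, fix_sql, max_attempts=3):
    """run_query(sql) -> rows (raises on error); fix_sql(sql, error) -> new SQL."""
    for attempt in range(max_attempts):
        try:
            return run_query(sql)
        except Exception as err:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sql = fix_sql(sql, str(err))  # ask the model for a corrected query
```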
From an LLMOps perspective, several best practices emerge:
* Start with existing workflows and tools where possible
* Leverage metadata and documentation you already have
* Be pragmatic about the gap between theoretical model capabilities and practical limitations
* Focus on specific high-value use cases rather than trying to solve everything at once
The implementation demonstrates how LLM capabilities can be practically integrated into enterprise workflows without requiring massive changes to existing systems and processes. It shows how careful attention to context management and proper use of existing metadata can create powerful natural language interfaces to traditional business intelligence systems.
The case also raises interesting questions about the evolution of business intelligence tools and the role of data teams as natural language interfaces become more capable. While the current implementation focuses on simple metric queries, there's potential to expand to more complex analysis and insights generation.
Security and governance considerations were also important, as the system needed to work within existing data access controls and permissions. The use of BigQuery as the execution engine helps ensure that existing security policies are maintained.
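On top of warehouse-level permissions, a minimal application-side guard can reject anything but a single read-only statement before execution. The sketch below is an assumed first line of defence, not a complete security layer and not code from the case study; in BigQuery itself, cost controls can also be enforced via `QueryJobConfig(maximum_bytes_billed=...)`.

```python
# Sketch: reject generated SQL unless it is a single SELECT statement.
import re

# Keywords that indicate writes or schema changes
FORBIDDEN = re.compile(r"\b(insert|update|delete|merge|drop|create|alter|grant)\b", re.I)

def is_safe_select(sql):
    """True only for a single statement that starts with SELECT and contains no DML/DDL."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # block stacked statements like "SELECT 1; DELETE ..."
    stmt = statements[0].strip()
    return stmt.lower().startswith("select") and not FORBIDDEN.search(stmt)
```

Because the executing service account's own BigQuery permissions still apply, this check reduces noise and cost rather than carrying the security burden alone.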
The project started as a "hobby project" but evolved into a production system, highlighting how experimental LLMOps implementations can mature into valuable business tools when they address clear pain points and integrate well with existing workflows.