The case study demonstrates how to build production-ready conversational analytics applications by integrating LangGraph's multi-agent framework with Waii's advanced text-to-SQL capabilities. The solution tackles complex database operations through sophisticated join handling, knowledge graph construction, and agentic flows, enabling natural language interactions with complex data structures while maintaining high accuracy and scalability.
This case study presents Waii’s approach to building conversational analytics applications by combining their text-to-SQL capabilities with LangGraph, a framework for building stateful multi-agent applications. The article, published in October 2024, serves both as a technical deep-dive into handling complex SQL joins and as a demonstration of how to integrate Waii’s API into production LLM applications. It’s worth noting that this content originates from Waii’s own blog, so the claims about accuracy and capabilities should be viewed with appropriate skepticism as they represent vendor marketing material rather than independent validation.
The core problem addressed is the difficulty of translating natural language queries into accurate SQL, particularly when dealing with complex database schemas that require sophisticated join operations. Many text-to-SQL solutions struggle with multi-table joins, and this case study claims to demonstrate Waii’s differentiated approach to solving this challenge.
The solution architecture combines two main components: Waii’s text-to-SQL engine and LangGraph’s orchestration framework. The LangGraph application is structured as a state machine with multiple nodes handling different aspects of the conversational analytics workflow.
The workflow consists of several key nodes:
Question Classifier: Uses an LLM (OpenAI’s ChatGPT) to determine whether a user’s question relates to database information, visualization needs, or general queries. This node retrieves database catalog information from Waii and uses it as context for classification.
SQL Generator: Calls Waii’s Query.generate API to translate natural language into SQL. This is where the complex join handling occurs.
SQL Executor: Executes the generated SQL against the database through Waii’s Query.run API, which also injects security constraints before execution.
Result Classifier: Another LLM-based classifier that determines whether results should be presented as data or visualizations.
Chart Generator: Uses Waii’s Chart.generate_chart API to create visualizations from query results.
Insight Generator: A fallback path using OpenAI directly for general questions not requiring database access.
Result Synthesizer: Combines all outputs into a coherent response for the user.
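The node graph above can be approximated as a simple state machine. The sketch below is illustrative only, using stdlib Python instead of LangGraph and Waii's APIs: the function names, routing keywords, and state keys are assumptions, and each node is a stub for what would really be an LLM or Waii API call.

```python
# Illustrative sketch of the workflow's routing logic (not the actual
# LangGraph/Waii code): nodes are functions over a shared state dict,
# and the classifier decides which branch runs next.

def question_classifier(state):
    # Stand-in for the LLM classifier that uses the database catalog.
    q = state["query"].lower()
    if any(w in q for w in ("revenue", "table", "sales", "count")):
        state["path"] = "database"
    else:
        state["path"] = "general"
    return state

def sql_generator(state):
    # Stand-in for Waii's Query.generate API call.
    state["sql"] = f"-- SQL generated for: {state['query']}"
    return state

def insight_generator(state):
    # Fallback path: direct LLM call for non-database questions.
    state["insight"] = f"General answer for: {state['query']}"
    return state

def result_synthesizer(state):
    # Combine whichever branch ran into a final response.
    state["response"] = state.get("sql") or state.get("insight")
    return state

def run_workflow(query):
    state = {"query": query}
    state = question_classifier(state)
    if state["path"] == "database":
        state = sql_generator(state)
    else:
        state = insight_generator(state)
    return result_synthesizer(state)
```

In the real application, LangGraph's `StateGraph` with conditional edges plays the role of the hand-rolled if/else here, and each node invokes an LLM or a Waii endpoint.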
The state management is handled through a Pydantic BaseModel that tracks the database description, query, generated SQL, data results, chart specifications, insights, responses, errors, and path decisions across the workflow.
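As a rough sketch, the tracked state might look like the following. The field names are assumptions based on the description above, and a stdlib dataclass stands in for the article's Pydantic BaseModel so the example is self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class WorkflowState:
    # Rough approximation of the Pydantic state model described in the
    # article; field names are illustrative, not Waii's actual schema.
    database_description: str = ""
    query: str = ""                                 # user's natural-language question
    sql: str = ""                                   # generated SQL
    data: list[Any] = field(default_factory=list)   # query results
    chart: Optional[dict] = None                    # chart specification
    insight: str = ""                               # LLM-generated insight
    response: str = ""                              # synthesized final answer
    error: Optional[str] = None
    path_decision: str = ""                         # routing decision between nodes
```

A Pydantic `BaseModel` would add validation and serialization on top of this shape, which is why the article's implementation uses it for state passed between graph nodes.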
The article’s primary technical contribution is explaining Waii’s approach to handling complex database joins. The example query provided is notably sophisticated, spanning 14 tables with various join types and semantics. The example creates a director performance dashboard combining data from movies, TV series, genres, keywords, awards, and actor collaborations.
The key join capabilities are demonstrated in the generated SQL itself, which uses Common Table Expressions (CTEs) extensively: named subqueries such as director_movie_count, director_tv_count, combined_counts, and ranked_directors, along with various aggregation CTEs. This shows the system generating production-quality SQL rather than simple single-table queries.
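The full 14-table query is not reproduced in the article, but its CTE structure can be sketched. Only the CTE names below come from the source; the table names, columns, and query bodies are hypothetical placeholders.

```python
# Abbreviated sketch of the CTE-based structure described in the article.
# Only the CTE names (director_movie_count, director_tv_count,
# combined_counts, ranked_directors) come from the source; the bodies
# and schema are invented for illustration.
CTE_SKELETON = """
WITH director_movie_count AS (
    SELECT director_id, COUNT(*) AS movie_count
    FROM movies GROUP BY director_id
),
director_tv_count AS (
    SELECT director_id, COUNT(*) AS tv_count
    FROM tv_series GROUP BY director_id
),
combined_counts AS (
    SELECT m.director_id, m.movie_count, COALESCE(t.tv_count, 0) AS tv_count
    FROM director_movie_count m
    LEFT JOIN director_tv_count t ON m.director_id = t.director_id
),
ranked_directors AS (
    SELECT director_id,
           RANK() OVER (ORDER BY movie_count + tv_count DESC) AS rnk
    FROM combined_counts
)
SELECT * FROM ranked_directors WHERE rnk <= 10
"""
```

The pattern is the point: each CTE isolates one aggregation or ranking step, and later CTEs join earlier ones, which is how a single query can span many tables while remaining readable.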
A key differentiator claimed by Waii is its automatic construction of a knowledge graph representing database relationships, built from multiple data sources.
The knowledge graph is described as continuously updated and refined with schema changes, new queries, and user feedback. This represents a form of continuous learning that could improve text-to-SQL accuracy over time, though no quantitative evidence is provided in the article.
Waii employs a sequence of specialized “agentic flows” for query construction:
Table Selection: Analyzing the user’s request to identify relevant tables, using semantic understanding and knowledge of common join relationships to find tables that might not be directly mentioned in the user’s input.
Join Graph Analysis: Proposing and evaluating potential join paths between selected tables, scoring alignment with previously seen joins and semantic understanding of relationships.
Join Condition Evaluation/Refinement: A separate check ensuring outer joins and join conditions are correctly applied, including proper handling of “on” vs “where” clause conditions.
Query Construction: Building the SQL query based on the chosen join graph and conditions.
Compilation and Optimization: Ensuring syntactic correctness, performance optimization, and enforcement of operational constraints (e.g., max output rows, max input partitions).
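Waii's internals are proprietary and not published, but the scoring idea behind the join graph analysis step could be sketched along these lines. This is purely illustrative: the heuristic, data structures, and function names are assumptions, not Waii's implementation.

```python
# Illustrative guess at the idea behind "join graph analysis": score
# candidate join paths by how well they align with joins observed in
# previously seen queries. Not Waii's actual algorithm.

def score_join_path(path, seen_joins):
    """Score a candidate path (a list of (table_a, table_b) edges) by the
    fraction of its edges that appear in previously observed joins."""
    if not path:
        return 0.0
    hits = sum(1 for edge in path if frozenset(edge) in seen_joins)
    return hits / len(path)

def best_join_path(candidates, seen_joins):
    """Pick the candidate path that best matches historical join usage."""
    return max(candidates, key=lambda p: score_join_path(p, seen_joins))
```

In a real system this score would be one signal among several, combined with semantic understanding of column relationships, before the chosen join graph is handed to the query construction step.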
The implementation includes several production-relevant features. Security constraints are enforced at the execution layer, with constraints injected into queries to limit row and column access based on user-level policies. The workflow includes error handling via a loop that can rewrite the input and regenerate the required objects when exceptions occur. State is maintained across interactions to allow follow-up questions and iterative analysis.
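In spirit, the policy enforcement described above amounts to wrapping the generated SQL before it runs. The sketch below is a hedged illustration only; Waii's `Query.run` applies its constraints server-side, and this function name and mechanism are assumptions.

```python
def apply_policy(sql, allowed_columns, max_rows):
    """Wrap generated SQL so only policy-approved columns and a bounded
    number of rows can be returned. Illustrative only: real systems
    enforce this at the engine or API layer, not by string rewriting."""
    cols = ", ".join(allowed_columns)
    inner = sql.rstrip().rstrip(";")
    return f"SELECT {cols} FROM ({inner}) AS constrained LIMIT {max_rows}"
```

For example, `apply_policy("SELECT * FROM directors;", ["director_id", "name"], 100)` yields a query that exposes only the two permitted columns and at most 100 rows, regardless of what the generated SQL selected.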
The code sample provided uses environment variables for configuration (WAII_URL, WAII_API_KEY, DB_CONNECTION), following standard practices for secrets management. The application maintains a continuous loop for interactive use, with exception handling that restarts the workflow on errors.
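The configuration and interactive loop follow a common pattern, sketched below. The environment variable names come from the article; the handler and session function are stand-ins for the full workflow, and the real application reads questions from stdin rather than an iterable.

```python
import os

# Configuration via environment variables, as in the article's code sample.
WAII_URL = os.environ.get("WAII_URL", "")
WAII_API_KEY = os.environ.get("WAII_API_KEY", "")
DB_CONNECTION = os.environ.get("DB_CONNECTION", "")

def run_session(questions, handler):
    """Process questions one by one, continuing after errors as the
    article's loop does. `questions` is any iterable of strings;
    `handler` stands in for the full LangGraph workflow."""
    results = []
    for q in questions:
        try:
            results.append(("ok", handler(q)))
        except Exception as exc:
            # Mirror the described behavior: record the error and
            # restart the loop instead of crashing.
            results.append(("error", str(exc)))
    return results
```

Catching exceptions at the loop boundary keeps a single failed query from ending the analyst's session, which matters for an interactive tool where follow-up questions build on prior state.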
While the technical architecture presented is sound and the example query is genuinely impressive in its complexity, several caveats apply. The article provides no quantitative benchmarks comparing Waii’s accuracy to other text-to-SQL solutions. Claims about “high-accuracy joins” and “scalable table selection for large databases” are not substantiated with metrics. The knowledge graph construction process, while interesting conceptually, lacks detail on how predictions are made or how feedback is incorporated.
The example focuses on a single successful case rather than discussing failure modes, edge cases, or the types of queries where the system might struggle. The claim that users without specialized technical skills can perform complex data analysis should be viewed cautiously, as even correct SQL results require domain knowledge to interpret meaningfully.
The integration with LangGraph is relatively straightforward, primarily using Waii as an API service rather than demonstrating deep integration with LangGraph’s more advanced features like checkpointing, human-in-the-loop workflows, or parallel execution.
The article identifies several industry applications including business intelligence for executives, healthcare research database exploration, financial market analysis, e-commerce customer behavior analysis, and educational administration insights. These represent reasonable applications of conversational analytics, though actual implementations would require significant additional work beyond what’s shown in the code sample.
This case study demonstrates a practical approach to building production conversational analytics applications using LLMs and specialized text-to-SQL services. The combination of LangGraph for workflow orchestration and Waii for SQL generation represents a sensible architectural pattern. However, teams considering similar implementations should conduct their own evaluations of text-to-SQL accuracy, particularly for their specific database schemas and query patterns, rather than relying solely on vendor claims.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Raindrop's CTO Ben presents a comprehensive framework for building reliable AI agents in production, addressing the challenge that traditional offline evaluations cannot capture the full complexity of real-world user behavior. The core problem is that AI agents fail in subtle ways without concrete errors, making issues difficult to detect and fix. Raindrop's solution centers on a "discover, track, and fix" loop that combines explicit signals like thumbs up/down with implicit signals detected semantically in conversations, such as user frustration, task failures, and agent forgetfulness. By clustering these signals with user intents and tracking them over time, teams can identify the most impactful issues and systematically improve their agents. The approach emphasizes experimentation and production monitoring over purely offline testing, drawing parallels to how traditional software engineering shifted from extensive QA to tools like Sentry for error monitoring.
Loka, an AWS partner specializing in generative AI solutions, and Domo, a business intelligence platform, demonstrate production implementations of agentic AI systems across multiple industries. Loka showcases their drug discovery assistant (ADA) that integrates multiple AI models and databases to accelerate pharmaceutical research workflows, while Domo presents agentic solutions for call center optimization and financial analysis. Both companies emphasize the importance of systematic approaches to AI implementation, moving beyond simple chatbots to multi-agent systems that can take autonomous actions while maintaining human oversight through human-in-the-loop architectures.