NLQ, a UK-based natural language query company, developed an AI-powered interface for Aachen Uniklinik to make intensive care unit databases more accessible to healthcare professionals. The system uses a hybrid approach combining vector databases, large language models, and traditional SQL to allow non-technical medical staff to query complex patient data using natural language. The solution includes features for handling dirty data, intent detection, and downstream complication analysis, ultimately improving clinical decision-making processes.
This case study, presented at the NLP Summit by Dennis, COO of NLQ (a company based in England), describes the development of an AI-powered natural language interface for healthcare databases. The company worked with Aachen University Clinic (Aachen Uniklinik) in Germany to build an intelligent interface that allows non-technical healthcare professionals—such as physicians, clinicians, and department heads—to query Intensive Care Unit (ICU) databases using natural language instead of SQL.
The project was motivated by German healthcare policy requiring standardization of databases across regions to enable access to patient information from different locations. Additionally, the company completed a research and development project with Innovate UK to create an AI tool that gives physicians intuitive interfaces to otherwise poorly accessible clinical data, accelerating clinical decision-making.
The core challenge addressed is the data analytics bottleneck between technical and non-technical employees in healthcare settings, around which the presenter outlined several key pain points.
The solution employs what the presenter describes as a “hybrid” approach combining multiple AI/ML techniques. This is notable from an LLMOps perspective as it demonstrates a production system that integrates several different components rather than relying on a single LLM.
The system stores metadata from healthcare databases in a vector database. This metadata includes diagnosis names, prescription names, and other text-format information from the healthcare database. When processing a user’s natural language question, the system compares the input against stored embeddings using cosine similarity. This approach helps handle small typos and semantically similar terms, improving the robustness of natural language understanding.
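A minimal sketch of this similarity lookup, assuming a toy character-trigram embedding in place of the real embedding model (the metadata terms and the 0.5 threshold are invented for illustration):

```python
# Illustrative sketch (not the vendor's implementation): match a user's
# term against stored metadata embeddings by cosine similarity, so that
# a small typo like "pneumona" still resolves to "pneumonia".
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical character-trigram hashing embedding; a production
    # system would use a learned text-embedding model instead.
    vec = np.zeros(512)
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % 512] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Metadata harvested from the healthcare database (invented examples).
METADATA_TERMS = ["pneumonia", "sepsis", "amoxicillin", "dialysis"]
INDEX = {term: embed(term) for term in METADATA_TERMS}

def resolve_term(user_input: str, threshold: float = 0.5):
    # Return the closest stored term, or None if nothing is similar
    # enough (e.g. an out-of-domain phrase).
    q = embed(user_input)
    best = max(INDEX, key=lambda t: cosine(q, INDEX[t]))
    return best if cosine(q, INDEX[best]) >= threshold else None
```

The threshold is what lets the same mechanism reject inputs that match nothing in the database's vocabulary.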
For intent detection, the team fine-tuned an open-source large language model. This is a critical component because healthcare use cases involve many different types of queries (which they call “intents” in their logic). The fine-tuned model helps detect the proper intent from user questions, even when users phrase the same question in multiple different ways. The presenter explicitly noted that their system will not respond to out-of-scope questions (using the example of “who is Justin Bieber” being rejected unless Justin Bieber happens to be a patient).
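The routing logic around that model can be sketched as follows; the fine-tuned LLM is stubbed out with a keyword scorer (intent names, keywords, and the 0.3 cutoff are all assumptions), so only the control flow — score every intent, pick the best, refuse out-of-scope questions — mirrors the description:

```python
# Sketch of intent detection with out-of-scope rejection. A keyword
# scorer stands in for the fine-tuned open-source LLM.
INTENT_KEYWORDS = {
    "patient_lookup": {"patient", "admitted", "record"},
    "complication_analysis": {"complication", "infection", "after"},
    "prescription_stats": {"prescribed", "prescription", "dosage"},
}

def score_intent(question: str, keywords: set) -> float:
    # Fraction of the intent's keywords present in the question.
    tokens = set(question.lower().split())
    return len(tokens & keywords) / len(keywords)

def detect_intent(question: str, min_score: float = 0.3):
    scores = {name: score_intent(question, kw)
              for name, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None  # out of scope: the system refuses to answer
    return best
```

Asked "who is Justin Bieber", every intent scores below the cutoff and the function returns None rather than guessing.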
The system also uses LLMs for slot filling—extracting specific parameters from natural language queries and transforming them into formats suitable for SQL interfaces. For example, when a user mentions a time period in natural language, the slot filling model detects the period intent and converts it to the proper date format required by the database engine.
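As a rough sketch of that normalization step, assuming ISO date strings are what the SQL layer expects (in the described system an LLM does the extraction; a regex stands in here):

```python
# Hypothetical slot-filling step: pull a relative time period out of the
# question and normalize it to the date range the database engine needs.
import re
from datetime import date, timedelta

UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
WORD_NUMS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def fill_period_slot(question: str, today: date):
    m = re.search(r"last (\w+) (day|week|month|year)s?", question.lower())
    if not m:
        return None  # no period mentioned; another slot filler may apply
    count = WORD_NUMS.get(m.group(1)) or int(m.group(1))
    start = today - timedelta(days=count * UNIT_DAYS[m.group(2)])
    return {"start_date": start.isoformat(), "end_date": today.isoformat()}
```

So "the last three months" becomes a concrete `start_date`/`end_date` pair that can be bound directly into a SQL query.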
The production backend operates with both the vector database and a structured relational database simultaneously. The model works with both to transform natural language questions into structured query language (SQL) code. This hybrid approach is designed to leverage the strengths of each database type at different stages of the query-processing pipeline.
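One plausible shape for the final assembly step — a detected intent selects a SQL template and the filled slots bind its parameters — is sketched below; the template, table, and column names are invented for illustration:

```python
# Sketch: intent -> SQL template, slots -> bound parameters.
SQL_TEMPLATES = {
    "prescription_stats": (
        "SELECT drug_name, COUNT(*) AS n "
        "FROM prescriptions "
        "WHERE issued_at BETWEEN :start_date AND :end_date "
        "GROUP BY drug_name ORDER BY n DESC"
    ),
}

def build_query(intent: str, slots: dict):
    # Keep only the slots the template actually references; using bound
    # parameters (rather than string interpolation) keeps the generated
    # SQL safe to execute.
    template = SQL_TEMPLATES[intent]
    params = {k: v for k, v in slots.items() if f":{k}" in template}
    return template, params

sql, params = build_query(
    "prescription_stats",
    {"start_date": "2024-03-03", "end_date": "2024-06-01", "ward": "ICU"},
)
```

Templated generation with bound parameters is one common way production text-to-SQL systems constrain what the model can emit.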
One of the features emphasized as “most important” is the system’s ability to ask additional questions when it detects ambiguities in user queries. Given the complexity of medical terminology—where the same term can have different meanings in different departments—this clarification mechanism is essential for generating accurate SQL queries. The system will iteratively ask questions until it has enough information to produce a correct query.
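The loop itself is simple to sketch: for each slot the detected intent requires, keep asking until the user supplies it. The slot names below are assumptions, and `ask_user` is a stand-in for the chat front end:

```python
# Sketch of the clarification loop: ask follow-up questions until every
# required parameter for the detected intent is filled.
REQUIRED_SLOTS = {"complication_analysis": ["surgery_type", "time_window"]}

def clarify(intent: str, slots: dict, ask_user):
    for slot in REQUIRED_SLOTS.get(intent, []):
        while slot not in slots:
            question = f"Which {slot.replace('_', ' ')} do you mean?"
            slots[slot] = ask_user(question)
    return slots

# Example: one slot already extracted, one obtained by asking.
filled = clarify(
    "complication_analysis",
    {"surgery_type": "appendectomy"},
    lambda q: "last 6 months",
)
```

Only once `clarify` returns a complete slot set would the system proceed to SQL generation, which is what keeps ambiguous medical terminology from silently producing a wrong query.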
The presenter showcased several real-world analytics capabilities.
The downstream complication feature enables physicians to leverage hospital patient history to predict and potentially mitigate complications for current patients. For example, if historical data shows certain bacterial infections are common complications after specific surgeries and hospital stays of certain durations, physicians can take preventive measures.
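The kind of query behind this feature might look like the following sketch, which counts which complications historically followed a given surgery; the schema, table names, and sample rows are invented:

```python
# Illustrative downstream-complication query over (fabricated) hospital
# history: which infections most often followed a given surgery?
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stays (patient_id INTEGER, surgery TEXT, complication TEXT);
    INSERT INTO stays VALUES
        (1, 'appendectomy', 'E. coli infection'),
        (2, 'appendectomy', 'E. coli infection'),
        (3, 'appendectomy', NULL),
        (4, 'bypass', 'pneumonia');
""")

rows = conn.execute("""
    SELECT complication, COUNT(*) AS cases
    FROM stays
    WHERE surgery = ? AND complication IS NOT NULL
    GROUP BY complication
    ORDER BY cases DESC
""", ("appendectomy",)).fetchall()
```

A ranked list like this is what would let a physician decide, for example, to screen post-appendectomy patients for the most frequent historical infection.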
The system also handles varying result sizes intelligently.
The presenter noted that they build “enterprise-ready software” with a focus on security and connectivity to different data sources. The core offering is an API that transforms natural language questions to SQL code, which can be integrated into existing hospital IT ecosystems.
Recognizing that most hospitals require on-premise solutions for data security reasons, the system can be deployed as a microservice with the AI model on the backend within the hospital’s own infrastructure. This is an important LLMOps consideration—the architecture needs to accommodate enterprise security requirements and on-premise deployment rather than assuming cloud-based operation.
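A minimal sketch of what such a microservice surface could look like, using only the standard library so it can run inside an on-premise container with no outbound network access; the endpoint path, payload shape, and canned response are all assumptions:

```python
# Sketch of an on-premise NL-to-SQL microservice. The model pipeline is
# stubbed out with a canned response; only the API surface is illustrated.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def translate(question: str) -> dict:
    # Placeholder for the real pipeline (intent detection, slot filling,
    # SQL generation).
    return {"sql": "SELECT COUNT(*) FROM patients", "intent": "patient_count"}

class NLQHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/translate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(translate(payload["question"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve inside the hospital's infrastructure:
# HTTPServer(("0.0.0.0", 8080), NLQHandler).serve_forever()
```

Keeping the service to a single HTTP endpoint is what makes it straightforward to drop behind a hospital's existing gateway and authentication layer.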
While the presentation demonstrates an interesting hybrid approach to text-to-SQL in healthcare, several aspects warrant careful consideration.
The case study represents a practical application of multiple LLM and NLP techniques in a production healthcare setting, demonstrating how hybrid architectures combining vector search, fine-tuned LLMs, and traditional databases can address real-world data accessibility challenges in regulated industries.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
John Snow Labs developed a comprehensive healthcare LLM system that integrates multimodal medical data (structured, unstructured, FHIR, and images) into unified patient journeys. The system enables natural language querying across millions of patient records while maintaining data privacy and security. It uses specialized healthcare LLMs for information extraction, reasoning, and query understanding, deployed on-premises via Kubernetes. The solution significantly improves clinical decision support accuracy and enables broader access to patient data analytics while outperforming GPT-4 in medical tasks.
Prudential Financial, in partnership with AWS GenAI Innovation Center, built a scalable multi-agent platform to support 100,000+ financial advisors across insurance and financial services. The system addresses fragmented workflows where advisors previously had to navigate dozens of disconnected IT systems for client engagement, underwriting, product information, and servicing. The solution features an orchestration agent that routes requests to specialized sub-agents (quick quote, forms, product, illustration, book of business) while maintaining context and enforcing governance. The platform-based microservices architecture reduced time-to-value from 6-8 weeks to 3-4 weeks for new agent deployments, enabled cross-business reusability, and provided standardized frameworks for authentication, LLM gateway access, knowledge management, and observability while handling the complexity of scaling multi-agent systems in a regulated financial services environment.