Great Ormond Street Hospital NHS Trust developed a solution to extract information from 15,000 unstructured cardiac MRI reports spanning 10 years. They implemented a hybrid approach using small LLMs for entity extraction and few-shot learning for table structure classification. The system successfully extracted patient identifiers and clinical measurements from heterogeneous reports, enabling linkage with structured data and improving clinical research capabilities. The solution demonstrated significant improvements in extraction accuracy when using contextual prompting with models like FLAN-T5 and RoBERTa, while operating within NHS security constraints.
This case study from Great Ormond Street Hospital (GOSH) NHS Trust demonstrates a practical implementation of LLMs in a highly regulated healthcare environment, specifically focusing on extracting valuable information from unstructured cardiac MRI reports. The project represents a significant step forward in healthcare data utilization, balancing modern LLM techniques against the practical constraints of operating in a medical setting.
# Context and Challenge
Great Ormond Street Hospital, a pediatric hospital specializing in rare and complex conditions, faced a significant challenge with their historical medical data. They had accumulated approximately 15,000 cardiac MRI reports over a decade, containing valuable clinical information locked in unstructured formats. This information was crucial for:
* Clinical decision-making
* Retrospective analysis
* Secondary research purposes
* Precision medicine
* Drug discovery
The manual extraction of this information was time-consuming, required domain expertise, and could only be performed on small data subsets, making it impractical for large-scale analysis.
# Technical Infrastructure
The team implemented a specialized infrastructure, referred to as "grid," to handle the processing requirements while maintaining NHS security standards. It consisted of two servers:
* A development server with internet access but no patient data processing capabilities
* A staging server disconnected from the internet for processing patient data

This setup reduced processing time from days to hours compared with standard workstations.
# LLM Implementation Strategy
The team adopted a pragmatic approach to using LLMs, recognizing both the potential and limitations of deploying them in a healthcare setting:
## Entity Extraction Using Small LLMs
* Chose smaller LLM models due to CPU-only infrastructure constraints
* Implemented a question-answering approach for extracting patient identifiers and clinical information
* Evaluated multiple models including FLAN-T5, RoBERTa, and BERT variants
* Found that some models (like FLAN-T5 small) showed significant hallucination issues and were excluded
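The question-answering style of extraction described above can be sketched with a small extractive QA model from the Hugging Face `transformers` library. The model name, questions, and report text here are illustrative assumptions, not the project's actual configuration:

```python
# Sketch: extractive question answering over free-text report content,
# assuming a CPU-only environment with a small model (mirroring the
# constraints described above). Model name and report text are illustrative.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",  # a small extractive QA model
    device=-1,  # -1 = CPU only
)

# A made-up report snippet; real reports would be far more heterogeneous.
report = "Cardiac MRI performed 12/03/2015. Patient: Jane Doe, NHS number 943 476 5919."

for question in ["What is the patient's name?", "What is the NHS number?"]:
    result = qa(question=question, context=report)
    print(f"{question} -> {result['answer']} (score={result['score']:.2f})")
```

Because the model is purely extractive (it selects a span from the report rather than generating text), this style of QA is less prone to the hallucination issues noted for generative models such as FLAN-T5 small, though confidence scores still need thresholding in practice.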
## Prompt Engineering and Model Performance
The team conducted detailed experiments with prompt engineering:
* Added contextual prompts explaining the nature of the reports (cardiac MRI reports from GOSH)
* Observed significant performance improvements in entity extraction for FLAN-T5 and RoBERTa with context
* Observed that BERT's performance degraded when extracting long numeric identifiers (such as NHS numbers) once context was added
* Used integrated gradients attribution scores to analyze model behavior with and without prompts
## Hybrid Approach for Table Extraction
For extracting tabular data containing clinical measurements, they developed a sophisticated hybrid approach:
* Initially attempted rule-based approaches, which proved insufficient due to data heterogeneity
* Implemented a few-shot learning approach using SetFit methodology
* Used sentence transformers (specifically DistilRoBERTa) for generating embeddings
* Applied SVM classifier for final classification
* Achieved significant performance improvements over rule-based methods with just 280 training examples
# Technical Challenges and Solutions
Several key challenges were addressed during implementation:
## Data Heterogeneity
* Reports were created by different consultants over a decade
* Formats and structures varied significantly
* LLMs proved effective at handling this heterogeneity without requiring extensive rule engineering
## Infrastructure Limitations
* Operated within NHS security constraints
* Limited to CPU processing
* Required careful model selection to balance performance and resource constraints
## Model Selection and Evaluation
* Conducted thorough evaluations of different model sizes and architectures
* Balanced performance against resource constraints
* Documented and excluded models showing hallucination tendencies
# Results and Impact
The implementation showed several positive outcomes:
* Successfully extracted patient identifiers and clinical measurements from unstructured reports
* Enabled linkage with existing structured data sources
* Reduced processing time significantly compared to manual methods
* Demonstrated the viability of using smaller LLMs in production healthcare settings
* Proved the effectiveness of few-shot learning for specialized classification tasks
# Future Directions
The team identified several areas for future development:
* Exploring zero-shot capabilities of general-purpose LLMs
* Testing the generalizability of the approach to other types of medical reports
* Investigating the use of domain-specific LLMs
* Expanding the few-shot learning approach to other classification tasks
This case study demonstrates a practical, production-ready implementation of LLMs in healthcare, balancing the need for accurate information extraction with real-world constraints of operating in a regulated medical environment. The hybrid approach combining LLMs with traditional machine learning techniques shows how organizations can effectively leverage AI capabilities while working within infrastructure and security limitations.