ZenML

Automated HCC Code Extraction from Clinical Notes Using Healthcare NLP

WVU Medicine 2023

WVU Medicine implemented an automated system for extracting Hierarchical Condition Category (HCC) codes from clinical notes using John Snow Labs' Healthcare NLP models. The system processes radiology notes for upcoming patient appointments, extracts relevant diagnoses, converts them to ICD-10 codes, and then maps them to HCC codes. The solution went live in December 2023 and has processed over 27,000 HCC codes with an 18.4% acceptance rate by providers, positively impacting over 5,000 patients.

Industry

Healthcare

Overview

WVU Medicine is a comprehensive health system operating 25 hospitals across north-central West Virginia, southern Pennsylvania, eastern Ohio, and western Maryland. The organization is closely affiliated with the academic medical center at West Virginia University, which allows it to combine research capabilities with patient care. Its flagship facility, Ruby Memorial Hospital in Morgantown, West Virginia, was recognized by US News and World Report as the top hospital in West Virginia.

A key advantage for WVU Medicine’s AI initiatives is that all 25 hospitals operate on the same Electronic Medical Record (EMR) system and follow standardized procedures, making it easier to scale new implementations across the organization. The AI team at WVU Medicine focuses on developing predictive and classification models across various areas including population health, revenue cycle, and ambulatory services.

The Problem

The case study centers on Hierarchical Condition Category (HCC) coding, which is a system that categorizes patient chronic conditions to predict future healthcare needs and ensure appropriate care management. HCC coding serves multiple purposes: enhancing patient care through personalized treatment plans, improving population health management, supporting regulatory compliance and quality reporting, and increasing patient engagement.

The fundamental challenge is that providers must evaluate chronic conditions and document relevant HCC codes while simultaneously assessing and treating patients. These codes are derived from clinical notes, exam reports, and other unstructured patient documents—a task that is both extensive and detail-oriented. Given the scale and complexity of the organization, keeping up with comprehensive HCC coding is overwhelming, which often leads to some codes being missed from patient diagnosis charts despite providers’ best efforts.

The importance of accurate and comprehensive HCC coding cannot be overstated because these codes affect payment adjustments to health plans based on patient health status and demographics, particularly for Medicare Advantage plan enrollees. More critically, accurate coding enables early identification of high-risk patients and development of targeted, personalized care plans.

Technical Solution

WVU Medicine built an in-house ICD-10 (International Classification of Diseases, 10th Revision) code extraction engine using John Snow Labs’ Spark NLP Healthcare module. The workflow follows a clear pipeline: clinical notes are processed to extract diagnoses, which are then translated into ICD-10 codes, which are subsequently mapped to HCC codes.

NLP Model Architecture and Customization

The John Snow Labs Healthcare NLP model employs several sophisticated techniques for extracting meaningful context from clinical notes:

WVU Medicine customized the base model to better suit their specific needs. This customization involved refining the token extraction process by removing irrelevant details such as family history and suggestive comments. The goal was to accurately extract ICD-10 codes that are relevant to the specific patient at the specific time of care. Additionally, since their patient population is unique, they filter out codes that are not applicable to their patient demographics.
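The filtering idea described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each extracted mention carries an assertion label; the `Entity` structure and the label names (`family`, `hypothetical`, `possible`) are hypothetical and are not taken from WVU Medicine's implementation or from the library's documented output.

```python
from dataclasses import dataclass

# Hypothetical assertion labels treated as "not about this patient right now".
EXCLUDED_ASSERTIONS = {"family", "hypothetical", "possible"}


@dataclass(frozen=True)
class Entity:
    text: str       # diagnosis mention found in the note
    assertion: str  # assumed label attached by an assertion-status step


def keep_patient_relevant(entities):
    """Drop mentions that describe relatives or merely suggested conditions."""
    return [e for e in entities if e.assertion.lower() not in EXCLUDED_ASSERTIONS]


mentions = [
    Entity("type 2 diabetes mellitus", "present"),
    Entity("breast cancer", "family"),   # family history, not the patient
    Entity("pneumonia", "possible"),     # suggestive comment only
]
relevant = keep_patient_relevant(mentions)
print([e.text for e in relevant])
```

Keeping the excluded labels in one set makes it straightforward to tighten or relax the filter without touching the extraction step.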

Confidence Scoring

A notable feature of their implementation is the confidence scoring mechanism. The model outputs a confidence score for each extracted diagnosis:

Production Integration Workflow

The integration workflow demonstrates thoughtful production engineering:

The system queries the EMR database to gather all upcoming patient appointments scheduled over the next 14 days. This forward-looking approach ensures timely and relevant data processing. For these patients, all associated radiology notes are retrieved—these notes contain crucial information for identifying potential diagnoses.
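As a rough sketch of the look-ahead selection described above, the following Python snippet picks the patients with an appointment inside a 14-day window. It uses an in-memory list as a stand-in for the EMR query; the record layout, the function name, and the choice to treat both ends of the window as inclusive are assumptions made for illustration.

```python
from datetime import date, timedelta

LOOKAHEAD_DAYS = 14  # window stated in the case study


def upcoming_patient_ids(appointments, today):
    """Return sorted, de-duplicated patient IDs with an appointment in the window.

    Both ends of the window are treated as inclusive (an assumption).
    """
    cutoff = today + timedelta(days=LOOKAHEAD_DAYS)
    return sorted({a["patient_id"] for a in appointments if today <= a["date"] <= cutoff})


appointments = [
    {"patient_id": "P1", "date": date(2024, 1, 5)},
    {"patient_id": "P2", "date": date(2024, 1, 20)},  # outside the window
    {"patient_id": "P1", "date": date(2024, 1, 10)},  # same patient, counted once
]
patients = upcoming_patient_ids(appointments, today=date(2024, 1, 2))
print(patients)
```

In a real deployment this filter would more likely live in the database query itself, so that only the relevant appointments leave the EMR.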

The ICD-10 extraction engine processes these radiology notes, efficiently handling large volumes of unstructured data to identify relevant ICD-10 codes. The extracted codes are stored in a database, where data manipulation and filtering assign a “load value” to each code. The filtering logic removes codes that are already attached to the patient’s diagnosis chart and those with low confidence levels. A load value of 1 indicates the code should be loaded into the EMR system; a load value of 0 indicates it should not.
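The load-value rule above can be expressed as a small function. The sketch below is illustrative only: the confidence threshold of 0.8, the row layout, and the placeholder code values are assumptions, since the case study does not state the actual cutoff or schema.

```python
MIN_CONFIDENCE = 0.8  # hypothetical threshold; the case study does not state one


def assign_load_value(extracted, chart_codes, min_confidence=MIN_CONFIDENCE):
    """Return a copy of each extracted row with load_value set to 1 (load) or 0 (skip)."""
    rows = []
    for row in extracted:
        already_on_chart = row["code"] in chart_codes.get(row["patient_id"], set())
        confident = row["confidence"] >= min_confidence
        rows.append({**row, "load_value": int(confident and not already_on_chart)})
    return rows


# Placeholder code values, not real diagnosis codes.
extracted = [
    {"patient_id": "P1", "code": "CODE-1", "confidence": 0.95},
    {"patient_id": "P1", "code": "CODE-2", "confidence": 0.97},  # already on the chart
    {"patient_id": "P2", "code": "CODE-3", "confidence": 0.40},  # low confidence
]
chart_codes = {"P1": {"CODE-2"}}
scored = assign_load_value(extracted, chart_codes)
print([(r["code"], r["load_value"]) for r in scored])
```

Storing the flag on every row, rather than deleting the rejected rows, keeps a record of why a code was not loaded, which fits the audit-oriented design described in the case study.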

An automated process picks up these filtered ICD-10 codes and loads them into the EMR front-end system, ensuring relevant codes are available to providers during patient visits.

Real-Time Provider Integration

The production workflow incorporates a sophisticated feedback mechanism through best practice alerts. During patient visits, providers receive real-time alerts containing suggested HCC codes as they enter diagnoses. Providers have three options:

These responses are tracked for future validation and audit purposes, creating a feedback loop that helps refine the process and ensure accuracy over time. This human-in-the-loop approach is critical for healthcare applications where clinical judgment must remain paramount.

Data Pipeline and File Processing

The data architecture team developed a process that scans output tables, retrieving codes and patient information for entries with a load value of 1. This process also pulls supplemental data necessary to trigger the best practice alert. A flat file is generated and stored in a designated file location. The EMR system’s internal process then ingests these files, converts ICD-10 codes to HCC codes, and ensures the best practice alert is displayed during patient encounters.
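The export step described above might look roughly like the following. The pipe-delimited layout, the column names, and the function name are hypothetical; the actual file format expected by the EMR is not described in the case study, and the supplemental alert data is omitted here.

```python
import csv
import io

FIELDS = ["patient_id", "code", "confidence"]  # hypothetical column layout


def build_flat_file(rows):
    """Serialize rows with load_value == 1 as pipe-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        delimiter="|",
        extrasaction="ignore",  # drop keys such as load_value that are not exported
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        if row["load_value"] == 1:
            writer.writerow(row)
    return buffer.getvalue()


rows = [
    {"patient_id": "P1", "code": "CODE-1", "confidence": 0.95, "load_value": 1},
    {"patient_id": "P2", "code": "CODE-3", "confidence": 0.40, "load_value": 0},
]
flat = build_flat_file(rows)
print(flat)
```

Writing to an in-memory buffer keeps the sketch self-contained; a production job would write the same text to the designated file location instead.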

Document Tracking and Efficiency

A key factor in the model’s operational efficiency is the document tracking system. WVU Medicine maintains records of all scanned documents to avoid rescanning the same files when patients schedule subsequent appointments. This optimization streamlines the process and ensures fast runtime, with the full operation completing in under an hour.
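A minimal sketch of this kind of document tracking is shown below. The in-memory set stands in for whatever persistent tracking table is actually used, and the `doc_id` field and function name are assumptions. For simplicity the sketch records a document as scanned at selection time; a production system would more plausibly record it only after processing succeeds.

```python
def select_unscanned(documents, scanned_ids):
    """Return documents not yet processed and record them as scanned."""
    fresh = [d for d in documents if d["doc_id"] not in scanned_ids]
    scanned_ids.update(d["doc_id"] for d in fresh)
    return fresh


scanned_ids = {"D1"}  # stand-in for a persistent tracking table
first_run = select_unscanned([{"doc_id": "D1"}, {"doc_id": "D2"}], scanned_ids)
second_run = select_unscanned([{"doc_id": "D2"}, {"doc_id": "D3"}], scanned_ids)
print([d["doc_id"] for d in first_run], [d["doc_id"] for d in second_run])
```

Because each run only touches documents it has not seen before, the work per run scales with the number of new notes rather than with the full history of each patient.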

Results and Performance

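The model went live in December 2023, making it a relatively new implementation at the time of this presentation. The team noted they are still training providers to use it effectively. Key metrics reported include more than 27,000 HCC codes processed, an 18.4% acceptance rate by providers, and more than 5,000 patients positively impacted.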

The 18.4% acceptance rate deserves some context—while this may seem modest, in healthcare settings where false positives could have serious implications, a conservative approach that prioritizes precision is often preferable. The fact that providers can reject or mark codes as invalid provides important quality control.

Future Development Plans

WVU Medicine outlined several next steps for the project:

Model Upgrades: They are developing version 2 based on the latest John Snow Labs Healthcare NLP model, which in testing extracts more codes with higher precision. They acknowledged that model upgrades are a weak point of the current workflow and require an elaborate process. They are improving their infrastructure so that a model upgrade becomes a single-line code change, which represents a significant DevOps/MLOps improvement.

Expanded Document Coverage: Plans to extend beyond radiology documents to include progress notes, providing a more comprehensive view of patient information.

Additional Use Cases: Exploring extension of the base Healthcare NLP model to other patient care areas, including classification of mislabeled documents and identifying incidental findings.

Lessons Learned and Partnership Value

The presenter emphasized the importance of having strong operational business partners for complex healthcare AI projects. Success depends heavily on thorough clinical data validation and meticulous workflow evaluation, which requires operational expertise beyond pure AI capabilities.

As a smaller AI team, WVU Medicine found value in partnering with John Snow Labs’ Professional Services, which helped them implement their ideas using state-of-the-art AI tools within a few months. They strongly encouraged other small AI teams to consider leveraging such professional services to expedite implementation and realize value sooner.

Critical Assessment

While the presentation highlights significant achievements, several aspects warrant consideration:

Overall, this case study represents a practical example of deploying healthcare NLP in a production environment with thoughtful attention to clinical workflow integration, provider feedback loops, and operational efficiency considerations.
