Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Read original: arXiv:2405.19093 - Published 5/30/2024 by Xindi Wang, Robert E. Mercer, Frank Rudzicz

Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Overview

This research paper presents a multi-stage model for automatically recommending medical codes based on clinical notes.
The model first retrieves a set of relevant medical codes, then re-ranks them to identify the most appropriate codes for a given clinical note.
The authors evaluated their approach on real-world medical coding data and found it outperformed existing methods.

Plain English Explanation

When patients visit a healthcare provider, information about their condition, treatment, and other details are recorded in clinical notes. These notes need to be associated with standardized medical codes, known as International Classification of Diseases (ICD) codes, to facilitate billing, research, and other purposes.

However, manually assigning these codes is a time-consuming and error-prone process. The authors of this paper developed a multi-stage machine learning model to automate this task. Their approach first retrieves a set of potentially relevant medical codes based on the content of the clinical note. It then re-ranks these code recommendations to identify the most appropriate ones.

This two-stage process allows the model to quickly generate a list of relevant codes, then carefully analyze and refine the results to provide the most accurate recommendations. The authors found that their approach outperformed existing methods, demonstrating the value of this multi-stage retrieval and re-ranking technique for automatic medical coding.

Technical Explanation

The authors' multi-stage model consists of two key components:

Retrieval Stage: This stage uses a transformer-based language model to encode the clinical note and match it against a database of medical codes. The model retrieves the top-k most relevant codes based on their similarity to the note.
Re-ranking Stage: The retrieved codes are then fed into a second model that re-ranks them based on various features, such as the code's textual similarity to the note, its frequency in the training data, and its hierarchical relationships to other codes. This allows the model to identify the most appropriate codes for the given clinical note.

The authors evaluated their approach on a large dataset of real-world clinical notes and ICD codes. They found that their multi-stage model outperformed existing methods, such as single-stage classification and hybrid retrieval-classification approaches, in terms of both recommendation accuracy and computational efficiency.

Critical Analysis

One potential limitation of this research is that it focuses on a specific medical coding system (ICD) and may not be directly applicable to other medical coding taxonomies. Additionally, the performance of the model is dependent on the quality and diversity of the training data, which can be challenging to obtain in the medical domain due to privacy and regulatory concerns.

While the authors demonstrate the effectiveness of their multi-stage approach, it would be valuable to explore further optimizations, such as incorporating additional domain-specific knowledge or exploring more advanced neural architectures for the retrieval and re-ranking stages.

Conclusion

This research paper presents a novel multi-stage model for automatically recommending medical codes based on clinical notes. By separating the retrieval and re-ranking stages, the authors have developed a more accurate and efficient approach compared to existing methods. The insights from this work could have significant implications for improving medical coding workflows and supporting a wide range of healthcare applications, such as billing, research, and population health management.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Xindi Wang, Robert E. Mercer, Frank Rudzicz

The International Classification of Diseases (ICD) serves as a definitive medical classification system encompassing a wide range of diseases and conditions. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record, which facilitates standardized documentation and management of various health conditions. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection with a heavy long-tailed label distribution. In this paper, we leverage a multi-stage ``retrieve and re-rank'' framework as a novel solution to ICD indexing, via a hybrid discrete retrieval method, and re-rank retrieved candidates with contrastive learning that allows the model to make more accurate predictions from a simplified label space. The retrieval model is a hybrid of auxiliary knowledge of the electronic health records (EHR) and a discrete retrieval method (BM25), which efficiently collects high-quality candidates. In the last stage, we propose a label co-occurrence guided contrastive re-ranking model, which re-ranks the candidate labels by pulling together the clinical notes with positive ICD codes. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures on the MIMIC-III benchmark.

5/30/2024

Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification

Xindi Wang, Robert E. Mercer, Frank Rudzicz

The International Classification of Diseases (ICD) is an authoritative medical classification system of different diseases and conditions for clinical and management purposes. ICD indexing assigns a subset of ICD codes to a medical record. Since human coding is labour-intensive and error-prone, many studies employ machine learning to automate the coding process. ICD coding is a challenging task, as it needs to assign multiple codes to each medical document from an extremely large hierarchically organized collection. In this paper, we propose a novel approach for ICD indexing that adopts three ideas: (1) we use a multi-level deep dilated residual convolution encoder to aggregate the information from the clinical notes and learn document representations across different lengths of the texts; (2) we formalize the task of ICD classification with auxiliary knowledge of the medical records, which incorporates not only the clinical texts but also different clinical code terminologies and drug prescriptions for better inferring the ICD codes; and (3) we introduce a graph convolutional network to leverage the co-occurrence patterns among ICD codes, aiming to enhance the quality of label representations. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures.

5/30/2024

💬

Large language models are good medical coders, if provided with tools

Keith Kwan

This study presents a novel two-stage Retrieve-Rank system for automated ICD-10-CM medical coding, comparing its performance against a Vanilla Large Language Model (LLM) approach. Evaluating both systems on a dataset of 100 single-term medical conditions, the Retrieve-Rank system achieved 100% accuracy in predicting correct ICD-10-CM codes, significantly outperforming the Vanilla LLM (GPT-3.5-turbo), which achieved only 6% accuracy. Our analysis demonstrates the Retrieve-Rank system's superior precision in handling various medical terms across different specialties. While these results are promising, we acknowledge the limitations of using simplified inputs and the need for further testing on more complex, realistic medical cases. This research contributes to the ongoing effort to improve the efficiency and accuracy of medical coding, highlighting the importance of retrieval-based approaches.

7/19/2024

A Novel ICD Coding Method Based on Associated and Hierarchical Code Description Distillation

Bin Zhang, Junli Wang

ICD(International Classification of Diseases) coding involves assigning ICD codes to patients visit based on their medical notes. ICD coding is a challenging multilabel text classification problem due to noisy medical document inputs. Recent advancements in automated ICD coding have enhanced performance by integrating additional data and knowledge bases with the encoding of medical notes and codes. However, most of them ignore the code hierarchy, leading to improper code assignments. To address these problems, we propose a novel framework based on associated and hierarchical code description distillation (AHDD) for better code representation learning and avoidance of improper code assignment.we utilize the code description and the hierarchical structure inherent to the ICD codes. Therefore, in this paper, we leverage the code description and the hierarchical structure inherent to the ICD codes. The code description is also applied to aware the attention layer and output layer. Experimental results on the benchmark dataset show the superiority of the proposed framework over several state-of-the-art baselines.

9/4/2024