Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Read original: arXiv:2406.09317 - Published 7/2/2024 by Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen and 39 others

Overview

This paper presents RetiZero, a vision-language foundation model for identifying common and rare fundus diseases, drawing on knowledge of over 400 diseases.
The model is pre-trained on 341,896 fundus images paired with text descriptions, collected from public datasets, ophthalmic literature, and online resources.
The researchers evaluate it on zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification, and report strong results on both common and rare conditions.

Plain English Explanation

The paper describes a new artificial intelligence (AI) system that can help doctors identify eye diseases by analyzing images of the back of the eye, known as the fundus. This system uses a deep learning approach that combines visual information from the images with language-based knowledge about over 400 different eye diseases.

The key idea is to use a foundation model: a type of AI model that is first trained on a large, diverse dataset (here, fundus images paired with text descriptions) so that it learns general-purpose representations. The foundation model can then be fine-tuned or adapted to specific tasks, in this case the identification of eye diseases from fundus images.

By drawing on this broad knowledge base, the AI system can recognize both common and rare eye conditions. Rare conditions are especially hard for doctors to diagnose because they are seldom encountered. The researchers demonstrate that their model performs well on a variety of benchmark datasets, suggesting it could be a valuable tool for assisting medical professionals in the diagnosis of eye diseases.

Technical Explanation

The researchers developed RetiZero, a vision-language foundation model intended to improve the identification of both common and rare fundus diseases. Earlier foundation models for retinal images were pre-trained on a limited set of disease categories; RetiZero is instead pre-trained on 341,896 fundus images paired with text descriptions that together cover more than 400 diseases.

The model pairs an image encoder, which processes the fundus photographs, with a text encoder, which represents descriptions of the more than 400 diseases in its knowledge base. Matching image features against these text features is what allows the model to recognize diseases in a zero-shot setting, that is, without task-specific labeled training examples.
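One common way for a vision-language model to turn image and text features into a disease ranking is to compare them in a shared embedding space. The sketch below assumes that setup; it is an illustration, not the paper's confirmed method. The function names and the three-dimensional toy embeddings are hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_diseases(image_embedding, text_embeddings, k=3):
    """Return the k disease names whose text embeddings best match the image."""
    scored = [
        (name, cosine_similarity(image_embedding, embedding))
        for name, embedding in text_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical toy embeddings; a real system would obtain these from
# a text encoder (disease descriptions) and an image encoder (fundus photo).
text_embeddings = {
    "diabetic retinopathy": [0.9, 0.1, 0.0],
    "glaucoma": [0.1, 0.9, 0.1],
    "retinitis pigmentosa": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

top2 = rank_diseases(image_embedding, text_embeddings, k=2)
print(top2)
```

Because the candidate diseases enter only as text embeddings, adding a new disease in this setup means adding one more description rather than retraining a classifier head, which is the usual appeal of zero-shot recognition.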

The researchers evaluated the model on several downstream tasks: zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In the zero-shot setting, RetiZero reaches Top5 accuracy of 0.8430 on a set of 15 fundus diseases and 0.7561 on a set of 52; for image retrieval, the corresponding Top5 scores are 0.9500 and 0.8860. In a clinical evaluation, its Top3 zero-shot performance exceeded the average of 19 ophthalmologists from Singapore, China and the United States.
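The paper's abstract reports its results as Top5 and Top3 scores. Assuming the standard definition of top-k accuracy (a prediction counts as correct when the true disease appears among the model's k highest-ranked candidates), the metric can be sketched as follows. The function name and the example rankings are hypothetical and are not taken from the paper's evaluation code or data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=5):
    """Fraction of samples whose true label is among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    if not true_labels:
        raise ValueError("cannot score an empty evaluation set")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical rankings for four images, best guess first.
ranked_predictions = [
    ["glaucoma", "diabetic retinopathy", "drusen"],
    ["drusen", "glaucoma", "diabetic retinopathy"],
    ["diabetic retinopathy", "drusen", "glaucoma"],
    ["glaucoma", "drusen", "diabetic retinopathy"],
]
true_labels = ["glaucoma", "glaucoma", "glaucoma", "diabetic retinopathy"]

top1 = top_k_accuracy(ranked_predictions, true_labels, k=1)
top2 = top_k_accuracy(ranked_predictions, true_labels, k=2)
print(top1, top2)
```

Top-k accuracy can only stay the same or rise as k grows, so Top3 and Top5 figures are not directly comparable with each other or with Top1 results from other studies.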

Critical Analysis

The researchers have made a compelling case for the use of vision-language foundation models in the diagnosis of fundus diseases. By leveraging a large knowledge base of eye conditions, the model is able to recognize a broader range of diseases, including rare and challenging cases that can be difficult for human experts to identify.

However, the paper does not fully address the potential limitations of this approach. For instance, the model's performance may depend on the quality and completeness of the underlying knowledge base, which could be challenging to maintain and update as new diseases and research emerge. Additionally, the model's reliance on fundus images may limit its applicability in cases where other diagnostic information, such as patient history or laboratory tests, is required for accurate diagnosis.

Further research is needed to explore the generalizability of this approach across different medical domains and to address potential issues related to data quality, model interpretability, and clinical integration. Nonetheless, this work represents an important step forward in the development of AI-powered tools for assisting medical professionals in the diagnosis and management of eye diseases.

Conclusion

The paper presents a novel vision-language foundation model for the identification of both common and rare fundus diseases. By leveraging advancements in deep learning and multimodal learning, the model demonstrates strong performance on a variety of benchmark datasets, suggesting its potential as a valuable tool for assisting medical professionals in the diagnosis and management of eye conditions.

While further research is needed to address the model's limitations and improve its clinical integration, this work represents an important step forward in the development of AI-powered tools for healthcare. The ability to accurately identify a broad range of eye diseases, including rare and challenging cases, could have significant implications for improving patient outcomes and reducing the burden on healthcare systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Danqi Fang, Siying Liu, Qingyun Liu, Yuqiang Huang, Hongqiang Zeng, Yanda Meng, Yukun Zhou, Zehua Jiang, Minghui Qiu, Changqing Zhang, Xinjian Chen, Sophia Y Wang, Cecilia S Lee, Lucia Sobrin, Carol Y Cheung, Chi Pui Pang, Pearse A Keane, Ching-Yu Cheng, Haoyu Chen, Huazhu Fu

Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

7/2/2024

A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks

Boa Jang, Youngbin Ahn, Eun Kyung Choe, Chang Ki Yoon, Hyuk Jin Choi, Young-Gon Kim

Artificial intelligence applied to retinal images offers significant potential for recognizing signs and symptoms of retinal conditions and expediting the diagnosis of eye diseases and systemic disorders. However, developing generalized artificial intelligence models for medical data often requires a large number of labeled images representing various disease signs, and most models are typically task-specific, focusing on major retinal diseases. In this study, we developed a Fundus-Specific Pretrained Model (Image+Fundus), a supervised artificial intelligence model trained to detect abnormalities in fundus images. A total of 57,803 images were used to develop this pretrained model, which achieved superior performance across various downstream tasks, indicating that our proposed model outperforms other general methods. Our Image+Fundus model offers a generalized approach to improve model performance while reducing the number of labeled datasets required. Additionally, it provides more disease-specific insights into fundus images, with visualizations generated by our model. These disease-specific foundation models are invaluable in enhancing the performance and efficiency of deep learning models in the field of fundus imaging.

8/19/2024

Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses existing supervised approaches by 15.7%, and even exceeds performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.

4/24/2024

Adaptive Multiscale Retinal Diagnosis: A Hybrid Trio-Model Approach for Comprehensive Fundus Multi-Disease Detection Leveraging Transfer Learning and Siamese Networks

Yavuz Selim Inan

WHO has declared that more than 2.2 billion people worldwide are suffering from visual disorders, such as media haze, glaucoma, and drusen. At least 1 billion of these cases could have been either prevented or successfully treated, yet they remain unaddressed due to poverty, a lack of specialists, inaccurate ocular fundus diagnoses by ophthalmologists, or the presence of a rare disease. To address this, the research has developed the Hybrid Trio-Network Model Algorithm for accurately diagnosing 12 distinct common and rare eye diseases. This algorithm utilized the RFMiD dataset of 3,200 fundus images and the Binary Relevance Method to detect diseases separately, ensuring expandability and avoiding incorrect correlations. Each detector, incorporating finely tuned hyperparameters to optimize performance, consisted of three feature components: A classical transfer learning CNN model, a two-stage CNN model, and a Siamese Network. The diagnosis was made using features extracted through this Trio-Model with Ensembled Machine Learning algorithms. The proposed model achieved an average accuracy of 97% and an AUC score of 0.96. Compared to past benchmark studies, an increase of over 10% in the F1-score was observed for most diseases. Furthermore, using the Siamese Network, the model successfully made predictions in diseases like optic disc pallor, which past studies failed to predict due to low confidence. This diagnostic tool presents a stable, adaptive, cost-effective, efficient, accessible, and fast solution for globalizing early detection of both common and rare diseases.

5/30/2024