nnMobileNet: Rethinking CNN for Retinopathy Research

Read original: arXiv:2306.01289 - Published 4/17/2024 by Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

Overview

Over the past few decades, convolutional neural networks (CNNs) have been widely used for detecting and tracking various retinal diseases (RD).
However, the emergence of vision transformers (ViT) in the 2020s has led to a shift in the development of RD models.
ViT-based models have shown leading-edge performance in RD applications, largely due to their scalability, that is, their ability to improve as more parameters are added.
While ViT-based models often outperform traditional CNNs in RD, they come at the cost of increased data and computational demands.
The authors of this study revisited and optimized the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics.

Plain English Explanation

Retinal diseases (RD), such as diabetic retinopathy and diabetic macular edema, can have serious consequences if not detected and treated early. In recent years, machine learning models, particularly convolutional neural networks (CNNs), have been widely used to help detect and track these diseases. CNNs are a type of deep learning model that is particularly good at analyzing images.

However, a newer type of machine learning model called a vision transformer (ViT) has emerged in the last few years and has started to outperform CNNs in many RD applications. ViTs work differently than CNNs, processing images as "patches" rather than local regions. According to the authors, this can make it harder to precisely localize the small, variable lesions often seen in RD.

ViTs also require more data and computational power than CNNs to achieve their high performance. In this study, the researchers looked at ways to optimize a specific type of CNN called MobileNet to see if they could match or exceed the performance of ViT-based models, but with lower data and computational requirements.

Through a series of targeted modifications, the researchers were able to create an optimized version of MobileNet that outperformed ViT-based models on several RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. This suggests that with the right architectural tweaks, CNNs may still have a role to play in the future of RD diagnostics, offering a balance of performance and efficiency.

Technical Explanation

Beyond their data and computational demands, ViTs differ from CNNs in how they process images: they work with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In this study, the researchers revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics.
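To make the patch-versus-local-region point concrete, here is a minimal Python sketch. It is not taken from the paper or its repository. It assumes a square image split into non-overlapping square patches, which is how a standard ViT tokenizes its input, and counts how many patches the bounding box of a small lesion overlaps. The function names, the 224-pixel image, the 16-pixel patch, and the lesion coordinates are all illustrative assumptions.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def patches_touched(x0: int, y0: int, width: int, height: int, patch_size: int) -> int:
    """Count the patches overlapped by an axis-aligned lesion bounding box.

    (x0, y0) is the top-left pixel of the box; width and height are in pixels.
    """
    first_col = x0 // patch_size
    last_col = (x0 + width - 1) // patch_size
    first_row = y0 // patch_size
    last_row = (y0 + height - 1) // patch_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


if __name__ == "__main__":
    # Hypothetical numbers: a 224x224 image and 16x16 patches.
    print(patch_grid(224, 16))
    # The same 8x8 lesion box, placed at two different positions.
    print(patches_touched(16, 16, 8, 8, 16))  # aligned with the patch grid
    print(patches_touched(12, 12, 8, 8, 16))  # straddling a patch corner
```

In this toy setup the same 8x8 box falls inside a single patch or is split across four, depending only on where it sits relative to the grid. That is one intuition for why fixed patching may make small, variably placed lesions harder to localize, whereas a convolution slides its window over every position.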

The researchers found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code for this optimized MobileNet model is available on GitHub at https://github.com/Retinal-Research/NN-MOBILENET.
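As background on why MobileNet is a lightweight starting point: the MobileNet family is built around depthwise separable convolutions, which replace one dense convolution with a per-channel (depthwise) filter followed by a 1x1 (pointwise) mix. The sketch below only compares weight counts for the two layer types. It says nothing about the specific modifications the authors made, which this summary does not describe. The channel sizes are illustrative assumptions and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a dense kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
    dense = standard_conv_params(64, 128, 3)
    separable = depthwise_separable_params(64, 128, 3)
    print(dense, separable, round(dense / separable, 2))
```

For this hypothetical layer the separable version needs roughly an eighth of the weights. Savings of this kind are a large part of why MobileNet-style networks are attractive when data and compute are limited.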

Critical Analysis

The researchers in this study have provided an interesting perspective on the ongoing evolution of machine learning models for retinal disease (RD) diagnostics. While vision transformers (ViTs) have shown impressive performance in this domain, the authors have demonstrated that with the right architectural optimizations, convolutional neural networks (CNNs) like MobileNet can still hold their own.

One potential limitation of the study is the specific nature of the benchmarks used, which focused on tasks like diabetic retinopathy grading and diabetic macular edema classification. It would be valuable to see how the optimized MobileNet model performs on a broader range of RD tasks, including the detection and localization of rarer or more complex lesions.
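Diabetic retinopathy grading is an ordinal task, so a prediction that is off by one grade should count as less wrong than one that is off by four. A metric commonly used for this kind of task is quadratic weighted kappa. This summary does not say which metrics the paper reports, so the sketch below is an assumption about how such a benchmark might be scored, not the authors' evaluation code. The function name and the toy labels are hypothetical.

```python
def quadratic_weighted_kappa(y_true: list[int], y_pred: list[int], num_classes: int) -> float:
    """Quadratic weighted kappa for integer grades in [0, num_classes)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n  # agreement expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Only happens when every label and every prediction is the same single grade.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Toy labels on a five-grade scale (0 to 4); not real benchmark data.
    print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5))
```

A score of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.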

Additionally, the authors do not provide a detailed analysis of the trade-offs between the ViT-based models and their optimized MobileNet in terms of factors like inference time, memory usage, and power consumption. This information could be crucial for real-world deployment, where efficiency and deployability may be just as important as raw performance.
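A comparison of this kind could start from a simple timing harness like the one sketched below. It is a generic illustration using only the Python standard library, not a procedure from the paper. `measure_latency` and `dummy_model` are hypothetical names, and the dummy workload stands in for a real model's forward pass. Memory and power measurements would need separate, platform-specific tooling.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], warmup: int = 3, repeats: int = 20) -> dict:
    """Time repeated calls to fn and return summary statistics in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up calls are not timed; they let caches and lazy initialization settle.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "runs": repeats,
    }


def dummy_model() -> int:
    """Hypothetical stand-in for a model's forward pass."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_latency(dummy_model))
```

Reporting the median rather than the mean keeps a single slow run from dominating the result. To compare two models fairly, both would need the same input size, batch size, and hardware.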

Overall, this study offers a thought-provoking challenge to the narrative that ViTs are the inevitable future of RD diagnostics. By showing that CNNs can still be competitive with the right design choices, the authors encourage researchers and practitioners to continue exploring the full potential of both model architectures and to think critically about the needs and constraints of their specific applications.

Conclusion

This study suggests that, with careful optimization, convolutional neural networks (CNNs) like MobileNet can surpass state-of-the-art vision transformer (ViT) models on various retinal disease (RD) diagnostic tasks. While ViTs have gained significant attention for their leading-edge results, the authors have shown that CNNs can still play a vital role in RD applications, particularly when considering factors like data and computational efficiency.

The optimized MobileNet model developed in this research offers a promising alternative to ViT-based approaches, potentially allowing for more accessible and deployable RD detection and tracking solutions. As the field of machine learning for medical imaging continues to evolve, studies like this one encourage researchers to explore the full breadth of model architectures and to prioritize practical considerations alongside raw performance metrics.
