Deep Learning Models to Automate the Scoring of Hand Radiographs for Rheumatoid Arthritis

Read original: arXiv:2406.09980 - Published 6/17/2024 by Zhiyan Bo, Laura C. Coates, Bartlomiej W. Papiez

Overview

Researchers developed deep learning models to automate the scoring of hand radiographs for rheumatoid arthritis
The models were trained on a dataset of hand radiographs with expert-provided scores
Transfer learning was used to leverage pre-trained models and improve performance
The models were evaluated on their ability to accurately predict the radiograph scores

Plain English Explanation

Rheumatoid arthritis is a condition that can cause damage to the joints, especially in the hands and feet. Doctors often use X-ray images of the hands to assess the severity of this damage and track the progression of the disease. However, scoring these X-ray images manually can be time-consuming and subjective.

In this study, the researchers set out to develop deep learning models that could analyze hand X-ray images and provide scores that match those of expert human assessors. They trained these models using a dataset of hand X-ray images that had been previously scored by medical experts.

By using transfer learning, the researchers were able to fine-tune pre-existing deep learning models to work effectively on the task of scoring hand X-rays for rheumatoid arthritis. This allowed them to build robust models without having to start from scratch.

The researchers then tested their models on a separate set of hand X-ray images to see how well they could predict the expert-provided scores. Their results showed that the deep learning models were able to match the human experts quite well, suggesting that they could be a useful tool for automating this task in clinical practice.

Technical Explanation

The researchers used a dataset of hand radiographs that had been scored by medical experts according to the van der Heijde modification of the Sharp (SvdH) score, a widely used radiographic method for quantifying joint damage in rheumatoid arthritis. They split this dataset into training, validation, and test sets.
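The split into training, validation, and test sets can be sketched as follows. This is a minimal illustration, not the authors' procedure: the 70/15/15 proportions, the fixed seed, and the `patient_ids` list are all assumptions. Splitting by patient rather than by image is a common precaution so that radiographs from one person do not end up in both the training and test sets.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items reproducibly and cut them into train/val/test lists.

    The fractions and seed are illustrative defaults, not values from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical patient identifiers; each patient may own several radiographs.
patient_ids = [f"patient_{i:03d}" for i in range(100)]
train_ids, val_ids, test_ids = split_dataset(patient_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

The test list is held back until the end, while the validation list is the one used for choosing between architectures and training settings.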

They then experimented with several different deep learning architectures, including convolutional neural networks (CNNs) and transformer-based models. The models were trained to take in the hand radiograph images and output a predicted total SvdH score, without first localising the individual joints.

To apply transfer learning, the researchers started from models that had been pre-trained on large general-purpose image datasets, such as ImageNet. They fine-tuned these pre-trained models on the hand radiograph dataset, which allowed the models to learn the task-specific features needed for accurate rheumatoid arthritis scoring.
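One common fine-tuning scheme is to freeze the pre-trained layers and train only a new output layer. The sketch below illustrates that idea in PyTorch under loud assumptions: `TinyBackbone` is a hypothetical, randomly initialised stand-in for a real pre-trained network, the input images are random tensors, and the learning rate is arbitrary. It is not the architecture or scheme used in the paper.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained image feature extractor."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.out_features = 8

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)


def build_regressor(backbone: TinyBackbone, freeze_backbone: bool = True) -> nn.Sequential:
    """Attach a fresh one-output regression head to a backbone."""
    if freeze_backbone:
        # Frozen parameters keep their pre-trained values during fine-tuning.
        for param in backbone.parameters():
            param.requires_grad = False
    head = nn.Linear(backbone.out_features, 1)
    return nn.Sequential(backbone, head)


model = build_regressor(TinyBackbone(), freeze_backbone=True)

# Only parameters that still require gradients are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One illustrative training step on random grayscale "radiographs".
images = torch.randn(4, 1, 64, 64)
scores = torch.rand(4, 1) * 100
loss = nn.functional.mse_loss(model(images), scores)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Passing `freeze_backbone=False` instead updates every layer, which is the other end of the range of fine-tuning schemes.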

The researchers evaluated their models' performance on the held-out test set, measuring metrics such as mean absolute error, root mean squared error (RMSE), and correlation with the expert-provided SvdH scores. The best score-prediction model reached a Pearson's correlation coefficient of 0.925 and an RMSE of 18.02, which the authors describe as on par with an experienced human reader.
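The three evaluation metrics named above have simple definitions, sketched here in plain Python. The `expert` and `predicted` lists are made-up toy values for illustration only and have no connection to the study's data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between expert and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalises large misses."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_correlation(y_true, y_pred):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy scores, purely illustrative.
expert = [0, 10, 20, 30]
predicted = [2, 8, 22, 28]

mae = mean_absolute_error(expert, predicted)
rmse = root_mean_squared_error(expert, predicted)
pcc = pearson_correlation(expert, predicted)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  PCC={pcc:.3f}")
```

The error metrics are in score units, so they depend on the range of the scoring scale, whereas the correlation only measures how well predictions track the expert scores.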

Critical Analysis

The researchers acknowledge several limitations of their study. First, the dataset used was relatively small, with only around 1,000 hand radiographs. Expanding the dataset size could help improve the models' generalization capabilities.

Additionally, the dataset only included radiographs from a single institution, which may limit the models' performance on data from other healthcare settings. Further validation on more diverse datasets would be necessary to assess the models' robustness.

The researchers also note that while the models performed well on average, there were still some individual cases where the model predictions diverged significantly from the expert scores. Understanding the source of these errors and developing strategies to address them could further improve the models' reliability.

Finally, the researchers did not explore the potential clinical utility of their automated scoring system, such as its impact on workflow efficiency or clinical decision-making. Prospective studies in real-world clinical settings would be needed to fully assess the value of this technology.

Conclusion

This study demonstrates the potential of deep learning models to automate the scoring of hand radiographs for rheumatoid arthritis, potentially streamlining a time-consuming and subjective clinical task. The researchers' use of transfer learning allowed them to develop robust models without requiring an extremely large dataset.

While further research is needed to address the limitations and fully assess the clinical utility of this approach, the promising results suggest that automated radiograph scoring could become a valuable tool in the management of rheumatoid arthritis. As deep learning continues to advance, we may see increasing adoption of such AI-powered technologies in various medical imaging applications.

Related Papers

Deep Learning Models to Automate the Scoring of Hand Radiographs for Rheumatoid Arthritis

Zhiyan Bo, Laura C. Coates, Bartlomiej W. Papiez

The van der Heijde modification of the Sharp (SvdH) score is a widely used radiographic scoring method to quantify damage in Rheumatoid Arthritis (RA) in clinical trials. However, its complexity with a necessity to score each individual joint, and the expertise required limit its application in clinical practice, especially in disease progression measurement. In this work, we addressed this limitation by developing a bespoke, automated pipeline that is capable of predicting the SvdH score and RA severity from hand radiographs without the need to localise the joints first. Using hand radiographs from RA and suspected RA patients, we first investigated the performance of the state-of-the-art architectures in predicting the total SvdH score for hands and wrists and its corresponding severity class. Secondly, we leveraged publicly available data sets to perform transfer learning with different finetuning schemes and ensemble learning, which resulted in substantial improvement in model performance being on par with an experienced human reader. The best model for RA scoring achieved a Pearson's correlation coefficient (PCC) of 0.925 and root mean squared error (RMSE) of 18.02, while the best model for RA severity classification achieved an accuracy of 0.358 and PCC of 0.859. Our score prediction model attained almost comparable accuracy with experienced radiologists (PCC = 0.97, RMSE = 18.75). Finally, using Grad-CAM, we showed that our models could focus on the anatomical structures in hands and wrists which clinicians deemed as relevant to RA progression in the majority of cases.

6/17/2024

Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression

Haresh Rengaraj Rajamohan, Richard Kijowski, Kyunghyun Cho, Cem M. Deniz

Current methods for predicting osteoarthritis (OA) outcomes do not incorporate disease specific prior knowledge to improve the outcome prediction models. We developed a novel approach that effectively uses consecutive imaging studies to improve OA outcome predictions by incorporating an OA severity constraint. This constraint ensures that the risk of OA for a knee should either increase or remain the same over time. DL models were trained to predict TKR within multiple time periods (1 year, 2 years, and 4 years) using knee radiographs and MRI scans. Models with and without the risk constraint were evaluated using the area under the receiver operator curve (AUROC) and the area under the precision recall curve (AUPRC) analysis. The novel RiskFORM2 method, leveraging a dual model risk constraint architecture, demonstrated superior performance, yielding an AUROC of 0.87 and AUPRC of 0.47 for 1 year TKR prediction on the OAI radiograph test set, a marked improvement over the 0.79 AUROC and 0.34 AUPRC of the baseline approach. The performance advantage extended to longer followup periods, with RiskFORM2 maintaining a high AUROC of 0.86 and AUPRC of 0.75 in predicting TKR within 4 years. Additionally, when generalizing to the external MOST radiograph test set, RiskFORM2 generalized better with an AUROC of 0.77 and AUPRC of 0.25 for 1 year predictions, which was higher than the 0.71 AUROC and 0.19 AUPRC of the baseline approach. In the MRI test sets, similar patterns emerged, with RiskFORM2 outperforming the baseline approach consistently. However, RiskFORM1 exhibited the highest AUROC of 0.86 and AUPRC of 0.72 for 4 year predictions on the OAI set.

6/17/2024

Rethinking Knee Osteoarthritis Severity Grading: A Few Shot Self-Supervised Contrastive Learning Approach

Niamh Belton, Misgina Tsighe Hagos, Aonghus Lawlor, Kathleen M. Curran

Knee Osteoarthritis (OA) is a debilitating disease affecting over 250 million people worldwide. Currently, radiologists grade the severity of OA on an ordinal scale from zero to four using the Kellgren-Lawrence (KL) system. Recent studies have raised concern in relation to the subjectivity of the KL grading system, highlighting the requirement for an automated system, while also indicating that five ordinal classes may not be the most appropriate approach for assessing OA severity. This work presents preliminary results of an automated system with a continuous grading scale. This system, namely SS-FewSOME, uses self-supervised pre-training to learn robust representations of the features of healthy knee X-rays. It then assesses the OA severity by the X-rays' distance to the normal representation space. SS-FewSOME initially trains on only 'few' examples of healthy knee X-rays, thus reducing the barriers to clinical implementation by eliminating the need for large training sets and costly expert annotations that existing automated systems require. The work reports promising initial results, obtaining a positive Spearman Rank Correlation Coefficient of 0.43, having had access to only 30 ground truth labels at training time.

7/16/2024

An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data

Niamh Belton, Aonghus Lawlor, Kathleen M. Curran

The diagnostic accuracy and subjectivity of existing Knee Osteoarthritis (OA) ordinal grading systems has been a subject of on-going debate and concern. Existing automated solutions are trained to emulate these imperfect systems, whilst also being reliant on large annotated databases for fully-supervised training. This work proposes a three stage approach for automated continuous grading of knee OA that is built upon the principles of Anomaly Detection (AD); learning a robust representation of healthy knee X-rays and grading disease severity based on its distance to the centre of normality. In the first stage, SS-FewSOME is proposed, a self-supervised AD technique that learns the 'normal' representation, requiring only examples of healthy subjects and <3% of the labels that existing methods require. In the second stage, this model is used to pseudo label a subset of unlabelled data as 'normal' or 'anomalous', followed by denoising of pseudo labels with CLIP. The final stage involves retraining on labelled and pseudo labelled data using the proposed Dual Centre Representation Learning (DCRL) which learns the centres of two representation spaces; normal and anomalous. Disease severity is then graded based on the distance to the learned centres. The proposed methodology outperforms existing techniques by margins of up to 24% in terms of OA detection and the disease severity scores correlate with the Kellgren-Lawrence grading system at the same level as human expert performance. Code available at https://github.com/niamhbelton/SS-FewSOME_Disease_Severity_Knee_Osteoarthritis.

7/17/2024