Weakly-supervised Autism Severity Assessment in Long Videos

Read original: arXiv:2407.09159 - Published 7/15/2024 by Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thummler


Overview

This paper presents a weakly-supervised approach for assessing autism severity in long, untrimmed videos of children.
The researchers developed a model that detects autism-related behaviors in videos using only video-level labels, without requiring frame-level annotations for training.
The goal is to enable more scalable and less invasive autism assessment compared to traditional clinical methods.

Plain English Explanation

The researchers have created a new way to evaluate how severe a child's autism is by analyzing videos, without needing someone to mark exactly when each relevant behavior occurs. Traditionally, assessing autism severity involves detailed clinical observations and tests, which can be time-consuming and intrusive.

This new approach uses machine learning to automatically detect signs of autism in long, untrimmed videos, such as recordings of clinical evaluation sessions. The model learns to recognize behaviors associated with autism, like repetitive movements or difficulty with social interaction, from a single overall label per video rather than from moment-by-moment annotations.

By making the assessment process more automated and less reliant on intensive clinical observation, the researchers hope this technology can lead to more widespread and frequent monitoring of autism symptoms, potentially allowing for earlier intervention and support. This aligns with other recent efforts to leverage technology for earlier autism diagnosis and understanding, including work on [audio-visual understanding](https://aimodels.fyi/papers/arxiv/hear-me-see-me-understand-me-audio), [voice classification](https://aimodels.fyi/papers/arxiv/automatic-voice-classification-autistic-subjects), and [speech pattern analysis](https://aimodels.fyi/papers/arxiv/exploring-speech-pattern-disorders-autism-using-machine).

Technical Explanation

The paper introduces a weakly-supervised framework for assessing autism severity from long, untrimmed videos of children. The key idea is that the model learns to detect autism-related behaviors from video-level labels alone, without requiring frame-level or per-behavior annotations.
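The summary does not spell out how video-level labels drive training. One common way to realize this kind of weak supervision is multiple-instance learning (MIL): split a long video into segments, score each segment, pool the segment scores into one video-level score, and compare that score with the video-level label. The sketch below illustrates only that general idea under stated assumptions; the segment length, the top-k pooling rule, the binary atypical/typical label, and all function names (`segment_video`, `topk_mean`, `video_level_loss`) are hypothetical and are not taken from the authors' method.

```python
import math
from typing import List, Sequence


def segment_video(num_frames: int, segment_len: int) -> List[range]:
    """Split the frame indices of a long, untrimmed video into consecutive
    fixed-length segments (the last one may be shorter)."""
    return [range(start, min(start + segment_len, num_frames))
            for start in range(0, num_frames, segment_len)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def topk_mean(scores: Sequence[float], k: int) -> float:
    """Pool segment scores into one video score: the mean of the k highest.
    Only the most atypical-looking segments drive the video-level decision."""
    k = max(1, min(k, len(scores)))
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def video_level_loss(segment_logits: Sequence[float], video_label: int, k: int = 3) -> float:
    """Weak supervision: one label per video (hypothetically 1 = atypical
    behavior present), no per-segment labels. Logits from some hypothetical
    segment scorer are squashed, pooled with top-k mean, then compared."""
    probs = [sigmoid(z) for z in segment_logits]
    return bce(topk_mean(probs, k), video_label)


if __name__ == "__main__":
    segments = segment_video(num_frames=9000, segment_len=300)  # 30 segments
    # Hypothetical scores: a few segments look atypical, most look typical.
    logits = [-2.0] * (len(segments) - 3) + [2.5, 3.0, 2.0]
    print(len(segments), video_level_loss(logits, video_label=1))
```

Because the loss only sees the pooled score, a video labeled atypical is penalized only when none of its top-k segments look atypical, which is what lets this style of training proceed without segment-level annotations.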

The approach involves two main components:

A weakly-supervised stage that takes spatio-temporal features of the long videos and learns to distinguish typical from atypical behavior patterns, using only video-level labels rather than explicit per-behavior annotations.
A shallow TCN-MLP network (a temporal convolutional network followed by a multilayer perceptron) that takes the video features and categorizes the autism severity score. It is trained using only video-level severity scores assigned by clinical professionals, rather than frame-level annotations.
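The abstract describes the severity stage only as a shallow TCN-MLP network. A minimal pure-Python sketch of that general shape follows, assuming the input is a sequence of per-segment feature vectors: a single temporal convolution layer (a deliberately simplified stand-in for a full TCN, which would normally stack dilated convolutions with residual connections), mean pooling over time, and a two-layer MLP that outputs one logit per severity category. The feature size, kernel size, number of categories, random initialization, and all names here are hypothetical; the authors' actual layer sizes and training procedure are not given in this summary.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Affine map y = W x + b, with W stored as one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def temporal_conv(seq: Matrix, kernels: List[Matrix], b: Sequence[float]) -> Matrix:
    """One 1D convolution over time with zero 'same' padding, then ReLU.

    seq:     T x C_in, one feature vector per video segment (T >= 1).
    kernels: K taps (K odd), each a C_out x C_in weight matrix.
    b:       C_out biases.
    """
    if not seq:
        raise ValueError("need at least one segment")
    pad = len(kernels) // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + list(seq) + [zero] * pad
    no_bias = [0.0] * len(b)
    out = []
    for t in range(len(seq)):
        acc = list(b)
        for j, kernel in enumerate(kernels):
            contrib = linear(padded[t + j], kernel, no_bias)
            acc = [a + c for a, c in zip(acc, contrib)]
        out.append(relu(acc))
    return out


def tcn_mlp_logits(seq: Matrix, params: Dict[str, list]) -> Vector:
    """Temporal conv -> mean pool over time -> 2-layer MLP -> class logits."""
    h = temporal_conv(seq, params["conv_w"], params["conv_b"])
    pooled = [sum(col) / len(h) for col in zip(*h)]  # one value per channel
    hidden = relu(linear(pooled, params["fc1_w"], params["fc1_b"]))
    return linear(hidden, params["fc2_w"], params["fc2_b"])


def predict_severity(seq: Matrix, params: Dict[str, list]) -> int:
    """Index of the highest-scoring (hypothetical) severity category."""
    logits = tcn_mlp_logits(seq, params)
    return max(range(len(logits)), key=logits.__getitem__)


def init_params(c_in: int, c_hid: int, n_classes: int,
                kernel_size: int = 3, seed: int = 0) -> Dict[str, list]:
    """Small random weights; untrained, for shape-checking only."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "conv_w": [mat(c_hid, c_in) for _ in range(kernel_size)],
        "conv_b": [0.0] * c_hid,
        "fc1_w": mat(c_hid, c_hid),
        "fc1_b": [0.0] * c_hid,
        "fc2_w": mat(n_classes, c_hid),
        "fc2_b": [0.0] * n_classes,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # 20 segments, each summarized by a hypothetical 8-dim spatio-temporal feature.
    seq = [[rng.random() for _ in range(8)] for _ in range(20)]
    params = init_params(c_in=8, c_hid=16, n_classes=3)
    print(tcn_mlp_logits(seq, params), predict_severity(seq, params))
```

The mean-pooling step is what lets a video of any length map to a fixed-size vector before the MLP scores each category; a real implementation would also train these weights against the clinician-assigned severity labels.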

The researchers evaluate their model on real clinical evaluation videos of children with autism, collected and annotated with severity scores by clinical professionals. According to the authors, the experimental results demonstrate the effectiveness of the learned behavioral biomarkers in supporting clinicians' autism spectrum analysis. This is a promising step toward more scalable and less invasive autism monitoring using computer vision.

Critical Analysis

The proposed approach represents an interesting advancement in using weakly-supervised learning for autism assessment, addressing some of the limitations of traditional clinical methods. By avoiding the need for detailed annotations, the framework can be more easily scaled to larger and more diverse video datasets.

However, the paper also acknowledges several caveats and areas for further research. The dataset used is still relatively small, and may not fully capture the diversity of autism symptomatology. Additionally, the model's predictions are not directly interpretable, making it difficult to understand which specific behavioral cues are driving the severity assessments.

Further work is needed to better understand the model's decision-making process and ensure its robustness across different populations and settings. Longitudinal studies tracking how the model's assessments align with clinical outcomes over time would also be valuable.

Additionally, while the weakly-supervised approach is an important step, there are still open questions around how to best integrate this type of technology into real-world clinical workflows in an ethical and responsible manner. Careful consideration of privacy, consent, and potential biases will be crucial as these tools move towards deployment.

Conclusion

This paper presents a novel weakly-supervised framework for assessing autism severity from long, unstructured videos. By avoiding the need for detailed behavioral annotations, the approach has the potential to enable more scalable and less invasive autism monitoring compared to traditional clinical methods.

The reported results suggest the model can effectively categorize autism severity from long videos, which could support earlier detection, more frequent assessment, and, ultimately, improved interventions and support for individuals on the autism spectrum. As the field continues to explore computer vision and other AI techniques for autism assessment and intervention, such as [path-signature Siamese networks for early diagnosis](https://aimodels.fyi/papers/arxiv/early-autism-diagnosis-based-path-signature-siamese), [audio-visual understanding](https://aimodels.fyi/papers/arxiv/hear-me-see-me-understand-me-audio), [voice classification](https://aimodels.fyi/papers/arxiv/automatic-voice-classification-autistic-subjects), and [speech pattern analysis](https://aimodels.fyi/papers/arxiv/exploring-speech-pattern-disorders-autism-using-machine), this work represents an important step forward.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Weakly-supervised Autism Severity Assessment in Long Videos

Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thummler

Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and reciprocal interactions, as well as repetitive and stereotypical behaviors. Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD. In this paper, we propose a video-based weakly-supervised method that takes spatio-temporal features of long videos to learn typical and atypical behaviors for autism detection. On top of that, we propose a shallow TCN-MLP network, which is designed to further categorize the severity score. We evaluate our method on actual evaluation videos of children with autism collected and annotated (for severity score) by clinical professionals. Experimental results demonstrate the effectiveness of behavioral biomarkers that could help clinicians in autism spectrum analysis.

7/15/2024

A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior

Manuel Serna-Aguilera, Xuan Bac Nguyen, Han-Seok Seo, Khoa Luu

Autism Spectrum Disorder (ASD) can affect individuals at varying degrees of intensity, from challenges in overall health, communication, and sensory processing, and this often begins at a young age. Thus, it is critical for medical professionals to be able to accurately diagnose ASD in young children, but doing so is difficult. Deep learning can be responsibly leveraged to improve productivity in addressing this task. The availability of data, however, remains a considerable obstacle. Hence, in this work, we introduce the Video ASD dataset--a dataset that contains video frame convolutional and attention map feature data--to foster further progress in the task of ASD classification. The original videos showcase children reacting to chemo-sensory stimuli, among auditory, touch, and vision. This dataset contains the features of the frames spanning 2,467 videos, for a total of approximately 1.4 million frames. Additionally, head pose angles are included to account for head movement noise, as well as full-sentence text labels for the taste and smell videos that describe how the facial expression changes before, immediately after, and long after interaction with the stimuli. In addition to providing features, we also test foundation models on this data to showcase how movement noise affects performance and the need for more data and more complex labels.

9/10/2024


Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder

Marie Huynh (Stanford University), Aaron Kline (Stanford University), Saimourya Surabhi (Stanford University), Kaitlyn Dunlap (Stanford University), Onur Cezmi Mutlu (Stanford University), Mohammadmahdi Honarmand (Stanford University), Parnian Azizian (Stanford University), Peter Washington (University of Hawaii at Manoa), Dennis P. Wall (Stanford University)

Early detection of autism, a neurodevelopmental disorder marked by social communication challenges, is crucial for timely intervention. Recent advancements have utilized naturalistic home videos captured via the mobile application GuessWhat. Through interactive games played between children and their guardians, GuessWhat has amassed over 3,000 structured videos from 382 children, both diagnosed with and without Autism Spectrum Disorder (ASD). This collection provides a robust dataset for training computer vision models to detect ASD-related phenotypic markers, including variations in emotional expression, eye contact, and head movements. We have developed a protocol to curate high-quality videos from this dataset, forming a comprehensive training set. Utilizing this set, we trained individual LSTM-based models using eye gaze, head positions, and facial landmarks as input features, achieving test AUCs of 86%, 67%, and 78%, respectively. To boost diagnostic accuracy, we applied late fusion techniques to create ensemble models, improving the overall AUC to 90%. This approach also yielded more equitable results across different genders and age groups. Our methodology offers a significant step forward in the early detection of ASD by potentially reducing the reliance on subjective assessments and making early identification more accessible and equitable.

8/26/2024

Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder

Halil Ismail Helvaci, Sen-ching Samson Cheung, Chen-Nee Chuah, Sally Ozonoff

Autism Spectrum Disorder (ASD) presents significant challenges in early diagnosis and intervention, impacting children and their families. With prevalence rates rising, there is a critical need for accessible and efficient screening tools. Leveraging machine learning (ML) techniques, in particular Temporal Action Localization (TAL), holds promise for automating ASD screening. This paper introduces a self-attention based TAL model designed to identify ASD-related behaviors in infant videos. Unlike existing methods, our approach simplifies complex modeling and emphasizes efficiency, which is essential for practical deployment in real-world scenarios. Importantly, this work underscores the importance of developing computer vision methods capable of operating in naturalistic environments with little equipment control, addressing key challenges in ASD screening. This study is the first to conduct end-to-end temporal action localization in untrimmed videos of infants with ASD, offering promising avenues for early intervention and support. We report baseline results of behavior detection using our TAL model. We achieve 70% accuracy for look face, 79% accuracy for look object, 72% for smile and 65% for vocalization.

4/10/2024