A Comprehensive Rubric for Annotating Pathological Speech

Read original: arXiv:2404.18851 - Published 4/30/2024 by Mario Corrales-Astorgano, David Escudero-Mancebo, Lourdes Aguilar, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Carlos Vivaracho-Pascual, César González-Ferreras

Overview

  • This paper presents a comprehensive rubric for annotating pathological speech, that is, speech that deviates from typical patterns because of a medical condition. The authors apply it to speakers with Down syndrome, using the Prautocal corpus.
  • The rubric covers a wide range of speech characteristics, including articulation, voice quality, fluency, and prosody, and provides guidelines for assessing the severity of each characteristic.
  • The authors aim to standardize the process of annotating pathological speech, which is essential for developing and evaluating speech recognition and analysis systems for people with speech disorders.

Plain English Explanation

The paper describes a detailed set of guidelines, or a "rubric," for evaluating and categorizing different types of abnormal speech. This is important because many people have medical conditions that affect how they speak, and researchers need a consistent way to measure and understand these speech patterns.

The rubric covers various aspects of speech, such as how clearly words are pronounced, the quality of the voice, how fluently the person speaks, and the rhythm and emphasis of their speech. For each of these characteristics, the rubric provides a scale for assessing the severity of the deviation from normal speech.

A standardized way to analyze pathological speech makes it easier for researchers to develop and test speech recognition systems that work well for people with speech disorders. This matters because these technologies can help people with speech difficulties communicate more effectively. The rubric can also be used to build large, high-quality datasets of annotated pathological speech for training speech models.

Technical Explanation

The paper presents a comprehensive rubric for annotating pathological speech, which covers a wide range of speech characteristics, including articulation, voice quality, fluency, and prosody. The rubric provides detailed guidelines for assessing the severity of each characteristic on a scale from 0 (normal) to 4 (severe).
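The structure described above can be captured in a small annotation schema. The sketch assumes the four dimensions and the 0 (normal) to 4 (severe) range exactly as this summary states them; the class, field, and dimension identifiers and the labels for levels 1 to 3 are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    # 0-4 range as stated in the summary; the names for levels 1-3 are hypothetical.
    NORMAL = 0
    MILD = 1
    MODERATE = 2
    MARKED = 3
    SEVERE = 4


# Dimensions as listed in the summary (hypothetical identifiers).
DIMENSIONS = ("articulation", "voice_quality", "fluency", "prosody")


@dataclass
class SegmentAnnotation:
    """One annotator's rubric scores for one speech segment (hypothetical schema)."""
    segment_id: str
    annotator: str
    scores: dict[str, Severity] = field(default_factory=dict)

    def set_score(self, dimension: str, value: int) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        # Severity(value) raises ValueError for anything outside 0..4.
        self.scores[dimension] = Severity(value)

    def is_complete(self) -> bool:
        """True once every rubric dimension has a score."""
        return all(d in self.scores for d in DIMENSIONS)

    def most_severe(self) -> tuple[str, Severity]:
        """Return (dimension, severity) with the highest score; the first one wins on ties."""
        if not self.scores:
            raise ValueError("no scores recorded yet")
        return max(self.scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    ann = SegmentAnnotation("spk01_utt003", "rater_A")
    ann.set_score("articulation", 2)
    ann.set_score("fluency", 3)
    print(ann.is_complete())  # False: voice_quality and prosody are still unscored
    name, level = ann.most_severe()
    print(name, int(level))  # fluency 3
```

Keeping each annotator's scores per segment, rather than only a consensus value, preserves the raw data that agreement statistics need.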

The authors conducted a systematic review of existing speech annotation frameworks and clinical assessment tools to inform the development of the rubric. They also consulted with speech-language pathologists to ensure the rubric aligns with clinical practice.

To validate the rubric, the authors evaluated its inter-rater reliability by having multiple annotators assess the same speech samples. They found high levels of agreement between the annotators, demonstrating the consistency and reliability of the rubric.
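The summary does not say which agreement statistic the authors used. For ordinal scores such as a 0-4 severity scale, quadratic-weighted Cohen's kappa is a common choice for two raters, because it penalizes a 0-versus-4 disagreement far more than a 2-versus-3 one. The following plain-Python sketch computes it as an illustration only, not as the paper's method; the function name and toy scores are hypothetical.

```python
import math
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating=0, max_rating=4):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    rater_a, rater_b: equally long sequences of integer scores in
    [min_rating, max_rating], one pair per annotated item.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must exceed min_rating")
    for r in (*rater_a, *rater_b):
        if not min_rating <= r <= max_rating:
            raise ValueError(f"rating {r} outside [{min_rating}, {max_rating}]")

    k = max_rating - min_rating + 1
    n = len(rater_a)

    # Observed joint counts: observed[i][j] = items rated i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms give the counts expected under chance agreement.
    hist_a = Counter(a - min_rating for a in rater_a)
    hist_b = Counter(b - min_rating for b in rater_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * hist_a[i] * hist_b[j] / n

    if expected_disagreement == 0:
        # Both raters used one and the same category for every item: kappa is undefined.
        return math.nan
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy scores for five segments (illustrative only, not data from the paper).
    a = [0, 1, 2, 3, 4]
    b = [0, 1, 2, 4, 4]
    print(round(quadratic_weighted_kappa(a, b), 3))
```

With more than two annotators, one option is to average the pairwise kappas; Krippendorff's alpha is an alternative that also tolerates missing ratings.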

The rubric is designed to be used in both clinical and research settings, and the authors provide instructions for its application, including guidelines for segmenting speech samples and training annotators. The rubric can be used to create large datasets of annotated pathological speech that can be used to develop and evaluate speech recognition and analysis systems for people with speech disorders.

Critical Analysis

The paper presents a well-designed and comprehensive rubric for annotating pathological speech, addressing a clear need in the field. Its design draws on existing annotation frameworks and on clinical expertise.

One potential limitation of the study is the relatively small sample size used for the inter-rater reliability evaluation. While the authors report high levels of agreement, a larger and more diverse set of speech samples could provide further validation of the rubric's reliability.

Additionally, the paper does not address the potential impact of speech anonymization on the assessment of pathological speech characteristics. This is an important consideration, as speech anonymization can alter the acoustic features that are crucial for identifying and analyzing speech disorders.

Overall, the rubric presented in this paper is a valuable contribution to the field of speech-language pathology and has the potential to significantly improve the consistency and quality of pathological speech annotation, which in turn can lead to better speech recognition and analysis systems for individuals with speech disorders.

Conclusion

This paper introduces a comprehensive rubric for annotating pathological speech, which provides a standardized framework for assessing a wide range of speech characteristics. The rubric has been validated for reliability and can be used in both clinical and research settings to create high-quality datasets of annotated pathological speech.

The availability of such datasets is crucial for the development and evaluation of speech recognition and analysis systems that can effectively support individuals with speech disorders. By providing a reliable and consistent way to measure and categorize pathological speech, this rubric represents an important step forward in improving the accessibility and effectiveness of speech technologies for people with various medical conditions affecting their speech.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

A Comprehensive Rubric for Annotating Pathological Speech

Mario Corrales-Astorgano, David Escudero-Mancebo, Lourdes Aguilar, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Carlos Vivaracho-Pascual, César González-Ferreras

Rubrics are a commonly used tool for labeling voice corpora in speech quality assessment, although their application in the context of pathological speech remains relatively limited. In this study, we introduce a comprehensive rubric based on various dimensions of speech quality, including phonetics, fluency, and prosody. The objective is to establish standardized criteria for identifying errors within the speech of individuals with Down syndrome, thereby enabling the development of automated assessment systems. To achieve this objective, we utilized the Prautocal corpus. To assess the quality of annotations using our rubric, two experiments were conducted, focusing on phonetics and fluency. For phonetic evaluation, we employed the Goodness of Pronunciation (GoP) metric, utilizing automatic segmentation systems and correlating the results with evaluations conducted by a specialized speech therapist. While the obtained correlation values were not notably high, a positive trend was observed. In terms of fluency assessment, deep learning models like wav2vec were used to extract audio features, and we employed an SVM classifier trained on a corpus focused on identifying fluency issues to categorize Prautocal corpus samples. The outcomes highlight the complexities of evaluating such phenomena, with variability depending on the specific type of disfluency detected.

4/30/2024

Self-supervised learning for pathological speech detection

Shakeel Ahmad Sheikh

Speech production is a complex phenomenon, wherein the brain orchestrates a sequence of processes involving thought processing, motor planning, and the execution of articulatory movements. However, this intricate execution of various processes is susceptible to influence and disruption by various neurodegenerative pathological speech disorders, such as Parkinson's disease, resulting in dysarthria, apraxia, and other conditions. These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation. Diagnosing these speech disorders in clinical settings typically involves auditory perceptual tests, which are time-consuming, and the diagnosis can vary among clinicians based on their experiences, biases, and cognitive load during the diagnosis. Additionally, unlike neurotypical speakers, patients with speech pathologies or impairments are unable to access various virtual assistants such as Alexa, Siri, etc. To address these challenges, several automatic pathological speech detection (PSD) approaches have been proposed. These approaches aim to provide efficient and accurate detection of speech disorders, thereby facilitating timely intervention and support for individuals affected by these conditions. These approaches mainly vary in two aspects: the input representations utilized and the classifiers employed. Due to the limited availability of data, the performance of detection remains subpar. Self-supervised learning (SSL) embeddings, such as wav2vec2, and their multilingual versions, are being explored as a promising avenue to improve performance. These embeddings leverage self-supervised learning techniques to extract rich representations from audio data, thereby offering a potential solution to address the limitations posed by the scarcity of labeled data.

6/6/2024

Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green

Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of the 1.2M total) audio recordings, and amassing a comprehensive set of metadata (including more than 40 speech characteristic labels) for over 75% of the speakers in the database. We report on the impact of transcript corrections on our machine-learning (ML) research, inter-rater variability of assessments of disordered speech patterns, and our rationale for gathering speech metadata. We also consider the limitations of using automated off-the-shelf annotation methods for assessing disordered speech.

9/17/2024

A pilot protocol and cohort for the investigation of non-pathological variability in speech

Nicholas Cummins, Lauren L. White, Zahia Rahman, Catriona Lucas, Tian Pan, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. Dobson, Judith Dineley

Background: Speech-based biomarkers have potential as a means for regular, objective assessment of symptom severity, remotely and in-clinic in combination with advanced analytical models. However, the complex nature of speech and the often subtle changes associated with health mean that findings are highly dependent on methodological and cohort choices. These are often not reported adequately in studies investigating speech-based health assessment. Objective: To develop and apply an exemplar protocol to generate a pilot dataset of healthy speech with detailed metadata for the assessment of factors in the speech recording-analysis pipeline, including device choice, speech elicitation task and non-pathological variability. Methods: We developed our collection protocol and choice of exemplar speech features based on a thematic literature review. Our protocol includes the elicitation of three different speech types. With a focus towards remote applications, we also chose to collect speech with three different microphone types. We developed a pipeline to extract a set of 14 exemplar speech features. Results: We collected speech from 28 individuals three times in one day, repeated at the same times 8-11 weeks later, and from 25 healthy individuals three times in one week. Participant characteristics collected included sex, age, native language status and voice use habits of the participant. A preliminary set of 14 speech features covering timing, prosody, voice quality, articulation and spectral moment characteristics was extracted, providing a resource of normative values. Conclusions: There are multiple methodological factors involved in the collection, processing and analysis of speech recordings. Consistent reporting and greater harmonisation of study protocols are urgently required to aid the translation of speech processing into clinical research and practice.

6/12/2024