Reading Miscue Detection in Primary School through Automatic Speech Recognition

Read original: arXiv:2406.07060 - Published 7/24/2024 by Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini


Overview

This study investigates how well state-of-the-art Automatic Speech Recognition (ASR) models can recognize Dutch native children's speech and detect reading miscues.
The researchers found that the Hubert Large model fine-tuned on Dutch speech achieved the best phoneme-level performance, while the Whisper (Faster Whisper Large-v2) model had the best word-level performance.
The findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for detecting reading miscues. Wav2Vec2 Large showed the highest recall (0.83), and Whisper showed the highest precision (0.52), with an F1 score of 0.52.

Plain English Explanation

Automatic reading diagnosis systems can be beneficial for both teachers and students. Teachers can use them to more efficiently score reading exercises, while students can access reading exercises with feedback more easily. However, there has been limited research on Automatic Speech Recognition (ASR) for child speech in languages other than English, as well as limited research on ASR-based reading diagnosis systems.

This study investigated how well the latest ASR models can recognize Dutch native children's speech and detect reading mistakes (or "miscues"). The researchers found that the Hubert Large model, when fine-tuned on Dutch speech, performed best at the phoneme level, meaning it made the fewest errors when transcribing the individual sounds in the children's speech. The Whisper (Faster Whisper Large-v2) model, on the other hand, performed best at the word level, meaning it made the fewest errors when transcribing the words the children actually said.

The researchers also found that two ASR models, Wav2Vec2 Large and Whisper, were the best at detecting reading miscues. Wav2Vec2 Large showed the highest recall, meaning it caught the largest share of the mistakes the children actually made. Whisper showed the highest precision, meaning the mistakes it flagged were most often real mistakes rather than false alarms.

These findings matter because they suggest that such ASR models could serve as building blocks for automatic reading diagnosis systems, which could help teachers score reading exercises more efficiently and give students easier access to reading practice with feedback.

Technical Explanation

The study investigated the performance of state-of-the-art (SOTA) pre-trained Automatic Speech Recognition (ASR) models in recognizing Dutch native children's speech and detecting reading miscues. The researchers used two key metrics to evaluate the ASR models: phoneme error rate (PER) for phoneme-level recognition and word error rate (WER) for word-level recognition.
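Both metrics are edit-distance ratios. Each one counts the minimum number of substitutions, deletions, and insertions needed to turn the reference sequence into the ASR output, then divides that count by the length of the reference. The sketch below computes this ratio over word tokens (WER) or phoneme tokens (PER). It is a generic illustration, not the paper's scoring tool. The function names, the toy Dutch sentence, and the phoneme symbols are hypothetical.

```python
from typing import Sequence


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing from output
                curr[j - 1] + 1,     # insertion: extra token in output
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit count normalized by reference length; lower is better."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_counts(ref, hyp) / len(ref)


def wer(ref_text: str, hyp_text: str) -> float:
    """Word error rate over whitespace-separated words."""
    return error_rate(ref_text.split(), hyp_text.split())


def per(ref_phones: Sequence[str], hyp_phones: Sequence[str]) -> float:
    """Phoneme error rate over pre-segmented phoneme symbols (hypothetical symbol set)."""
    return error_rate(ref_phones, hyp_phones)


if __name__ == "__main__":
    print(round(wer("de kat zit op de mat", "de kat zat op mat"), 3))
    print(per(["k", "A", "t"], ["k", "O", "t"]))
```

For example, `wer("de kat zit op de mat", "de kat zat op mat")` counts one substitution ("zit" read as "zat") and one deletion (the second "de") over six reference words, which gives 2/6, or about 0.33. Real scoring toolkits usually normalize the text first (case, punctuation). This sketch omits that step.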

The researchers found that the Hubert Large model, when fine-tuned on Dutch speech, achieved a state-of-the-art PER of 23.1%, the lowest phoneme error rate among the models evaluated. For word-level recognition, the Whisper (Faster Whisper Large-v2) model achieved a state-of-the-art WER of 9.8%, the best word-level result.

To assess the ASR models' ability to detect reading miscues, the researchers used precision, recall, and F1 score as evaluation metrics. Their findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for this task. Specifically, Wav2Vec2 Large showed the highest recall at 0.83, meaning it caught the largest share of the actual reading miscues. Whisper, in contrast, exhibited the highest precision at 0.52, with an F1 score of 0.52. This means the miscues it flagged were more often genuine than those of the other models, although roughly half of its flags were still false alarms.
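One simple way to turn an ASR transcript into miscue predictions is to align it with the reading prompt and flag every prompt word that the transcript does not reproduce. Those flags can then be scored against human annotations with precision, recall, and F1. The sketch below assumes exactly that rule and uses Python's standard `difflib` for the alignment. It is a hypothetical illustration, not the procedure used in the paper. `flag_miscues`, `precision_recall_f1`, and the toy data are made-up names and examples.

```python
from difflib import SequenceMatcher


def flag_miscues(prompt: str, asr_transcript: str) -> set[int]:
    """Return indices of prompt words that the ASR transcript does not reproduce.

    Hypothetical rule: a prompt word is flagged when the alignment marks it as
    replaced or deleted. Extra (inserted) words in the transcript are ignored,
    because they do not correspond to any prompt position.
    """
    ref = prompt.lower().split()
    hyp = asr_transcript.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    flagged: set[int] = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            flagged.update(range(i1, i2))
    return flagged


def precision_recall_f1(predicted: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Score predicted miscue positions against human-annotated ones."""
    tp = len(predicted & gold)  # flagged and truly a miscue
    fp = len(predicted - gold)  # flagged but actually read correctly (false alarm)
    fn = len(gold - predicted)  # real miscue the detector missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data (hypothetical): the child misread "zit" as "zat", and the ASR
    # also dropped the second "de", which the child actually read correctly.
    predicted = flag_miscues("de kat zit op de mat", "de kat zat op mat")
    gold = {2}
    print(sorted(predicted), precision_recall_f1(predicted, gold))
```

In the toy example, the detector flags positions 2 and 4, but only position 2 is a real miscue. That gives a precision of 0.5, a recall of 1.0, and an F1 of about 0.67, which shows how a detector can catch every real miscue and still raise false alarms. The same definitions also explain Whisper's reported scores. Setting F1 = 2PR/(P+R) equal to P and solving for R gives R = P. Because Whisper's precision and F1 are both 0.52, its recall must also be about 0.52, up to rounding.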

These results provide valuable insights into the performance of SOTA ASR models for Dutch native children's speech recognition and reading miscue detection. The findings can inform the development of more effective automatic reading diagnosis systems, which can benefit both teachers and students.

Critical Analysis

The study evaluates several SOTA ASR models on two tasks: recognizing Dutch native children's speech and detecting reading miscues. Because it uses standard metrics (PER, WER, precision, recall, and F1 score), its results are straightforward to compare with other work on child speech recognition and reading assessment.

However, the study does have some limitations. First, the researchers evaluated pre-trained ASR models (Hubert Large had been fine-tuned on Dutch speech in general). They did not explore fine-tuning or adapting these models specifically for the task of reading miscue detection. Further research on such adaptation strategies could improve the performance of these ASR models in this context.

Additionally, the study focuses solely on Dutch language data, and the findings may not be directly transferable to other languages. Comparative studies that examine the performance of these ASR models across multiple languages could provide a more comprehensive understanding of their capabilities.

Furthermore, the results are reported as overall scores. A closer look at where the models fail could reveal potential biases, for example how accuracy varies with the children's age or reading level, or with the type of miscue. Such analysis could also guide improvements in the design and deployment of automatic reading diagnosis systems.

Conclusion

This study provides valuable insights into the performance of state-of-the-art Automatic Speech Recognition (ASR) models in recognizing Dutch native children's speech and detecting reading miscues. The researchers found that the Hubert Large model fine-tuned on Dutch speech achieved the best phoneme-level recognition, while the Whisper (Faster Whisper Large-v2) model had the best word-level performance.

Importantly, the study identified Wav2Vec2 Large and Whisper as the two best ASR models for reading miscue detection, with Wav2Vec2 Large showing the highest recall and Whisper the highest precision. These findings are relevant to the development of more effective automatic reading diagnosis systems. Such systems can benefit teachers by streamlining the scoring of reading exercises and benefit students by making reading practice with feedback more accessible.

While the study has some limitations, it paves the way for further research on adapting and optimizing ASR models for specific tasks, as well as exploring their performance across different languages and user demographics. Continuous advancements in this field have the potential to transform the way we support and empower children in their reading development.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Reading Miscue Detection in Primary School through Automatic Speech Recognition

Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Automatic reading diagnosis systems can benefit both teachers for more efficient scoring of reading exercises and students for accessing reading exercises with feedback more easily. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how efficiently state-of-the-art (SOTA) pretrained ASR models recognize Dutch native children's speech and manage to detect reading miscues. We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall at 0.83, whereas Whisper exhibits the highest precision at 0.52 and an F1 score of 0.52.

7/24/2024


An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik

The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children's speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.

7/24/2024


Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics

Bo Molenaar, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Automatic assessment of reading fluency using automatic speech recognition (ASR) holds great potential for early detection of reading difficulties and subsequent timely intervention. Precise assessment tools are required, especially for languages other than English. In this study, we evaluate six state-of-the-art ASR-based systems for automatically assessing Dutch oral reading accuracy using Kaldi and Whisper. Results show our most successful system reached substantial agreement with human evaluations (MCC = .63). The same system reached the highest correlation between forced decoding confidence scores and word correctness (r = .45). This system's language model (LM) consisted of manual orthographic transcriptions and reading prompts of the test data, which shows that including reading errors in the LM improves assessment performance. We discuss the implications for developing automatic assessment systems and identify possible avenues of future research.

7/24/2024


Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. We evaluated their performance on read and extemporaneous speech of native and non-native Dutch children. We also investigated the utility of using ASR technology to provide insight into the children's pronunciation and fluency. The results show that recent, pre-trained ASR transformer-based models achieve acceptable performance from which detailed feedback on phoneme pronunciation quality can be extracted, despite the challenging nature of child and non-native speech.

7/24/2024