An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Read original: arXiv:2306.04190 - Published 7/24/2024 by Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik

Overview

The paper examines the use of automatic speech recognition (ASR) in applications for reading practice.
It compares two new ASR systems, developed with an existing corpus of children's speech, against the ASR used in a previous Dutch reading tutor application.
The study measures how well each ASR system's word-level correct/incorrect decisions agree with human transcripts, using metrics such as Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures.
The new ASR systems agree more closely with human judgment and improve correct rejection, but classifying isolated words remains difficult.

Plain English Explanation

The paper looks at using automatic speech recognition (ASR) technology to help kids learn to read. In a previous study, the researchers had developed a Dutch reading tutor app that used ASR to provide instant feedback to first-graders. That earlier work showed ASR can be helpful at this stage of reading development, as the kids made progress in reading accuracy and fluency.

For this new study, the researchers used an existing collection of children's speech, called the JASMIN corpus, to create two new ASR systems. They compared the performance of these new systems to the earlier reading tutor app. The key question was how well the ASR could correctly identify words that the children read, compared to human transcripts.

The researchers used several evaluation metrics, including Cohen's Kappa and Matthews Correlation Coefficient (MCC), to measure the agreement between the ASR systems and the human transcripts. They found that the new ASR systems agreed more closely with the human judgments and were better at correctly rejecting words that had been read incorrectly.

However, accuracy varied across reading tasks and word types, and the ASR systems, in their current configuration, still struggled to classify isolated words. The paper discusses possible ways to improve the ASR systems further and suggests areas for future research.

Technical Explanation

The researchers developed two new automatic speech recognition (ASR) systems using the JASMIN corpus of children's speech. They then compared the performance of these new ASR systems to a previous ASR-based Dutch reading tutor application.

The key evaluation focused on the ASR systems' ability to correctly or incorrectly classify words at the individual word level, compared to human-generated transcripts. Metrics like Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures were used to assess the agreement between the ASR outputs and the human transcripts.
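To make these metrics concrete, the sketch below computes them from a 2x2 confusion matrix of word-level decisions. It is an illustration only, not the authors' evaluation code. It assumes the human transcript is the reference and that the positive class is "word read incorrectly"; the function name `agreement_metrics` and the example counts are hypothetical.

```python
import math


def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Agreement between ASR decisions and human judgments for binary labels.

    Assumed convention (not taken from the paper): the positive class is
    "word read incorrectly", with the human transcript as the reference.
      tp: human says incorrect, ASR says incorrect
      fp: human says correct,   ASR says incorrect
      fn: human says incorrect, ASR says correct
      tn: human says correct,   ASR says correct
    Degenerate cases (an empty row or column) are not handled here and
    would raise ZeroDivisionError.
    """
    n = tp + fp + fn + tn

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)

    # Matthews Correlation Coefficient for the same 2x2 table.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for 200 words, for illustration only.
metrics = agreement_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa and MCC are useful alongside precision and recall because most words in a reading exercise tend to be read correctly, so raw accuracy can look high even when the system rarely flags a real error.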

The results showed that the new ASR systems agreed more closely with human judgment and improved on correct rejection (CR), that is, flagging words that were in fact read incorrectly, compared to the earlier reading tutor application. Accuracy varied across reading tasks and word types, and the researchers found that, in the current configuration, it was difficult for the ASR systems to classify isolated words.
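Correct rejection is one of four possible outcomes when an ASR decision is compared with a human judgment for the same word. The sketch below tallies those outcomes and derives a correct rejection rate. It is a hedged illustration: the labels CA, FA and FR, the helper names, and the definition of the rate as CR / (CR + FA) are assumptions based on common usage, not details confirmed by this summary.

```python
from collections import Counter


def tally_outcomes(human_correct: list[bool], asr_correct: list[bool]) -> Counter:
    """Count per-word outcomes; True means "judged as read correctly".

    CA: correct acceptance (both say the word was read correctly)
    CR: correct rejection  (both say the word was read incorrectly)
    FA: false acceptance   (human: incorrect, ASR: correct)
    FR: false rejection    (human: correct,   ASR: incorrect)
    """
    if len(human_correct) != len(asr_correct):
        raise ValueError("both label lists must cover the same words")

    counts = Counter()
    for human, asr in zip(human_correct, asr_correct):
        if human and asr:
            counts["CA"] += 1
        elif not human and not asr:
            counts["CR"] += 1
        elif not human and asr:
            counts["FA"] += 1
        else:
            counts["FR"] += 1
    return counts


def correct_rejection_rate(counts: Counter) -> float:
    """Share of misread words (per the human judgment) that the ASR rejected."""
    misread = counts["CR"] + counts["FA"]
    return counts["CR"] / misread if misread else float("nan")


# Hypothetical judgments for six words, for illustration only.
human = [True, True, False, False, False, True]
asr = [True, False, False, True, False, True]
counts = tally_outcomes(human, asr)
print(dict(counts))
print(f"CR rate: {correct_rejection_rate(counts):.2f}")
```

In a tutoring setting the two error types have different costs: a false rejection tells a child that a correctly read word was wrong, while a false acceptance lets a reading error pass without feedback.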

The paper discusses potential ways to further improve the ASR systems, such as incorporating more contextual information. It also suggests avenues for future research, including exploring the impact of different reading tasks and word types on ASR performance.

Critical Analysis

The paper provides a valuable comparison between new ASR systems, developed using a children's speech corpus, and a previous ASR-based reading tutor application. The evaluation focuses on a crucial aspect: whether the system can correctly judge individual words, which is essential for giving learners meaningful feedback.

While the new ASR systems showed improvements, the persistent difficulty in accurately classifying isolated words suggests there is still work to be done. The researchers acknowledge this limitation and propose potential ways to address it, such as leveraging more contextual information.

One area that could be explored further is the impact of different reading tasks and word types on ASR performance. The paper mentions variations in accuracy, but does not delve deeply into the specific challenges posed by certain tasks or word categories. Investigating these nuances could help guide the development of more robust ASR-based reading interventions.

Additionally, the study relied on a single speech corpus (JASMIN) for system development. Evaluating the ASR systems' performance on a broader range of children's speech data could help validate the findings and identify any potential biases or limitations of the corpus.

Overall, the paper provides a solid foundation for understanding the current state of ASR-based reading applications and highlights important directions for future research to overcome the remaining challenges.

Conclusion

This study explored the use of automatic speech recognition (ASR) technology in reading practice applications, comparing the performance of newly developed ASR systems to a previous ASR-based reading tutor.

The results suggest that the new ASR systems agree more closely with human judgment and are better at correctly rejecting incorrect words. They also show that accuracy varies across reading tasks and word types, and that classifying isolated words remains difficult in the current configuration. The researchers propose ways to address this limitation, such as incorporating more contextual information, and identify avenues for future research.

While ASR has shown promise in supporting reading instruction, particularly at early stages, this study underscores the need for continued refinement and optimization of the technology to ensure it can provide reliable and meaningful feedback to learners. Ongoing research in this area has the potential to enhance reading interventions and contribute to improved literacy outcomes.

Related Papers

An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik

The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children's speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.

7/24/2024

Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics

Bo Molenaar, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Automatic assessment of reading fluency using automatic speech recognition (ASR) holds great potential for early detection of reading difficulties and subsequent timely intervention. Precise assessment tools are required, especially for languages other than English. In this study, we evaluate six state-of-the-art ASR-based systems for automatically assessing Dutch oral reading accuracy using Kaldi and Whisper. Results show our most successful system reached substantial agreement with human evaluations (MCC = .63). The same system reached the highest correlation between forced decoding confidence scores and word correctness (r = .45). This system's language model (LM) consisted of manual orthographic transcriptions and reading prompts of the test data, which shows that including reading errors in the LM improves assessment performance. We discuss the implications for developing automatic assessment systems and identify possible avenues of future research.

7/24/2024

Reading Miscue Detection in Primary School through Automatic Speech Recognition

Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

Automatic reading diagnosis systems can benefit both teachers for more efficient scoring of reading exercises and students for accessing reading exercises with feedback more easily. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how efficiently state-of-the-art (SOTA) pretrained ASR models recognize Dutch native children speech and manage to detect reading miscues. We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall at 0.83, whereas Whisper exhibits the highest precision at 0.52 and an F1 score of 0.52.

7/24/2024

Error-preserving Automatic Speech Recognition of Young English Learners' Language

Janick Michot, Manuela Hurlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak

One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this work, we tackle the first component of such a pipeline, namely, the automated speech recognition module (ASR), which faces a number of challenges: first, state-of-the-art ASR models are often trained on adult read-aloud data by native speakers and do not transfer well to young language learners' speech. Second, most ASR systems contain a powerful language model, which smooths out errors made by the speakers. To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the errors made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors. For this, we collected a corpus containing around 85 hours of English audio spoken by learners in Switzerland from grades 4 to 6 on different language learning tasks, which we used to train an ASR model. Our experiments show that our model benefits from direct fine-tuning on children's voices and has a much higher error preservation rate than other models.

6/6/2024