HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Read original: arXiv:2401.01145 - Published 6/6/2024 by Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

Overview

This paper presents HAAQI-Net, a neural network model for assessing the quality of music played through hearing aids in a non-intrusive way.
Hearing aids can distort or degrade the quality of music, so an accurate assessment of the perceived audio quality is important for improving hearing aid design and performance.
HAAQI-Net takes in audio signals and predicts a quality score without requiring access to the original, undistorted audio, making it a "non-intrusive" approach.

Plain English Explanation

Hearing aids are devices that help people with hearing loss hear sounds more clearly. However, music heard through a hearing aid can be degraded or distorted. The paper describes HAAQI-Net, an AI model that assesses the quality of music processed by a hearing aid without needing access to the original, unprocessed music.

This is useful because it allows hearing aid manufacturers to evaluate how their devices are affecting the listener's music experience and make improvements. The key advantage of HAAQI-Net is that it can analyze the music quality directly from the audio that's played through the hearing aid, rather than requiring a comparison to the original music file. This "non-intrusive" approach makes the quality assessment process simpler and more practical for real-world use.

Technical Explanation

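According to the paper's abstract, HAAQI-Net uses a bidirectional Long Short-Term Memory (BLSTM) network with attention mechanisms, operating on features extracted by the pre-trained BEATs audio model. It takes a music clip and a hearing loss pattern as input and outputs a scalar: a prediction of the score that the Hearing Aid Audio Quality Index (HAAQI) would assign, without the reference signal that HAAQI itself requires.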
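The abstract mentions attention mechanisms on top of a recurrent network but does not spell out the layers. As a rough illustration of one common way attention turns per-frame information into a single clip-level value, the sketch below applies softmax attention pooling to per-frame quality estimates. Everything here is an assumption for illustration: the function names, the idea of per-frame scores, and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn arbitrary real-valued logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_scores, attention_logits):
    """Weighted average of per-frame scores using softmax attention weights.

    Hypothetical helper: in a real model both inputs would be produced
    by learned layers; here they are plain lists of floats.
    """
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, frame_scores))


# Made-up per-frame quality estimates; the third frame is badly distorted.
frame_scores = [0.9, 0.8, 0.3, 0.85]
# Made-up attention logits that emphasise the distorted frame.
attention_logits = [0.0, 0.0, 2.0, 0.0]

clip_score = attention_pool(frame_scores, attention_logits)
print(f"clip-level score: {clip_score:.4f}")
```

With equal logits the pooling reduces to a plain mean; raising the logit of the distorted frame pulls the clip-level score toward that frame's value.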

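The training target is the HAAQI score itself rather than a subjective listener rating. Per the abstract, HAAQI-Net's predictions reach a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064 against HAAQI, while cutting inference time from 62.52 seconds to 2.54 seconds. A distilled variant, which replaces BEATs with a smaller student model (distillBEATs), uses 75.85% fewer parameters and still reaches an LCC of 0.9071, an SRCC of 0.9307, and an MSE of 0.0091.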
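The paper's abstract reports agreement between predicted and reference scores as a Linear Correlation Coefficient (LCC), a Spearman's Rank Correlation Coefficient (SRCC), and a Mean Squared Error (MSE). The following is a minimal sketch of how these three metrics can be computed using only the Python standard library. The score lists are made-up illustration values, not data from the paper, and the function names are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length score lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson(x, y):
    """Linear (Pearson) correlation coefficient, i.e. the LCC."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: reference scores and a model's predictions, both in [0, 1].
reference = [0.20, 0.45, 0.60, 0.80, 0.95]
predicted = [0.25, 0.40, 0.65, 0.75, 0.90]

print("LCC :", pearson(predicted, reference))
print("SRCC:", spearman(predicted, reference))
print("MSE :", mse(predicted, reference))
```

LCC rewards predictions that track the reference linearly, SRCC only checks that the ordering of clips is preserved, and MSE penalises absolute deviation, which is why results of this kind are usually reported with all three.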

Critical Analysis

The paper provides a thorough evaluation of HAAQI-Net's performance, including comparisons to other state-of-the-art models. However, the authors acknowledge that the dataset used for training and testing was relatively small, and may not capture the full diversity of real-world hearing aid use cases.

Additionally, because HAAQI-Net is trained to predict HAAQI scores, it inherits whatever gap exists between HAAQI and listeners' actual perception, and the abstract does not report validation against subjective listening tests. Further research may be needed to validate HAAQI-Net's performance in more diverse, real-world scenarios.

"Hallucination: Perceptual Metric-Driven Speech Enhancement Networks" and "Quality-Aware Masked Diffusion Transformer for Enhanced Music" are examples of other recent work exploring neural network approaches for audio quality assessment, which could provide useful insights for further improving HAAQI-Net.

Conclusion

The HAAQI-Net model represents a valuable contribution to the field of hearing aid technology, providing a practical way to assess music quality without relying on the original audio. This can help hearing aid manufacturers optimize their products to deliver a better listening experience for users. As the authors suggest, further research to expand the model's capabilities and robustness could unlock even greater benefits for people with hearing impairments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

This paper introduces HAAQI-Net, a non-intrusive deep learning model for music audio quality assessment tailored for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI), which rely on intrusive comparisons to a reference signal, HAAQI-Net offers a more accessible and efficient alternative. Using a bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features from the pre-trained BEATs model, HAAQI-Net predicts HAAQI scores directly from music audio clips and hearing loss patterns. Results show HAAQI-Net's effectiveness, with predicted scores achieving a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, reducing inference time from 62.52 seconds to 2.54 seconds. Although effective, feature extraction via the large BEATs model incurs computational overhead. To address this, a knowledge distillation strategy creates a student distillBEATs model, distilling information from the teacher BEATs model during HAAQI-Net training, reducing required parameters. The distilled HAAQI-Net maintains strong performance with an LCC of 0.9071, an SRCC of 0.9307, and an MSE of 0.0091, while reducing parameters by 75.85% and inference time by 96.46%. This reduction enhances HAAQI-Net's efficiency and scalability, making it viable for real-world music audio quality assessment in hearing aid settings. This work also opens avenues for further research into optimizing deep learning models for specific applications, contributing to audio signal processing and quality assessment by providing insights into developing efficient and accurate models for practical applications in hearing aid technology.

6/6/2024

Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper's embeddings to create cross-domain acoustic features and includes metadata from speech signals by using a classifier that distinguishes different enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric into the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings, speech metadata, and related complementary metrics to improve prediction performance for HA.

6/14/2024

Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

Woo-Jin Chung, Hong-Goo Kang

We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based Multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationship among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance in speaker-independent AAI models. The implementation details and code can be found online.

6/26/2024

A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals

Cho-Yuan Lee, Kuan-Chen Wang, Kai-Chun Liu, Yu-Te Wang, Xugang Lu, Ping-Cheng Yeh, Yu Tsao

In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals. To assess the quality of real-world sEMG data more effectively, this study proposes QASE-net, a new non-intrusive model that predicts the SNR of sEMG signals. QASE-net combines CNN-BLSTM with attention mechanisms and follows an end-to-end training strategy. Our experimental framework utilizes real-world sEMG and ECG data from two open-access databases, the Non-Invasive Adaptive Prosthetics Database and the MIT-BIH Normal Sinus Rhythm Database, respectively. The experimental results demonstrate the superiority of QASE-net over the previous assessment model, exhibiting significantly reduced prediction errors and notably higher linear correlations with the ground truth. These findings show the potential of QASE-net to substantially enhance the reliability and precision of sEMG quality assessment in practical applications.

6/14/2024