Development of Large Annotated Music Datasets using HMM-based Forced Viterbi Alignment

Read original: arXiv:2408.14890 - Published 8/28/2024 by S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

Overview

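  • Proposes a streamlined method for building time-aligned music transcription datasets using predefined guitar exercises and HMM-based forced Viterbi alignment
  • Addresses the cost of annotating Automatic Music Transcription (AMT) data, which normally requires musically experienced people both to play the instrument and to annotate and verify the transcriptions
  • Releases an acoustic plectrum guitar dataset of wave files and label files, with onsets manually verified to within 10 ms (5 ms on average)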

Plain English Explanation

The paper presents a method for building annotated music datasets using Hidden Markov Model (HMM)-based forced Viterbi alignment. Creating such datasets is normally slow and expensive, because musically experienced people must both play the instrument and annotate and verify the transcriptions. Yet Automatic Music Transcription (AMT) systems need considerable amounts of this data for training and evaluation.

The key idea is to record exercises whose note sequence is defined in advance. Because the system already knows which notes are played and in what order, the only open question is when each note starts and ends. HMM-based forced Viterbi alignment answers that question automatically. Each recording becomes a time-aligned transcription that a human only has to verify rather than write from scratch.
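Because the note sequence is fixed, forced alignment does not search over possible transcriptions. It aligns the audio against a single chain of states built from the known notes. The sketch below shows that expansion step. The three-state left-to-right topology per note, the leading and trailing silence model `sil`, and the pitch names are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical topology (not taken from the paper): every note model and the
# silence model is a left-to-right HMM with the same number of emitting states.
STATES_PER_MODEL = 3


def build_composite_states(notes: List[str],
                           pad_silence: bool = True) -> List[Tuple[str, int]]:
    """Expand a known note sequence into the ordered states of one composite
    left-to-right HMM. Each entry is (model_name, state_index_within_model)."""
    models = ["sil"] + list(notes) + ["sil"] if pad_silence else list(notes)
    states: List[Tuple[str, int]] = []
    for name in models:
        for k in range(STATES_PER_MODEL):
            states.append((name, k))
    return states


# Hypothetical four-note exercise; pitch names are illustrative only.
exercise = ["E2", "F2", "F#2", "G2"]
chain = build_composite_states(exercise)
print(len(chain))   # 18 states: (1 + 4 + 1) models x 3 states each
print(chain[:4])    # [('sil', 0), ('sil', 1), ('sil', 2), ('E2', 0)]
```

Only the order of the models comes from the exercise. How many states each note gets and whether silence is modelled are design choices that would have to match how the HMMs were trained.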

Technical Explanation

The paper describes a systematic approach for generating time-aligned music transcription datasets using HMM-based forced Viterbi alignment. The key steps are:

  1. Recording: The researchers recorded predefined guitar exercises on an acoustic plectrum guitar. The exercises are deliberately simple, so the note sequence of every recording is known before alignment.

  2. Forced alignment: Given each recording and its known note sequence, HMM-based forced Viterbi alignment finds the most likely time boundaries of every note. This produces a time-aligned transcription without anyone having to annotate the timings by hand.

  3. Verification: The onsets in the resulting transcriptions were manually verified. The labels are accurate to within 10 ms, averaging 5 ms, and are distributed as label files alongside the wave files.
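Step 2 can be illustrated with a textbook forced Viterbi pass over a left-to-right state chain. This is a minimal sketch, not the authors' implementation. It assumes per-frame log-likelihoods `log_emis[t][s]` are already available (for example, from trained per-state HMM densities), and it uses one shared stay/advance transition probability for every state as a simplification. The path is forced to start in the first state and end in the last, so every note in the known sequence must be visited in order. HMM toolkits such as HTK also offer forced alignment; the paper's exact tooling is not stated here.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def forced_viterbi(log_emis: List[List[float]],
                   log_stay: float = math.log(0.5),
                   log_next: float = math.log(0.5)) -> List[int]:
    """Align T frames to a left-to-right chain of S states.

    log_emis[t][s] is the log-likelihood of frame t under state s. The path
    starts in state 0 at t = 0, ends in state S - 1 at t = T - 1, and at each
    frame either stays in its state or advances to the next one.
    Returns the best state index for every frame.
    """
    T, S = len(log_emis), len(log_emis[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    delta = [[NEG_INF] * S for _ in range(T)]  # best log score ending in s at t
    back = [[0] * S for _ in range(T)]         # predecessor state for backtrace
    delta[0][0] = log_emis[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1][s] + log_stay
            move = delta[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            if stay >= move:
                delta[t][s], back[t][s] = stay, s
            else:
                delta[t][s], back[t][s] = move, s - 1
            delta[t][s] += log_emis[t][s]
    # Backtrace from the forced final state S - 1.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def path_to_segments(path: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse a per-frame state path into (state, start_frame, end_frame_exclusive)."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[t - 1]:
            segments.append((path[start], start, t))
            start = t
    return segments


# Toy input: 5 frames, 2 states; frames 0-1 favour state 0, frames 2-4 state 1.
toy = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0], [-5.0, 0.0]]
path = forced_viterbi(toy)
print(path)                    # [0, 0, 1, 1, 1]
print(path_to_segments(path))  # [(0, 0, 2), (1, 2, 5)]
```

In a full pipeline, the state-level segments would be merged per note (all states belonging to the same note position in the chain), and the first frame of each note would give its onset.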

The result is an annotated acoustic plectrum guitar dataset, together with a procedure that the authors say can be applied to other instruments, especially monophonic ones. They present it as a preliminary step toward building concrete datasets for AMT systems for different instruments.
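The abstract states that the transcriptions are released as label files next to the wave files. The sketch below turns aligned note segments (start and end frame indices) into plain-text label lines. It assumes a hypothetical 10 ms analysis hop and a tab-separated `start<TAB>end<TAB>label` layout with times in seconds. The paper's actual frame rate and label-file format are not given here, so both are placeholders.

```python
from typing import List, Tuple

FRAME_SHIFT_S = 0.010  # hypothetical 10 ms hop between analysis frames


def segments_to_labels(segments: List[Tuple[str, int, int]],
                       frame_shift: float = FRAME_SHIFT_S) -> str:
    """Render (label, start_frame, end_frame_exclusive) note segments as
    tab-separated 'start<TAB>end<TAB>label' lines, times in seconds."""
    lines = []
    for label, start, end in segments:
        lines.append(f"{start * frame_shift:.3f}\t{end * frame_shift:.3f}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical aligned segments for a two-note exercise with silence padding.
notes = [("sil", 0, 12), ("E2", 12, 60), ("F2", 60, 105), ("sil", 105, 130)]
print(segments_to_labels(notes), end="")
# 0.000   0.120   sil
# 0.120   0.600   E2
# 0.600   1.050   F2
# 1.050   1.300   sil
```

A text format like this keeps the labels readable next to the audio. A real release would document its own time units and column order.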

Critical Analysis

The paper offers a simple, systematic approach to generating time-aligned transcription datasets. However, the quality of the labels depends heavily on the performance of the HMM-based forced Viterbi alignment. The authors report that, after manual verification, onsets are accurate to within 10 ms (5 ms on average). Even so, the automatically generated boundaries may still contain systematic biases or errors. The reported verification covers onsets only; the abstract does not state how accurate the note offsets are.
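Accuracy figures such as "within 10 ms, averaging 5 ms" summarise the absolute difference between aligned and manually verified onsets. The sketch below computes the mean and maximum of that difference. It assumes that the automatic and manual onsets (in seconds) are already paired note by note. The example values are made up for illustration and are not data from the paper.

```python
from typing import List, Tuple


def onset_deviation_ms(auto_onsets: List[float],
                       manual_onsets: List[float]) -> Tuple[float, float]:
    """Return (mean, max) absolute onset deviation in milliseconds between
    automatically aligned and manually verified onsets given in seconds.
    Assumes the two lists are paired one-to-one, note by note."""
    if len(auto_onsets) != len(manual_onsets) or not auto_onsets:
        raise ValueError("onset lists must be non-empty and the same length")
    errors = [abs(a - m) * 1000.0 for a, m in zip(auto_onsets, manual_onsets)]
    return sum(errors) / len(errors), max(errors)


# Made-up onsets for three notes (not results from the paper).
auto = [0.120, 0.600, 1.050]
manual = [0.124, 0.598, 1.058]
mean_ms, max_ms = onset_deviation_ms(auto, manual)
print(f"mean {mean_ms:.1f} ms, max {max_ms:.1f} ms")  # mean 4.7 ms, max 8.0 ms
```

Reporting both the mean and the maximum matters here. A small average can hide occasional large misalignments, which is exactly the kind of residual error the manual verification step is meant to catch.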

Additionally, the dataset covers a single instrument playing simple, predefined, largely monophonic exercises. It therefore does not capture the breadth and complexity of real-world music, such as polyphonic playing. Further research could extend the approach to more diverse and challenging musical content.

Conclusion

This paper contributes to music informatics by addressing the cost of creating time-aligned music transcription datasets. Combining predefined exercises with HMM-based forced Viterbi alignment offers a streamlined way to produce annotated data, particularly for monophonic instruments. The released acoustic plectrum guitar dataset can be used to train and evaluate AMT models, and the authors position the method as a starting point for building AMT datasets for other instruments.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Development of Large Annotated Music Datasets using HMM-based Forced Viterbi Alignment

S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

Datasets are essential for any machine learning task. Automatic Music Transcription (AMT) is one such task, where a considerable amount of data is required depending on how the solution is achieved. A music dataset complete with audio and its time-aligned transcriptions requires the effort of people with musical experience, which makes the task even more challenging. Musical experience is required to play the musical instrument(s) and to annotate and verify the transcriptions. We propose a method that streamlines this process, making it easy and efficient to obtain a dataset for a particular instrument. We use predefined guitar exercises and hidden Markov model (HMM)-based forced Viterbi alignment to accomplish this. The guitar exercises are designed to be simple. Since the note sequences are already defined, HMM-based forced Viterbi alignment provides time-aligned transcriptions of these audio files. The onsets of the transcriptions are manually verified, and the labels are accurate up to 10 ms, averaging 5 ms. The contributions of the proposed work are twofold: i) a well-streamlined and efficient method for generating datasets for any instrument, especially monophonic ones, and ii) an acoustic plectrum guitar dataset containing wave files and transcriptions in the form of label files. This method will serve as a preliminary step towards building concrete datasets for AMT systems for different instruments.

8/28/2024

Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion

Gakusei Sato, Taketo Akama

Automatic Music Transcription (AMT) is a vital technology in the field of music information processing. Despite recent enhancements in performance due to machine learning techniques, current methods typically attain high accuracy in domains where abundant annotated data is available. Addressing domains with low or no resources continues to be an unresolved challenge. To tackle this issue, we propose a transcription model that does not require any MIDI-audio paired data through the utilization of scalable synthetic audio for pre-training and adversarial domain confusion using unannotated real audio. In experiments, we evaluate methods under the real-world application scenario where training datasets do not include the MIDI annotation of audio in the target data domain. Our proposed method achieved competitive performance relative to established baseline methods, despite not utilizing any real datasets of paired MIDI-audio. Additionally, ablation studies have provided insights into the scalability of this approach and the forthcoming challenges in the field of AMT research.

7/4/2024

Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems

Lukáš Samuel Marták, Patricia Hu, Gerhard Widmer

Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music. The State-of-the-Art (SotA) benchmarks have been dominated by deep learning systems. Due to the scarcity of high quality data, they are usually trained and evaluated exclusively or predominantly on classical piano music. Unfortunately, that hinders our ability to understand how they generalize to other music. Previous works have revealed several aspects of memorization and overfitting in these systems. We identify two primary sources of distribution shift: the music, and the sound. Complementing recent results on the sound axis (i.e. acoustics, timbre), we investigate the musical one (i.e. note combinations, dynamics, genre). We evaluate the performance of several SotA AMT systems on two new experimental test sets which we carefully construct to emulate different levels of musical distribution shift. Our results reveal a stark performance gap, shedding further light on the Corpus Bias problem, and the extent to which it continues to trouble these systems.

8/12/2024

Machine Learning Techniques in Automatic Music Transcription: A Systematic Survey

Fatemeh Jamshidi, Gary Pike, Amit Das, Richard Chapman

In the domain of Music Information Retrieval (MIR), Automatic Music Transcription (AMT) emerges as a central challenge, aiming to convert audio signals into symbolic notations like musical notes or sheet music. This systematic review accentuates the pivotal role of AMT in music signal analysis, emphasizing its importance due to the intricate and overlapping spectral structure of musical harmonies. Through a thorough examination of existing machine learning techniques utilized in AMT, we explore the progress and constraints of current models and methodologies. Despite notable advancements, AMT systems have yet to match the accuracy of human experts, largely due to the complexities of musical harmonies and the need for nuanced interpretation. This review critically evaluates both fully automatic and semi-automatic AMT systems, emphasizing the importance of minimal user intervention and examining various methodologies proposed to date. By addressing the limitations of prior techniques and suggesting avenues for improvement, our objective is to steer future research towards fully automated AMT systems capable of accurately and efficiently translating intricate audio signals into precise symbolic representations. This study not only synthesizes the latest advancements but also lays out a road-map for overcoming existing challenges in AMT, providing valuable insights for researchers aiming to narrow the gap between current systems and human-level transcription accuracy.

6/24/2024