Optical Music Recognition in Manuscripts from the Ricordi Archive

Read original: arXiv:2408.10260 - Published 8/21/2024 by Federico Simonetta, Rishav Mondal, Luca Andrea Ludovico, Stavros Ntalampiras

Overview

The paper discusses the application of Optical Music Recognition (OMR) to digitize and transcribe historical music manuscripts from the Ricordi Archive.
OMR is a computer vision technique that automatically extracts musical notation from scanned or photographed sheet music.
The researchers developed a deep learning-based OMR system to handle the unique challenges posed by the Ricordi Archive's handwritten and damaged manuscripts.

Plain English Explanation

The Ricordi Archive contains a vast collection of historical music manuscripts, many of which are handwritten and in poor condition. Digitizing and transcribing this archive is an important task for preserving and studying musical heritage.

The researchers used a technique called Optical Music Recognition (OMR) to automatically extract musical notation from scanned images of the manuscripts. OMR uses computer vision and machine learning to identify and interpret the various symbols and elements of sheet music, such as staves, notes, clefs, and time signatures.

Applying OMR to the Ricordi Archive was particularly challenging because the manuscripts are often handwritten and damaged, making them difficult for computers to analyze. To overcome these issues, the researchers developed a deep learning-based OMR system that was specifically trained on the unique characteristics of the Ricordi manuscripts.

The goal of this research was to create an efficient and accurate OMR system that could help digitize and transcribe the Ricordi Archive, making its musical heritage more accessible to scholars, musicians, and the general public.

Technical Explanation

The researchers approached the OMR task for the Ricordi Archive using a deep learning-based system. They trained a series of neural networks to perform the various sub-tasks of OMR, including staff line detection, musical symbol recognition, and music score parsing.

The staff line detection model used a U-Net-based architecture to identify the horizontal lines that represent the staff system in the sheet music. This was a crucial first step, as the staff lines provide the framework for locating and interpreting the musical symbols.
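As a rough illustration of what a staff line detector outputs, the sketch below uses a classical horizontal projection profile instead of the U-Net described above. It is a stand-in technique, not the paper's model: the function name `detect_staff_lines`, the `min_fill` threshold, and the representation of a page as rows of 0/1 pixels are all assumptions made for this example.

```python
def detect_staff_lines(image, min_fill=0.6):
    """Return the center row of every run of mostly-inked rows.

    image    -- list of rows, each a list of 0 (paper) or 1 (ink) pixels
    min_fill -- assumed threshold: fraction of ink a row needs to count
                as part of a staff line
    """
    lines = []
    run = []  # consecutive row indices that currently look like a line
    for y, row in enumerate(image):
        fill = sum(row) / len(row) if row else 0.0
        if fill >= min_fill:
            run.append(y)
        elif run:
            lines.append(sum(run) // len(run))  # center of the finished run
            run = []
    if run:  # a line touching the bottom edge of the image
        lines.append(sum(run) // len(run))
    return lines


# Toy page: 12 rows of 10 pixels, with lines at rows 2, 5-6 (two pixels thick) and 9.
width = 10
page = [[0] * width for _ in range(12)]
for y in (2, 5, 6, 9):
    page[y] = [1] * width
page[7][:3] = [1, 1, 1]  # a short stroke that should not count as a staff line

print(detect_staff_lines(page))  # [2, 5, 9]
```

A projection profile only works when lines are close to horizontal and mostly unbroken, which is one reason a learned model is a plausible choice for skewed or damaged manuscript scans.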

Next, the researchers developed a symbol recognition model that could identify the various elements of musical notation, such as note heads, stems, clefs, and time signatures. This model was trained on a large dataset of labeled musical symbols to improve its accuracy.
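To make the idea of training on labeled symbols and then classifying new ones concrete, here is a minimal sketch using a nearest-centroid classifier. This is deliberately much simpler than a neural network and is not the model from the paper. The two features (width-to-height ratio and ink density), the labels, and the function names `fit_centroids` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(samples):
    """Average the feature vectors of each label.

    samples -- iterable of (feature_vector, label) pairs
    Returns a dict mapping each label to its mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical training data: (width/height ratio, ink density) per symbol crop.
training = [
    ((1.2, 0.9), "notehead"),
    ((1.1, 0.8), "notehead"),
    ((0.1, 1.0), "stem"),
    ((0.15, 0.9), "stem"),
]
centroids = fit_centroids(training)

print(classify((1.0, 0.85), centroids))   # notehead
print(classify((0.12, 0.95), centroids))  # stem
```

A neural classifier replaces the hand-picked features with ones learned from the pixel data, but the workflow of fitting on labeled samples and predicting on unlabeled ones is the same.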

Finally, the researchers implemented a music score parsing module that could interpret the relationships between the detected musical symbols and reconstruct the musical score. This allowed the OMR system to output a digital version of the manuscript that could be further processed or played back.
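One small piece of score parsing is turning a notehead's vertical position into a pitch by relating it to the detected staff lines. The sketch below assumes a treble clef, five evenly spaced staff lines, and image coordinates in which y grows downward. `pitch_from_position` and its parameters are hypothetical names for illustration, not the paper's parsing module.

```python
STEP_NAMES = "CDEFGAB"
# Diatonic index (octave * 7 + step) of E4, the bottom line of a treble-clef staff.
TREBLE_BOTTOM_LINE = 4 * 7 + STEP_NAMES.index("E")


def pitch_from_position(note_y, staff_line_ys):
    """Map a notehead's y coordinate to a pitch name such as 'B4'.

    note_y        -- vertical center of the notehead, in pixels
    staff_line_ys -- y coordinates of the five staff lines, in any order
    """
    lines = sorted(staff_line_ys)
    if len(lines) != 5 or lines[0] == lines[-1]:
        raise ValueError("expected five distinct staff lines")
    # Four gaps between five lines, and each gap spans two diatonic steps.
    half_space = (lines[-1] - lines[0]) / 8
    # Smaller y means higher on the page, so count steps up from the bottom line.
    steps_up = round((lines[-1] - note_y) / half_space)
    index = TREBLE_BOTTOM_LINE + steps_up
    return f"{STEP_NAMES[index % 7]}{index // 7}"


staff = [10, 20, 30, 40, 50]
print(pitch_from_position(50, staff))  # E4, bottom line
print(pitch_from_position(30, staff))  # B4, middle line
print(pitch_from_position(60, staff))  # C4, first ledger line below the staff
```

A real parser would also have to account for the clef in force, key signatures, accidentals, and rhythm, which is where the relationships between neighboring symbols come in.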

The main contribution is an OMR system that copes with the handwritten and damaged manuscripts of the Ricordi Archive. The neural networks were designed and trained specifically on these historical documents so that musical notation could be extracted from them reliably.

Critical Analysis

The researchers acknowledge several limitations and areas for further research in their paper. One of the main challenges was the diversity and variability of the Ricordi manuscripts, which made it difficult to develop a one-size-fits-all OMR system. The researchers suggest that future work could explore personalized or adaptive OMR models that can better accommodate the unique characteristics of individual manuscripts.

Another limitation is the accuracy and completeness of the OMR output. While the researchers report promising results, there is still room for improvement in terms of correctly identifying all musical elements and reconstructing the scores with high fidelity. Ongoing research could focus on enhancing the symbol recognition and score parsing capabilities of the OMR system.

Additionally, the researchers note that their OMR system currently operates on individual manuscript pages and does not yet address the challenge of page segmentation and document-level analysis. Developing techniques to automatically extract and organize entire musical compositions from the Ricordi Archive could be an important next step.

Overall, this research advances the application of OMR to historical music manuscripts, and its findings may inform broader efforts to preserve and study musical heritage with digital technology.

Conclusion

The paper presents a deep learning-based OMR system designed for digitizing and transcribing historical music manuscripts from the Ricordi Archive. It combines staff line detection, musical symbol recognition, and score parsing to extract notation from handwritten and damaged documents.

While the OMR system showed promising results, the researchers identified several areas for further improvement and research, such as personalized models, enhanced accuracy, and document-level analysis. Nonetheless, this work represents an important step forward in the preservation and accessibility of musical heritage through digital technology.

Related Papers

Optical Music Recognition in Manuscripts from the Ricordi Archive

Federico Simonetta, Rishav Mondal, Luca Andrea Ludovico, Stavros Ntalampiras

The Ricordi archive, a prestigious collection of significant musical manuscripts from renowned opera composers such as Donizetti, Verdi and Puccini, has been digitized. This process has allowed us to automatically extract samples that represent various musical elements depicted on the manuscripts, including notes, staves, clefs, erasures, and composer's annotations, among others. To distinguish between digitization noise and actual music elements, a subset of these images was meticulously grouped and labeled by multiple individuals into several classes. After assessing the consistency of the annotations, we trained multiple neural network-based classifiers to differentiate between the identified music elements. The primary objective of this study was to evaluate the reliability of these classifiers, with the ultimate goal of using them for the automatic categorization of the remaining unannotated data set. The dataset, complemented by manual annotations, models, and source code used in these experiments are publicly accessible for replication purposes.

8/21/2024

Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation

Elona Shatri, George Fazekas

Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI, significantly reducing the costs and time of manual transcription. This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music. Unlike Optical Character Recognition (OCR), OMR must handle the intricate semantics of Common Western Music Notation (CWMN), where symbol meanings depend on shape, position, and context. Our approach leverages instance segmentation to manage the density and overlap of musical symbols, facilitating more precise information retrieval from music scores. Evaluations on the DoReMi and MUSCIMA++ datasets demonstrate substantial improvements, with our method achieving a mean Average Precision (mAP) of up to 59.70% in dense symbol environments, achieving comparable results to object detection. Furthermore, using traditional computer vision techniques, we add a parallel step for staff detection to infer the pitch for the recognised symbols. This study emphasises the role of pixel-wise segmentation in advancing accurate music symbol recognition, contributing to knowledge discovery in OMR. Our findings indicate that instance segmentation provides more precise representations of musical symbols, particularly in densely populated scores, advancing OMR technology. We make our implementation, pre-processing scripts, trained models, and evaluation results publicly available to support further research and development.

9/17/2024

Toward a More Complete OMR Solution

Guang Yang (Paul G. Allen School of Computer Science & Engineering, University of Washington, United States), Muru Zhang (Paul G. Allen School of Computer Science & Engineering, University of Washington, United States), Lin Qiu (Paul G. Allen School of Computer Science & Engineering, University of Washington, United States), Yanming Wan (Paul G. Allen School of Computer Science & Engineering, University of Washington, United States), Noah A. Smith (Paul G. Allen School of Computer Science & Engineering, University of Washington, United States, Allen Institute for Artificial Intelligence, United States)

Optical music recognition (OMR) aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image (object detection) and then assembles them into a music notation (notation assembly). Most previous work on notation assembly unrealistically assumes perfect object detection. In this study, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph with pairwise relationships among detected music objects, and we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output. We find that this model is able to outperform existing models trained on perfect detection output, showing the benefit of considering the detection and assembly stages in a more holistic way. These findings, together with our novel evaluation metric, are important steps toward a more complete OMR solution.

9/4/2024

Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription

Antonio Ríos-Vila, Jorge Calvo-Zaragoza, Thierry Paquet

State-of-the-art end-to-end Optical Music Recognition (OMR) has, to date, primarily been carried out using monophonic transcription techniques to handle complex score layouts, such as polyphony, often by resorting to simplifications or specific adaptations. Despite their efficacy, these approaches imply challenges related to scalability and limitations. This paper presents the Sheet Music Transformer, the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies. Our model employs a Transformer-based image-to-sequence framework that predicts score transcriptions in a standard digital music encoding format from input images. Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively. The experimental outcomes not only indicate the competence of the model, but also show that it is better than the state-of-the-art methods, thus contributing to advancements in end-to-end OMR transcription.

4/30/2024