BERT-like Pre-training for Symbolic Piano Music Classification Tasks

Read original: arXiv:2107.05223 - Published 4/16/2024 by Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang


Overview

This paper presents a benchmark study using the Bidirectional Encoder Representations from Transformers (BERT) approach for symbolic piano music classification.
The study considers two types of MIDI data: MIDI scores (musical scores rendered directly into MIDI) and MIDI performances (MIDI encodings of human performances).
The researchers pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and one for MIDI performances, and fine-tune them for four downstream classification tasks.
The evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.

Plain English Explanation

The paper explores using a machine learning technique called BERT to classify different types of piano music data. BERT is a language model that is pre-trained by hiding parts of a text and learning to predict them from the surrounding context. The researchers adapted this approach to two kinds of piano music data:

MIDI scores: These are the original musical scores translated directly into MIDI format, without any dynamics or expressive elements.
MIDI performances: These are MIDI recordings of humans actually playing the musical scores, so they include the nuances and expression of a live performance.
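To make the distinction concrete, here is a minimal sketch of how the same three notes might look as a performance and as score-like data. The `Note` class, the `to_score_like` helper, the sixteenth-note grid, and the flat velocity of 64 are all hypothetical choices made for illustration. They are not the paper's data format, and real score datasets are rendered from notation rather than derived from performances this way.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI pitch number, 0-127
    onset: float     # onset time in beats
    duration: float  # duration in beats
    velocity: int    # MIDI velocity (loudness), 1-127

FLAT_VELOCITY = 64  # hypothetical placeholder meaning "no dynamics"

def to_score_like(note, grid=0.25):
    """Snap timing to a metrical grid and discard dynamics (illustration only)."""
    snapped_onset = round(note.onset / grid) * grid
    snapped_duration = max(grid, round(note.duration / grid) * grid)
    return Note(note.pitch, snapped_onset, snapped_duration, FLAT_VELOCITY)

# A performance: slightly early or late onsets and varying loudness.
performance = [
    Note(60, 0.03, 0.48, 72),
    Note(64, 0.52, 0.45, 58),
    Note(67, 0.97, 1.10, 90),
]

# The score-like version: onsets sit exactly on the grid, velocity is constant.
score_like = [to_score_like(n) for n in performance]

for perf, score in zip(performance, score_like):
    print(perf, "->", score)
```

The point of the sketch is only that performance data carries timing deviations and velocity values that score data lacks, which is one reason a separate model per data type is plausible.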

The researchers trained two separate BERT models, one for each type of MIDI data. They then used these BERT models to tackle four different classification tasks:

Melody extraction: Deciding, note by note, which notes belong to the main melody.
Velocity prediction: Predicting, note by note, how loudly each note is played (its MIDI velocity).
Style classification: Assigning a whole piece to a musical style.
Emotion classification: Assigning a whole piece to an emotion category.

The key finding is that the BERT-based models achieved higher classification accuracy than more traditional recurrent neural network (RNN) models on these tasks. This suggests that pre-training on unlabeled MIDI data helps the model pick up patterns that carry over to the classification tasks.

Technical Explanation

The researchers used five publicly available datasets of single-track piano MIDI files to pre-train and fine-tune the BERT models. For the pre-training stage, they trained two 12-layer Transformer models with BERT's masked language modelling objective: one for the MIDI scores and one for the MIDI performances.
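The masking step at the heart of BERT-style pre-training can be sketched as follows. This follows the recipe of the original BERT (select about 15% of tokens; of those, replace 80% with a mask token, 10% with a random token, and leave 10% unchanged). Whether the paper uses exactly these ratios is not stated in this summary, so treat them as assumptions. The token ids, `MASK_ID`, `VOCAB_SIZE`, and the `mask_tokens` helper are hypothetical, and the integer tokens stand in for whatever MIDI token representation the models actually use.

```python
import random

MASK_ID = 1       # hypothetical id of the [MASK] token
VOCAB_SIZE = 100  # hypothetical vocabulary size; ids 0 and 1 are reserved
IGNORE = -100     # label value meaning "not a prediction target"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for masked language modelling.

    labels holds the original token at selected positions and IGNORE elsewhere,
    so a loss would be computed only where the model has to reconstruct a token.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                    # position not selected
        labels[i] = tok                 # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID         # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(2, VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, labels

# Example: a toy sequence of 20 token ids with a fixed seed.
tokens = list(range(10, 30))
inputs, labels = mask_tokens(tokens, rng=random.Random(0))
print(inputs)
print(labels)
```

Because the objective needs no human labels, the same procedure can be run on any collection of MIDI files, which is what makes pre-training on several datasets at once practical.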

In the fine-tuning stage, the researchers adapted these pre-trained BERT models to four downstream classification tasks, two at the note level and two at the sequence level:

Melody extraction (note-level): Classifying each note by whether it belongs to the main melody line.
Velocity prediction (note-level): Predicting the loudness (velocity) of each note.
Style classification (sequence-level): Assigning a musical style to a whole sequence.
Emotion classification (sequence-level): Assigning an emotion category to a whole sequence.
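The paper's abstract groups these tasks into note-level ones (melody extraction, velocity prediction) and sequence-level ones (style and emotion classification). The sketch below shows how the two kinds of prediction differ once a model has produced one hidden vector per note. The tiny hand-written vectors, the weights, and the use of mean pooling are assumptions made for illustration; the paper's actual classifier heads may be designed differently.

```python
def linear(vec, weights, bias):
    """Apply a linear layer: one score per class."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

def note_level_predict(hidden, weights, bias):
    """One label per note: classify each hidden vector independently."""
    return [argmax(linear(h, weights, bias)) for h in hidden]

def sequence_level_predict(hidden, weights, bias):
    """One label per sequence: mean-pool the hidden vectors, then classify."""
    dim = len(hidden[0])
    pooled = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    return argmax(linear(pooled, weights, bias))

# Stand-in for encoder output: three notes, two hidden dimensions each.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # two classes, identity weights
bias = [0.0, 0.0]

note_labels = note_level_predict(hidden, weights, bias)
seq_label = sequence_level_predict(hidden, weights, bias)
print(note_labels, seq_label)
```

In both cases the pre-trained encoder is shared; only the small head on top changes with the task, which is why one pre-trained model can serve all four tasks.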

The evaluation showed that the BERT-based models outperformed recurrent neural network (RNN)-based baselines on all four tasks. This suggests that BERT's ability to capture complex dependencies in the MIDI data gives it an advantage over more traditional sequence modeling approaches.

Critical Analysis

The paper provides a thorough and well-designed benchmark study using the BERT approach for symbolic piano music classification. The authors make a compelling case for the effectiveness of BERT-based models compared to RNN-based baselines.

However, the study is limited to single-track piano MIDI files and does not explore more complex musical data, such as multi-track or multi-instrument compositions. Additionally, the paper does not delve into the interpretability of the BERT models or provide much insight into the specific patterns and features they learn.

Further research could explore applying the BERT approach to a wider range of musical data and tasks, as well as investigating the inner workings of the models to better understand how they are able to outperform RNN-based approaches. It would also be interesting to see how the BERT models perform on real-world applications, such as music recommendation or generation systems.

Conclusion

This paper presents a benchmark study of the BERT approach for symbolic piano music classification. The researchers show that BERT-based models reach higher accuracy than RNN-based baselines on four tasks: melody extraction, velocity prediction, style classification, and emotion classification.

These findings suggest that the BERT approach, with its ability to capture complex dependencies in MIDI data, could have significant implications for the development of advanced music analysis and generation systems. As the field of AI-driven music continues to evolve, this research provides a valuable foundation for further exploration and innovation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


BERT-like Pre-training for Symbolic Piano Music Classification Tasks

Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang

This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.

4/16/2024

PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training

Xiao Liang, Zijian Zhao, Weichao Zeng, Yutong He, Fupeng He, Yiyi Wang, Chengying Gao

Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not make uniform use of learned features to generate and comprehend music simultaneously. In this paper, we propose PianoBART, a pre-trained model that uses BART for both symbolic piano music generation and understanding. We devise a multi-level object selection strategy for different pre-training tasks of PianoBART, which can prevent information leakage or loss and enhance learning ability. The musical semantics captured in pre-training are fine-tuned for music generation and understanding tasks. Experiments demonstrate that PianoBART efficiently learns musical patterns and achieves outstanding performance in generating high-quality coherent pieces and comprehending music. Our code and supplementary material are available at https://github.com/RS2002/PianoBart.

7/8/2024

Adversarial-MidiBERT: Symbolic Music Understanding Model Based on Unbias Pre-training and Mask Fine-tuning

Zijian Zhao

As an important part of Music Information Retrieval (MIR), Symbolic Music Understanding (SMU) has gained substantial attention, as it can assist musicians and amateurs in learning and creating music. Recently, pre-trained language models have been widely adopted in SMU because the symbolic music shares a huge similarity with natural language, and the pre-trained manner also helps make full use of limited music data. However, the issue of bias, such as sexism, ageism, and racism, has been observed in pre-trained language models, which is attributed to the imbalanced distribution of training data. It also has a significant influence on the performance of downstream tasks, which also happens in SMU. To address this challenge, we propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT). We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to biases during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which can help the model converge faster and perform better. We evaluate our method on four music understanding tasks, and our approach demonstrates excellent performance in all of them. The code for our model is publicly available at https://github.com/RS2002/Adversarial-MidiBERT.

7/12/2024

MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT

Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama

We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. The main theme of symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two significant challenges: training data's lack of information about chords and scales and the requirement of specially designed model architecture adapted to the unique format of symbolic music representation. In this paper, we solve the above problems by introducing new symbolic music representation with MusicLang chord analysis model. We propose our MMT-BERT architecture adapting to the representation. To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate relativistic standard loss. This approach, supported by the in-depth understanding of symbolic music encoded within MusicBERT, fortifies the consonance and humanity of music generated by our method. Experimental results demonstrate the effectiveness of our approach which strictly follows the state-of-the-art methods.

9/4/2024