PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data

Read original: arXiv:2408.01551 - Published 8/6/2024 by Chih-Pin Tan, Hsin Ai, Yi-Hsin Chang, Shuen-Huei Guan, Yi-Hsuan Yang

Overview

PiCoGen2 is a system for generating piano cover versions of songs using a transfer learning approach and weakly aligned data.
The paper proposes a transfer learning approach: the model is pre-trained on piano-only data, then fine-tuned on paired song-piano data.
The paired data is only weakly aligned. It is built without remapping piano notes onto the song audio, which avoids the loss of piano information that strict alignment causes.

Plain English Explanation

PiCoGen2 is a tool that can create piano cover versions of songs. It takes an audio recording as input and generates a piano arrangement that mimics the original song.

The key idea behind PiCoGen2 is to train the model in two steps. First, it is pre-trained on piano-only recordings, so that it learns how piano music is put together rather than simply copying down what it hears. Then, in the second step, it is fine-tuned on pairs of pop songs and their piano covers, so that it can apply that composition knowledge to song audio.

An important aspect of PiCoGen2 is that it does not need perfectly synchronized training pairs. Earlier approaches forced each piano note onto the timeline of the song audio, which loses some of what the pianist actually played. PiCoGen2 instead learns from weakly aligned data: a song and its piano cover that correspond to each other but are not synchronized note by note. This keeps the piano covers in the training data intact.
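To make the idea of weak alignment concrete, here is a minimal sketch. It assumes that a song and its piano cover share a handful of corresponding anchor points (for example, bar boundaries) and groups the piano notes by segment without changing any note timing. The names `Note`, `anchors`, and `pair_segments` are hypothetical, and this is not the pairing procedure used by the authors, which this summary does not describe.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Note:
    onset: float  # seconds, on the piano cover's own timeline
    pitch: int    # MIDI pitch number


def pair_segments(anchors, notes):
    """Group piano notes by the song segment they fall into.

    anchors: sorted list of (song_time, piano_time) pairs marking segment
    starts that are assumed to correspond. Note onsets are left untouched,
    so the piano performance is not remapped onto the song's timeline.
    """
    piano_starts = [piano_time for _, piano_time in anchors]
    segments = [
        {"song_start": s, "piano_start": p, "notes": []} for s, p in anchors
    ]
    for note in notes:
        # Index of the last anchor that starts at or before this note.
        idx = bisect_right(piano_starts, note.onset) - 1
        if idx >= 0:
            segments[idx]["notes"].append(note)
    return segments


# Hypothetical data: the cover drifts slightly relative to the song.
anchors = [(0.0, 0.0), (2.0, 2.3), (4.0, 4.5)]
notes = [Note(0.10, 60), Note(2.25, 64), Note(2.40, 67), Note(4.90, 72)]
pairs = pair_segments(anchors, notes)
```

Each entry in `pairs` ties a stretch of song audio to the piano notes played over it, while every note keeps its original onset. A strongly aligned dataset would instead rewrite each onset to a position on the song's timeline.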

Technical Explanation

PiCoGen2 is a model for generating piano covers of pop songs. It is trained in two phases: pre-training on piano-only data, then fine-tuning on weakly aligned song-piano pairs.

In pre-training, an existing lead sheet transcription model serves as the encoder, extracting high-level features from piano recordings, and the model learns to generate piano note sequences from those features. Working from high-level features is intended to guide the model toward piano composition concepts rather than mere audio transcription.
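A sequence model works on discrete tokens, so piano notes have to be serialized before they can be generated. The sketch below shows one simple event-style encoding, assuming notes are given as `(onset_seconds, midi_pitch, duration_seconds)` tuples and quantized to a fixed time grid. The token names and the `notes_to_tokens` helper are hypothetical; the tokenization actually used by PiCoGen2 is not stated in this summary.

```python
def notes_to_tokens(notes, step=0.25):
    """Serialize (onset, pitch, duration) notes into a flat token list.

    Times are quantized to a grid of `step` seconds. A position token is
    emitted whenever the grid position changes; every note then adds a
    pitch token and a duration token.
    """
    tokens = []
    last_pos = None
    for onset, pitch, dur in sorted(notes):
        pos = round(onset / step)
        if pos != last_pos:
            tokens.append(f"POS_{pos}")
            last_pos = pos
        tokens.append(f"PITCH_{pitch}")
        # Durations shorter than one grid step are clamped to one step.
        tokens.append(f"DUR_{max(1, round(dur / step))}")
    return tokens


# A two-note chord followed by a single note.
tokens = notes_to_tokens([(0.0, 60, 0.5), (0.0, 64, 0.5), (0.5, 67, 0.25)])
```

Because notes that share an onset sit under a single position token, chords and other polyphonic textures fit into one flat sequence.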

The pre-trained model is then fine-tuned on the paired song-piano data, which transfers the learned composition knowledge to the pop song domain.

By pre-training on piano-only data and fine-tuning on weakly aligned pairs, PiCoGen2 avoids the need for strongly aligned training data. The authors report that it outperforms baselines on both objective and subjective metrics across five pop genres.
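The benefit of transfer learning can be illustrated with a deliberately tiny example. The sketch below is a toy, not the PiCoGen2 model: it fits a one-parameter linear model by gradient descent, first on a larger "pre-training" set and then for a few steps on a small, related "fine-tuning" set. All names and data here are hypothetical.

```python
def sgd_fit(data, w, steps, lr=0.05):
    """Fit y = w * x by gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def mse(data, w):
    """Mean squared error of y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Larger pre-training set drawn from y = 2.0 * x.
pretrain_data = [(k / 10, 2.0 * k / 10) for k in range(1, 21)]
# Small fine-tuning set from a related task, y = 2.2 * x.
finetune_data = [(0.5, 1.1), (1.0, 2.2), (1.5, 3.3)]

w_pre = sgd_fit(pretrain_data, w=0.0, steps=200)
# Same small budget of 5 steps, with and without the pre-trained start.
w_transfer = sgd_fit(finetune_data, w=w_pre, steps=5)
w_scratch = sgd_fit(finetune_data, w=0.0, steps=5)
```

With the same five fine-tuning steps, the model that starts from `w_pre` ends much closer to the target than the one trained from scratch. The same intuition motivates pre-training on plentiful piano-only data before fine-tuning on paired data.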

Critical Analysis

The PiCoGen2 paper presents a compelling approach to piano cover generation that addresses some key challenges in the field. Pre-training on piano-only data and fine-tuning on weakly aligned pairs lets the system produce high-quality output without strongly aligned training data, which is an important practical consideration.

However, the paper does note some limitations of the current system. For example, the melody generation stage is limited to monophonic piano output, whereas real-world piano covers often involve more complex voicings and polyphony. Additionally, the training dataset, while larger than previous work, may still not be representative of the full diversity of piano music styles and genres.

Further research could explore ways to extend the model to handle more complex piano textures, as well as investigate methods for leveraging even larger and more diverse training corpora. Incorporating user feedback or interaction into the cover generation process could also be a fruitful direction.

Overall, PiCoGen2 represents an impressive step forward in the field of automatic piano cover generation, demonstrating the power of transfer learning and weakly supervised techniques to tackle challenging music generation tasks.

Conclusion

PiCoGen2 is a system for generating high-quality piano covers of pop songs using transfer learning. By pre-training on piano-only data and fine-tuning on weakly aligned song-piano pairs, the system can produce convincing piano arrangements without strongly aligned training data.

This research represents an important advance in the field of automatic music generation, with potential applications in areas like music education, entertainment, and creative assistance. While the current system has some limitations, the general approach of combining transfer learning and weakly supervised techniques holds promise for tackling other complex music generation challenges in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data

Chih-Pin Tan, Hsin Ai, Yi-Hsin Chang, Shuen-Huei Guan, Yi-Hsuan Yang

Piano cover generation aims to create a piano cover from a pop song. Existing approaches mainly employ supervised learning and the training demands strongly-aligned and paired song-to-piano data, which is built by remapping piano notes to song audio. This would, however, result in the loss of piano information and accordingly cause inconsistencies between the original and remapped piano versions. To overcome this limitation, we propose a transfer learning approach that pre-trains our model on piano-only data and fine-tunes it on weakly-aligned paired data constructed without note remapping. During pre-training, to guide the model to learn piano composition concepts instead of merely transcribing audio, we use an existing lead sheet transcription model as the encoder to extract high-level features from the piano recordings. The pre-trained model is then fine-tuned on the paired song-piano data to transfer the learned composition knowledge to the pop song domain. Our evaluation shows that this training strategy enables our model, named PiCoGen2, to attain high-quality results, outperforming baselines on both objective and subjective metrics across five pop genres.

8/6/2024

PiCoGen: Generate Piano Covers with a Two-stage Approach

Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang

Cover song generation stands out as a popular way of music making in the music-creative community. In this study, we introduce Piano Cover Generation (PiCoGen), a two-stage approach for automatic cover song generation that transcribes the melody line and chord progression of a song given its audio recording, and then uses the resulting lead sheet as the condition to generate a piano cover in the symbolic domain. This approach is advantageous in that it does not require paired data of covers and their original songs for training. Compared to an existing approach that demands such paired data, our evaluation shows that PiCoGen demonstrates competitive or even superior performance across songs of different musical genres.

7/31/2024

BERT-like Pre-training for Symbolic Piano Music Classification Tasks

Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang

This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.

4/16/2024

PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training

Xiao Liang, Zijian Zhao, Weichao Zeng, Yutong He, Fupeng He, Yiyi Wang, Chengying Gao

Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not make uniform use of learned features to generate and comprehend music simultaneously. In this paper, we propose PianoBART, a pre-trained model that uses BART for both symbolic piano music generation and understanding. We devise a multi-level object selection strategy for different pre-training tasks of PianoBART, which can prevent information leakage or loss and enhance learning ability. The musical semantics captured in pre-training are fine-tuned for music generation and understanding tasks. Experiments demonstrate that PianoBART efficiently learns musical patterns and achieves outstanding performance in generating high-quality coherent pieces and comprehending music. Our code and supplementary material are available at https://github.com/RS2002/PianoBart.

7/8/2024