Expressive MIDI-format Piano Performance Generation

Read original: arXiv:2408.00900 - Published 8/6/2024 by Jingwei Liu

Overview

  • Explores the use of convolutional networks for generating expressive MIDI-format piano performances
  • Focuses on processing raw audio data to extract relevant musical features for performance generation
  • Argues that symbolic (MIDI) generation can be as expressive as, or more expressive than, audio-based approaches

Plain English Explanation

The paper describes a method for generating expressive piano performances in MIDI format. Rather than outputting audio, the approach processes raw audio data to extract relevant musical features, such as timing, dynamics, and articulation. These features are then used to train a convolutional neural network that generates new piano performances with expressive characteristics.
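The following minimal Python sketch makes the feature terms concrete by showing how timing, dynamics, and articulation could be measured from transcribed note events. It is an illustration only, not the paper's pipeline. The `Note` type, the `expressive_features` function, and the articulation measure (note duration divided by the inter-onset interval) are hypothetical choices made for clarity.

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical note record, e.g. produced by transcribing a recording.
    pitch: int      # MIDI pitch number, 0-127
    onset: float    # seconds
    offset: float   # seconds
    velocity: int   # MIDI velocity, 1-127 (loudness)


def expressive_features(notes):
    """Per-note timing, dynamics and articulation (illustrative only).

    - ioi: inter-onset interval to the next note in time order (timing)
    - velocity: MIDI velocity (dynamics)
    - articulation: duration / ioi (< 1 detached, >= 1 legato or overlapping)
    The last note has no successor, and simultaneous onsets (chords) give
    ioi == 0, so articulation is None in those cases.
    """
    ordered = sorted(notes, key=lambda n: n.onset)
    feats = []
    for i, note in enumerate(ordered):
        duration = note.offset - note.onset
        if i + 1 < len(ordered):
            ioi = ordered[i + 1].onset - note.onset
            articulation = duration / ioi if ioi > 0 else None
        else:
            ioi, articulation = None, None
        feats.append({"pitch": note.pitch, "ioi": ioi,
                      "velocity": note.velocity, "articulation": articulation})
    return feats


if __name__ == "__main__":
    melody = [Note(60, 0.0, 0.25, 80), Note(62, 0.5, 1.0, 64), Note(64, 1.0, 1.5, 90)]
    for f in expressive_features(melody):
        print(f)
```

In this example the first note is held for half of its inter-onset interval (articulation 0.5, detached), while the second is held for the full interval (articulation 1.0, legato).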

The key advantage of this approach is that it operates on symbolic music data (MIDI) rather than raw audio. In MIDI, timing, dynamics, and pedaling are explicit values that can be inspected and edited precisely, which is much harder to do with a waveform. The author argues that because the model works with the underlying musical structure rather than the audio signal, it can generate expressive music that is as good as, if not better than, audio-based generation.
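Part of what makes MIDI convenient is that every expressive choice is stored as an explicit value. The sketch below serializes notes and sustain-pedal changes into a flat event sequence, the kind of sequence that symbolic models are often trained on. The event vocabulary (`NOTE_ON`, `NOTE_OFF`, `TIME_SHIFT`, `VELOCITY`, `PEDAL_ON`, `PEDAL_OFF`), the 10 ms time step, the 32 velocity bins, and the `to_events` function are assumptions loosely modeled on common MIDI-like performance encodings. The paper's own representation may differ.

```python
def to_events(notes, pedal_changes=(), step=0.01, max_shift=100, velocity_bins=32):
    """Serialize a performance into (event_type, value) tuples (hypothetical encoding).

    notes:          iterable of (pitch, onset_s, offset_s, velocity 0-127)
    pedal_changes:  iterable of (time_s, is_down) for the sustain pedal (MIDI CC 64)
    step:           assumed time resolution in seconds (10 ms)
    max_shift:      largest single TIME_SHIFT, in steps; longer gaps are split
    """
    raw = []  # (time, priority, kind, value); lower priority sorts first at equal times
    for pitch, onset, offset, velocity in notes:
        vbin = min(velocity * velocity_bins // 128, velocity_bins - 1)
        # For simplicity a VELOCITY event precedes every NOTE_ON, even if unchanged.
        raw.append((onset, 1, "VELOCITY", vbin))
        raw.append((onset, 2, "NOTE_ON", pitch))
        raw.append((offset, 0, "NOTE_OFF", pitch))
    for time, is_down in pedal_changes:
        raw.append((time, 0, "PEDAL_ON" if is_down else "PEDAL_OFF", 0))  # 0 = no payload
    raw.sort(key=lambda e: (e[0], e[1]))

    events = []
    current = 0  # current position in quantized steps
    for time, _, kind, value in raw:
        target = round(time / step)  # quantize absolute time to the step grid
        delta = target - current
        while delta > 0:  # emit as many TIME_SHIFTs as needed to reach the target
            shift = min(delta, max_shift)
            events.append(("TIME_SHIFT", shift))
            delta -= shift
        current = target
        events.append((kind, value))
    return events


if __name__ == "__main__":
    # Middle C held for 0.5 s, with the sustain pedal pressed at 0.25 s and released at 1.5 s.
    for event in to_events([(60, 0.0, 0.5, 80)], pedal_changes=[(0.25, True), (1.5, False)]):
        print(event)
```

Because timing, loudness, and pedaling are separate tokens, an edit such as making a passage louder amounts to rewriting its `VELOCITY` values, with no waveform processing involved.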

Technical Explanation

The paper presents a convolutional network architecture for generating expressive MIDI-format piano performances. The network takes as input a series of raw audio waveforms and learns to extract relevant musical features, such as timing, dynamics, and articulation. These features are then used to generate new piano performances with expressive characteristics.
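The core operation of such a network is a one-dimensional convolution over time. The pure-Python sketch below applies a single causal, dilated convolution to one feature channel (for example, a per-step velocity curve). It assumes a causal design, in which each output depends only on the present and past steps; the summary does not say whether the paper's layers are causal or dilated. The `causal_conv1d` function is a hypothetical name, and real models stack many such layers with many channels and learned weights.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution of a single channel.

    y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as 0 before t = 0,
    so y[t] never depends on future time steps.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


if __name__ == "__main__":
    # An impulse input reveals which past steps each output can "see".
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(causal_conv1d(impulse, kernel=[0.5, 0.25], dilation=2))  # → [0.5, 0.0, 0.25, 0.0, 0.0]
```

With dilation 2, the second kernel tap reaches two steps into the past, which is how dilated stacks widen their temporal context without adding more weights per layer.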

The listening-based data processing, which derives training features from audio recordings, is a key aspect of the system. By analyzing raw audio, the system can capture subtle nuances of human piano playing that are often lost in symbolic music data such as score-like MIDI files. This allows the network to learn and reproduce the expressive qualities that make piano performances sound realistic and compelling.

Critical Analysis

The paper presents a promising approach to generating expressive piano performances in MIDI format, but it has limitations. Relying on raw audio data may raise concerns about scalability and computational cost. Convolutional networks also see only a fixed window of context, so capturing long-term musical dependencies, such as phrase structure or returning themes, may be difficult and deserves further study. The author also acknowledges that, because of limited time before submission, the model was not fine-tuned or sufficiently trained, so its generations may sound incoherent and random at certain points.
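The long-term dependency concern can be quantified. For a stack of 1-D convolutions with kernel sizes k_i and dilations d_i, one output can see 1 + sum((k_i - 1) * d_i) input steps. The sketch below compares a plain stack with an exponentially dilated one. The ten-layer depth, the kernel size of 3, and the 10 ms step used to convert steps to seconds are illustrative assumptions, not the paper's configuration.

```python
def receptive_field(kernel_sizes, dilations):
    """Number of input time steps visible to one output of stacked 1-D convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("need exactly one dilation per layer")
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


STEP_SECONDS = 0.01  # assumed time resolution: 10 ms per step

if __name__ == "__main__":
    layers = 10
    plain = receptive_field([3] * layers, [1] * layers)                         # 21 steps
    dilated = receptive_field([3] * layers, [2 ** i for i in range(layers)])    # 2047 steps
    print(f"plain:   {plain} steps = {plain * STEP_SECONDS:.2f} s")
    print(f"dilated: {dilated} steps = {dilated * STEP_SECONDS:.2f} s")
```

Even with exponential dilation, ten layers cover only about 20 seconds at this resolution, which is short compared with the multi-minute structure of a typical piano piece. This is one reason long-range coherence is a natural question for convolutional generators.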

Conclusion

This paper offers a novel approach to generating expressive piano performances in MIDI format. By extracting relevant musical features from raw audio data, the system generates performances with expressive qualities such as micro-timing, varied dynamics, and sustain-pedal effects. If symbolic generation can match audio-based methods in expressiveness, as the author claims, it could be useful in areas such as computer-assisted composition, music education, and interactive music systems.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Expressive MIDI-format Piano Performance Generation

Jingwei Liu

This work presents a generative neural network that can generate expressive piano performances in MIDI format. The musical expressivity is reflected in vivid micro-timing, rich polyphonic texture, varied dynamics, and sustain-pedal effects. The model is innovative in many respects, from data processing to neural network design. We claim that this symbolic music generation model overcomes the common criticisms of symbolic music and can generate expressive music flows as good as, if not better than, those generated with raw audio. One drawback is that, due to the limited time before submission, the model was not fine-tuned or sufficiently trained, so the generations may sound incoherent and random at certain points. Despite that, the model shows a powerful ability to generate expressive piano pieces.

8/6/2024

Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation

Jingyue Huang, Ke Chen, Yi-Hsuan Yang

Managing the emotional aspect remains a challenge in automatic music generation. Prior works aim to learn various emotions at once, leading to inadequate modeling. This paper explores the disentanglement of emotions in piano performance generation through a two-stage framework. The first stage focuses on valence modeling of lead sheet, and the second stage addresses arousal modeling by introducing performance-level attributes. To further capture features that shape valence, an aspect less explored by previous approaches, we introduce a novel functional representation of symbolic music. This representation aims to capture the emotional impact of major-minor tonality, as well as the interactions among notes, chords, and key signatures. Objective and subjective experiments validate the effectiveness of our framework in both emotional valence and arousal modeling. We further leverage our framework in a novel application of emotional controls, showing a broad potential in emotion-driven music generation.

7/31/2024

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Seungyeon Rhyu, Kichang Yang, Sungjun Cho, Jaehyeon Kim, Kyogu Lee, Moontae Lee

Music generation introduces challenging complexities to large language models. Symbolic structures of music often include vertical harmonization as well as horizontal counterpoint, urging various adaptations and enhancements for large-scale Transformers. However, existing works share three major drawbacks: 1) their tokenization requires domain-specific annotations, such as bars and beats, that are typically missing in raw MIDI data; 2) the pure impact of enhancing token embedding methods is hardly examined without domain-specific annotations; and 3) existing works to overcome the aforementioned drawbacks, such as MuseNet, lack reproducibility. To tackle such limitations, we develop a MIDI-based music generation framework inspired by MuseNet, empirically studying two structural embeddings that do not rely on domain-specific annotations. We provide various metrics and insights that can guide suitable encoding to deploy. We also verify that multiple embedding configurations can selectively boost certain musical aspects. By providing open-source implementations via HuggingFace, our findings shed light on leveraging large language models toward practical and reproducible music generation.

7/30/2024

Sine, Transient, Noise Neural Modeling of Piano Notes

Riccardo Simionato, Stefano Fasciani

This paper introduces a novel method for emulating piano sounds. We propose to exploit the sine, transient, and noise decomposition to design a differentiable spectral modeling synthesizer replicating piano notes. Three sub-modules learn these components from piano recordings and generate the corresponding harmonic, transient, and noise signals. Splitting the emulation into three independently trainable models reduces the modeling tasks' complexity. The quasi-harmonic content is produced using a differentiable sinusoidal model guided by physics-derived formulas, whose parameters are automatically estimated from audio recordings. The noise sub-module uses a learnable time-varying filter, and the transients are generated using a deep convolutional network. From singular notes, we emulate the coupling between different keys in trichords with a convolutional-based network. Results show the model matches the partial distribution of the target while predicting the energy in the higher part of the spectrum presents more challenges. The energy distribution in the spectra of the transient and noise components is accurate overall. While the model is more computationally and memory efficient, perceptual tests reveal limitations in accurately modeling the attack phase of notes. Despite this, it generally achieves perceptual accuracy in emulating single notes and trichords.

9/11/2024