The Sound Demixing Challenge 2023 – Music Demixing Track

Read original: arXiv:2308.06979 - Published 4/22/2024 by Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues and 17 others

Overview

This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23).
It introduces the task of robust music source separation (MSS), which involves training MSS models in the presence of errors in the training data.
The paper proposes a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduces two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding.
It describes the methods that achieved the highest scores in the competition and presents a direct comparison with the previous edition of the challenge, the Music Demixing Challenge 2021.
The paper also reports the results of a listening test with renowned producers and musicians to study the perceptual quality of the systems.

Plain English Explanation

This paper looks at a challenge focused on separating different musical instruments and vocals from a mixed audio recording, a task known as music source separation (MSS). The paper describes a new version of this challenge, called the Sound Demixing Challenge 2023 (SDX'23), which introduces a twist: the training data for the MSS models may contain errors, such as mislabeled instruments or "bleed" from other instruments.

The researchers propose ways to simulate these types of errors in training datasets, creating two new datasets called SDXDB23_LabelNoise and SDXDB23_Bleeding. They then describe the top-performing methods from the competition and show that the winning system clearly outperformed the winner of the previous edition, the Music Demixing Challenge 2021, when both were evaluated on that edition's dataset (MDXDB21).

In addition to the technical results, the paper also reports on a listening test where renowned music producers and musicians evaluated the perceptual quality of the different systems. This provides a more subjective assessment of the models' performance beyond just the numerical metrics.

Overall, this research aims to advance the state of the art in robust music source separation, where models must cope with imperfections in the training data. This is an important practical consideration for real-world applications.

Technical Explanation

The paper introduces the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23), which focuses on the task of robust music source separation (MSS). This involves training MSS models to separate individual instruments and vocals from mixed audio recordings, even when the training data contains errors.

The researchers propose a formalization of the types of errors that can occur in the design of an MSS training dataset, including label noise (where the instrument or vocal annotations are incorrect) and bleeding (where there is leakage of other instruments into a given source recording). They then create two new datasets, SDXDB23_LabelNoise and SDXDB23_Bleeding, to simulate these real-world challenges.
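To make the two error types concrete, here is a minimal sketch in Python. It is not the procedure used to build SDXDB23_LabelNoise or SDXDB23_Bleeding, which this summary does not specify. The function names (`add_bleeding`, `add_label_noise`), the `gain` and `swap_prob` parameters, and the representation of stems as plain lists of samples are all assumptions made for illustration.

```python
import random


def add_bleeding(stems, gain=0.1):
    """Simulate bleeding: each stem also picks up a scaled copy of all other stems.

    `stems` maps a stem name to a list of samples; all lists have equal length.
    `gain` is a hypothetical leakage factor, not a value from the paper.
    """
    names = list(stems)
    length = len(stems[names[0]])
    bled = {}
    for name in names:
        bled[name] = [
            stems[name][i] + gain * sum(stems[o][i] for o in names if o != name)
            for i in range(length)
        ]
    return bled


def add_label_noise(stems, swap_prob=0.5, rng=None):
    """Simulate label noise: with probability `swap_prob`, two stems trade labels.

    The audio itself is untouched; only the name attached to it is wrong.
    """
    rng = rng or random.Random()
    noisy = dict(stems)
    if len(noisy) >= 2 and rng.random() < swap_prob:
        a, b = rng.sample(sorted(noisy), 2)
        noisy[a], noisy[b] = noisy[b], noisy[a]
    return noisy


# Toy two-sample "song" with four stems.
clean = {
    "vocals": [1.0, 0.0],
    "bass": [0.0, 1.0],
    "drums": [0.5, 0.5],
    "other": [0.0, 0.0],
}
bled = add_bleeding(clean, gain=0.1)
mislabeled = add_label_noise(clean, swap_prob=1.0, rng=random.Random(0))
```

In both cases the mixture a model hears at training time is still plausible music, but the targets it is asked to reproduce are wrong: bleeding corrupts the content of each target, while label noise corrupts which target the content is assigned to.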

The paper describes the methods that achieved the highest scores in the SDX'23 competition; see the original paper for the specific architectures and training strategies used to cope with the noisy training data.

When the top-performing system was evaluated on MDXDB21, the dataset of the Music Demixing Challenge 2021, it improved the signal-to-distortion ratio (SDR) by over 1.6 dB compared to the winner of that previous competition. This suggests significant progress in MSS since the 2021 edition.
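For readers unfamiliar with the metric, the sketch below computes a signal-to-distortion ratio in decibels under one common definition: the energy of the reference stem divided by the energy of the error between reference and estimate. The exact evaluation protocol of the challenge (averaging over stems and songs, channel handling) is not given in this summary, so the function name `sdr_db` and the `eps` stabilizer are assumptions for illustration only.

```python
import math


def sdr_db(reference, estimate, eps=1e-7):
    """Signal-to-distortion ratio in dB for two equal-length sample lists.

    SDR = 10 * log10(sum(ref^2) / sum((ref - est)^2)); `eps` avoids division
    by zero for silent references or perfect estimates.
    """
    signal_energy = sum(r * r for r in reference) + eps
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate)) + eps
    return 10.0 * math.log10(signal_energy / error_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9 * r for r in reference]  # estimate that is 10% too quiet
score = sdr_db(reference, estimate)      # close to 20 dB

# A gain of 1.6 dB corresponds to this factor in the energy ratio.
ratio_for_1_6_db = 10 ** (1.6 / 10)
```

Because decibels are logarithmic, a 1.6 dB gain means the signal-to-error energy ratio grows by a factor of about 1.45. For a fixed reference energy that is roughly 31% less error energy, although the reported figure is an aggregate score, so this reading is only indicative.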

Critical Analysis

The paper provides a thorough and well-designed evaluation of the state-of-the-art in robust music source separation. By introducing new datasets that simulate real-world errors in training data, the researchers have created a more challenging and realistic benchmark for assessing MSS systems.

However, the paper does not go into detail on the specific architectures or training techniques used by the top-performing methods. While the results are impressive, more information on the underlying models and how they handle noisy data would be helpful for understanding the key innovations.

Additionally, the listening test results provide valuable insight into the perceptual quality of the systems, but it's unclear how the human evaluators were selected and whether their assessments could be biased or inconsistent. More details on the listening test methodology would strengthen this analysis.

Overall, this research represents an important step forward in developing MSS systems that can robustly handle real-world challenges. The introduction of the SDX'23 benchmark and the strong performance gains over the previous challenge are significant contributions to the field.

Conclusion

This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge 2023 (SDX'23), which focuses on the task of robust music source separation (MSS). The researchers proposed a formalization of the types of errors that can occur in MSS training data and introduced two new datasets, SDXDB23_LabelNoise and SDXDB23_Bleeding, to simulate these challenges.

The top-performing methods from the competition achieved impressive results: the winning system improved the signal-to-distortion ratio by over 1.6 dB compared to the winner of the previous edition, the Music Demixing Challenge 2021, when evaluated on MDXDB21. The paper also reported on a listening test with renowned music producers and musicians, providing a more subjective assessment of the systems' perceptual quality.

This research represents an important advancement in the field of robust MSS, as it addresses the practical challenge of training models to perform well even when the training data is noisy or imperfect. The new SDX'23 benchmark and the strong technical results suggest that the community is making progress towards more reliable and high-quality music source separation systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The Sound Demixing Challenge 2023 – Music Demixing Track

Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji

This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.

4/22/2024

The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most successful approaches employed by participants. Compared to the cocktail-fork baseline, the best-performing system trained exclusively on the simulated Divide and Remaster (DnR) dataset achieved an improvement of 1.8 dB in SDR, whereas the top-performing system on the open leaderboard, where any data could be used for training, saw a significant improvement of 5.7 dB. A significant source of this improvement was making the simulated data better match real cinematic audio, which we further investigate in detail.

4/19/2024

Benchmarks and leaderboards for sound demixing tasks

Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva

Music demixing is the task of separating different tracks from the given single audio signal into components, such as drums, bass, and vocals from the rest of the accompaniment. Separation of sources is useful for a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for the sound source separation tasks and compare popular models for sound demixing, as well as their ensembles, on these benchmarks. For the models' assessments, we provide the leaderboard at https://mvsep.com/quality_checker/, giving a comparison for a range of models. The new benchmark datasets are available for download. We also develop a novel approach for audio separation, based on the ensembling of different models that are suited best for the particular stem. The proposed solution was evaluated in the context of the Music Demixing Challenge 2023 and achieved top results in different tracks of the challenge. The code and the approach are open-sourced on GitHub.

5/8/2024

The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Network

Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increase in calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation, which couples the individual instrument networks, and (iii) combination loss (CL). MDL makes it possible to take advantage of both the frequency- and time-domain representations of audio signals. We modify the target network, i.e., the network architecture of the original DNN-based MSS, by adding bridging paths for each output instrument to share their information. MDL is then applied to the combinations of the output sources as well as each independent source; hence, we called it CL. MDL and CL can easily be applied to many DNN-based separation methods as they are merely loss functions that are only used during training and do not affect the inference step. Bridging operation does not increase the number of learnable parameters in the network. Experimental results showed the validity of Open-Unmix (UMX), densely connected dilated DenseNet (D3Net) and convolutional time-domain audio separation network (Conv-TasNet) extended with our X-scheme, respectively called X-UMX, X-D3Net and X-Conv-TasNet, by comparing them with their original versions. We also verified the effectiveness of X-scheme in a large-scale data regime, showing its generality with respect to data size. X-UMX Large (X-UMXL), which was trained on large-scale internal data and used in our experiments, is newly available at https://github.com/asteroid-team/asteroid/tree/master/egs/musdb18/X-UMX.

8/7/2024