SongCreator: Lyrics-based Universal Song Generation

Read original: arXiv:2409.06029 - Published 9/11/2024 by Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

Overview

SongCreator is a system that can generate complete song compositions based on just lyrics as input.
It aims to automate the process of music composition and production, making it more accessible to non-musicians.
The system utilizes deep learning models to generate melody, harmony, and other musical elements from the provided lyrics.

Plain English Explanation

SongCreator is a research system that creates full songs from lyrics alone. The goal is to streamline music creation and make it accessible to people without formal musical training.

The core idea behind SongCreator is to leverage powerful deep learning models to automatically generate the different components of a song - melody, harmony, rhythm, and so on - from the provided lyrics. This allows users to focus on writing the lyrics, while the system takes care of composing the accompanying music.

By reducing the technical barriers to music production, SongCreator aims to democratize the creative process and enable more people to express themselves through songwriting. The system could be particularly useful for aspiring songwriters, hobbyists, or anyone who wants to turn their lyrical ideas into complete musical pieces.

Technical Explanation

The paper presents a deep learning approach for generating full songs from lyrics alone. This summary describes the system as a modular architecture in which specialized neural networks generate different musical elements; the paper's own abstract, reproduced under Related Papers, instead describes a dual-sequence language model (DSLM) with an attention mask strategy.

The key components of the SongCreator system include:

Lyrical Feature Extractor: This module analyzes the input lyrics to extract relevant linguistic and semantic features that can be used to guide the music generation process.
Melody Generator: A deep learning model that takes the lyrical features and generates a corresponding melody.
Chord Progression Generator: Another model that determines the underlying chord progressions based on the lyrics and melody.
Accompaniment Generator: This component generates additional musical elements, such as harmonies, rhythms, and instrumentation, to create a complete song arrangement.
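
The data flow between the four components listed above can be sketched as a simple pipeline. This is only an illustration of how the stages hand results to one another as this summary describes them: every function name, the word-count "feature", the C major scale walk, and the fixed chord loop are hypothetical placeholders, not the learned models or interfaces from the paper.

```python
from dataclasses import dataclass
from typing import List

# Placeholder musical material (assumptions for illustration only).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI pitches C4..B4
PROGRESSION = ["C", "G", "Am", "F"]      # a common I-V-vi-IV loop in C major


@dataclass
class Song:
    lines: List[str]
    melody: List[List[int]]  # one list of MIDI pitches per lyric line
    chords: List[str]        # one chord symbol per lyric line
    arrangement: List[str]   # one text description per lyric line


def extract_features(lines: List[str]) -> List[int]:
    """Stage 1 (placeholder): use the word count of each line as its feature."""
    return [len(line.split()) for line in lines]


def generate_melody(features: List[int]) -> List[List[int]]:
    """Stage 2 (placeholder): one note per word, walking up the scale."""
    return [[C_MAJOR[i % len(C_MAJOR)] for i in range(n)] for n in features]


def generate_chords(features: List[int], melody: List[List[int]]) -> List[str]:
    """Stage 3 (placeholder): cycle through a fixed progression, one chord per line."""
    return [PROGRESSION[i % len(PROGRESSION)] for i in range(len(melody))]


def generate_accompaniment(melody: List[List[int]], chords: List[str]) -> List[str]:
    """Stage 4 (placeholder): describe each line's chord and note count."""
    return [f"{chord}: {len(notes)} notes" for notes, chord in zip(melody, chords)]


def create_song(lyrics: str) -> Song:
    """Run the four stages in order on the non-empty lines of the lyrics."""
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    features = extract_features(lines)
    melody = generate_melody(features)
    chords = generate_chords(features, melody)
    arrangement = generate_accompaniment(melody, chords)
    return Song(lines, melody, chords, arrangement)


if __name__ == "__main__":
    song = create_song("hello world\n\nsing with me now")
    for line, chord, notes in zip(song.lines, song.chords, song.melody):
        print(f"{chord:>2} | {line} | {notes}")
```

In a real system each placeholder function would be replaced by a trained model; the sketch only shows that later stages consume the outputs of earlier ones.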

The researchers trained and evaluated SongCreator on a large dataset of songs, demonstrating that it can generate coherent, musically relevant output from lyrics alone. Performance was assessed both objectively, through metrics such as melody and chord quality, and subjectively, through human evaluation of the generated songs.

Critical Analysis

The SongCreator: Lyrics-based Universal Song Generation paper presents an ambitious and innovative approach to automated music composition. By leveraging the power of deep learning, the researchers have made significant progress towards the goal of enabling non-musicians to create complete song compositions.

However, the paper also acknowledges several limitations and areas for further research. For example, the current system is still limited in its ability to capture the nuances of human musical creativity, such as the emotional expressiveness and creative spontaneity that can arise during the songwriting process. Additionally, the system may struggle with certain genres or styles of music that require more complex musical structures or artistic interpretation.

Further research could explore ways to better integrate human feedback and collaboration into the songwriting process, or to develop more advanced models that can better capture the subjective and creative aspects of music composition. Evaluating the system's performance on a wider range of musical styles and genres could also provide insights into its broader applicability.

Conclusion

SongCreator is a significant step forward in automated music composition. Using deep learning, the researchers have developed a system that generates complete songs from lyrics alone.

While the current system has limitations, this work demonstrates the potential for AI-powered tools to democratize the creative process of music composition and production. As the technology continues to evolve, systems like SongCreator could empower more people to express themselves through the creation of original musical works, and potentially open up new avenues for artistic exploration and collaboration between humans and machines.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SongCreator: Lyrics-based Universal Song Generation

Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition and instrumental arrangement, etc., generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world. In this light, we propose SongCreator, a song-generation system designed to tackle this challenge. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and an additional attention mask strategy for DSLM, which allows our model to understand, generate and edit songs, making it suitable for various song-related generation tasks. Extensive experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks. Notably, it surpasses previous works by a large margin in lyrics-to-song and lyrics-to-vocals. Additionally, it is able to independently control the acoustic conditions of the vocals and accompaniment in the generated song through different prompts, exhibiting its potential applicability. Our samples are available at https://songcreator.github.io/.
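
The abstract above mentions an attention mask strategy over two sequences (vocals and accompaniment) but does not spell out the masks. The sketch below shows, in general terms, how boolean attention masks can switch a two-stream model between left-to-right generation and full-context modes, and how cross-stream attention can be enabled or disabled. The function names and the mode names ("both", "vocals_see_accomp", "accomp_sees_vocals", "none") are assumptions made for illustration, not the paper's terminology or implementation.

```python
from typing import List, Tuple

# mask[q][k] is True when query position q may attend to key position k.
Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """Self-attention mask for left-to-right generation: no access to future positions."""
    return [[k <= q for k in range(n)] for q in range(n)]


def non_causal_mask(n: int) -> Mask:
    """Self-attention mask with full context, e.g. for a sequence given as input."""
    return [[True] * n for _ in range(n)]


def cross_masks(n_vocal: int, n_accomp: int, mode: str) -> Tuple[Mask, Mask]:
    """Return (vocal queries over accompaniment keys, accompaniment queries over vocal keys).

    The mode names are hypothetical labels for which stream may look at the other.
    """
    allowed = {
        "both": (True, True),
        "vocals_see_accomp": (True, False),
        "accomp_sees_vocals": (False, True),
        "none": (False, False),
    }
    if mode not in allowed:
        raise ValueError(f"unknown mode: {mode}")
    vocal_flag, accomp_flag = allowed[mode]
    vocal_to_accomp = [[vocal_flag] * n_accomp for _ in range(n_vocal)]
    accomp_to_vocal = [[accomp_flag] * n_vocal for _ in range(n_accomp)]
    return vocal_to_accomp, accomp_to_vocal


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    v2a, a2v = cross_masks(2, 3, "vocals_see_accomp")
    print(v2a, a2v)
```

Under these assumptions, a task that generates only one stream could use "none" so the streams stay independent, while a task that conditions one stream on the other could open attention in a single direction. Which combinations the paper actually uses for each of its eight tasks is not stated in the abstract.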

9/11/2024

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis. Melodist leverages tri-tower contrastive pretraining to learn more effective text representation for controllable V2A synthesis. A Chinese song dataset mined from a music website is built up to alleviate data scarcity for our research. The evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency. Audio samples can be found in https://text2songMelodist.github.io/Sample/.

5/21/2024

An End-to-End Approach for Chord-Conditioned Song Generation

Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu

The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiency in music performance. To mitigate the issue, we introduce an important concept from music composition, namely chords, to song generation networks. Chords form the foundation of accompaniment and provide vocal melody with associated harmony. Given the inaccuracy of automatic chord extractors, we devise a robust cross-attention mechanism augmented with dynamic weight sequence to integrate extracted chord information into song generations and reduce frame-level flaws, and propose a novel model termed Chord-Conditioned Song Generator (CSG) based on it. Experimental evidence demonstrates our proposed method outperforms other approaches in terms of musical performance and control precision of generated songs.

9/11/2024

Singer separation for karaoke content generation

Hsuan-Yu Lin, Xuanjun Chen, Jyh-Shing Roger Jang

Due to the rapid development of deep learning, we can now successfully separate singing voice from mono audio music. However, this separation can only extract human voices from other musical instruments, which is undesirable for karaoke content generation applications that only require the separation of lead singers. For this karaoke application, we need to separate the music containing male and female duets into two vocals, or extract a single lead vocal from the music containing vocal harmony. For this reason, we propose in this article to use a singer separation system, which generates karaoke content for one or two separated lead singers. In particular, we introduced three models for the singer separation task and designed an automatic model selection scheme to distinguish how many lead singers are in the song. We also collected a large enough data set, MIR-SingerSeparation, which has been publicly released to advance the frontier of this research. Our singer separation is most suitable for sentimental ballads and can be directly applied to karaoke content generation. As far as we know, this is the first singer-separation work for real-world karaoke applications.

8/20/2024