musicgen

Maintainer: aussielabs

Total Score: 525

Last updated: 6/17/2024

  • Run this model: Run on Replicate
  • API spec: View on Replicate
  • Github link: View on Github
  • Paper link: No paper link provided

Model overview

musicgen is a deployment of Meta's MusicGen model, a state-of-the-art controllable text-to-music generation system. The deployment is maintained by the team at aussielabs. musicgen can generate high-quality music from text prompts, or continue and mimic existing audio. The underlying model is part of Meta's broader AudioCraft library, which also includes audio generation models such as AudioGen and EnCodec.

Model inputs and outputs

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An audio file that will influence the generated music. The generated music can either continue the audio file's melody or mimic its style.
  • Duration: The desired duration of the generated audio in seconds.
  • Continuation Start/End: The start and end times of the audio file to use for continuation.
  • Model Version: The specific MusicGen model to use, such as the "melody" version.
  • Output Format: The desired format for the generated audio, such as WAV.
  • Normalization Strategy: The strategy for normalizing the output audio.
  • Temperature: Controls the randomness of the sampling process; lower values give more conservative, predictable output.
  • Top K/P: Restricts sampling to the k most likely tokens (top-k) or to the smallest set of tokens whose cumulative probability exceeds p (top-p).
  • Classifier-Free Guidance: Increases the influence of the prompt and input audio on the output; higher values follow the inputs more closely at the cost of diversity.

Outputs

  • Output: The generated audio file in the specified format.
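
To make the input schema concrete, here is a minimal sketch of a call through Replicate's Python client. The model identifier and the input keys mirror the parameters listed above, but they are assumptions for this particular deployment; the API spec linked above is authoritative.

```python
# Minimal sketch; assumes REPLICATE_API_TOKEN is set in the environment
# and that the input keys match the parameters listed above.
import replicate

output = replicate.run(
    "aussielabs/musicgen",  # append ":<version-hash>" if your client requires one
    input={
        "prompt": "upbeat synthwave with a driving bassline",
        "duration": 15,                # seconds of audio to generate
        "model_version": "melody",     # which MusicGen variant to use
        "output_format": "wav",
        "normalization_strategy": "loudness",
        "temperature": 1.0,
        "top_k": 250,
        "top_p": 0.0,
        "classifier_free_guidance": 3,
    },
)
print(output)  # URL or file reference for the generated audio
```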

Capabilities

musicgen can generate diverse and high-quality musical compositions from text prompts. It can also continue and mimic existing audio, allowing for creative remixing and mashups. The model is highly controllable, with options to adjust the generated music's style, duration, and other parameters.

What can I use it for?

musicgen can be used for a variety of applications, such as:

  • Generating custom background music for videos, games, or podcasts
  • Creating unique musical compositions for personal or commercial projects
  • Experimenting with remixing and mashups by continuing or mimicking existing tracks
  • Exploring new musical ideas and styles through text-based prompts

Things to try

One interesting capability of musicgen is its ability to continue and mimic existing audio. Try providing an audio file as input and experiment with the "continuation" and "melody" options to see how the model can extend or transform the original music. You can also try adjusting the temperature and guidance settings to generate more diverse or controlled outputs.
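
The sketch below shows what such a continuation request might look like through Replicate's Python client; the input keys and the exact behavior of the continuation flag are assumptions based on common MusicGen deployments, so check the API spec before relying on them.

```python
# Hypothetical continuation sketch: extend the first 10 seconds of a local
# clip into 20 seconds of audio in a new style.
import replicate

with open("riff.wav", "rb") as audio:  # Replicate's client accepts file handles
    output = replicate.run(
        "aussielabs/musicgen",  # append ":<version-hash>" if required
        input={
            "model_version": "melody",
            "prompt": "lo-fi hip hop reworking of the input riff",
            "input_audio": audio,
            "continuation": True,     # extend the clip; False would mimic its melody instead
            "continuation_start": 0,  # seconds into the input audio
            "continuation_end": 10,
            "duration": 20,
        },
    )
print(output)
```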



This summary was produced with help from an AI and may contain inaccuracies; check the links above to read the original source documents!

Related Models

musicgen

Maintainer: meta

Total Score: 2.0K

musicgen is a simple and controllable model for music generation developed by Meta. Unlike existing methods like MusicLM, musicgen doesn't require a self-supervised semantic representation and generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. musicgen was trained on 20K hours of licensed music, including an internal dataset of 10K high-quality music tracks and music data from ShutterStock and Pond5.

Model inputs and outputs

musicgen takes in a text prompt or melody and generates corresponding music. The model's inputs include a description of the desired music, an optional input audio file to influence the generated output, and various parameters to control the generation process like temperature, top-k, and top-p sampling. The output is a generated audio file in WAV format.

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An optional audio file that will influence the generated music. If "continuation" is set to true, the generated music will be a continuation of the input audio. Otherwise, it will mimic the input audio's melody.
  • Duration: The duration of the generated audio in seconds.
  • Continuation Start/End: The start and end times of the input audio to use for continuation.
  • Various generation parameters: Settings like temperature, top-k, top-p, etc. to control the diversity and quality of the generated output.

Outputs

  • Generated Audio: A WAV file containing the generated music.

Capabilities

musicgen can generate a wide variety of music styles and genres based on the provided text prompt. For example, you could ask it to generate "tense, staccato strings with plucked dissonant strings, like a scary movie soundtrack" and it would produce corresponding music. The model can also continue or mimic the melody of an input audio file, allowing for more coherent and controlled music generation.

What can I use it for?

musicgen could be used for a variety of applications, such as:

  • Background music generation: Automatically generating custom music for videos, games, or other multimedia projects.
  • Music composition assistance: Helping musicians and composers come up with new musical ideas or sketches to build upon.
  • Audio creation for content creators: Allowing YouTubers, podcasters, and other content creators to easily add custom music to their projects.

Things to try

One interesting aspect of musicgen is its ability to generate music in parallel by predicting the different codebook components separately. This allows for faster generation compared to previous autoregressive music models; a toy sketch of the delay pattern follows below. You could try experimenting with different generation parameters to find the right balance between generation speed, diversity, and quality for your use case. Additionally, the model's ability to continue or mimic input audio opens up possibilities for interactive music creation workflows, where users could iterate on an initial seed melody or prompt to refine the generated output.
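
The delay pattern is easy to visualize. The toy sketch below (not AudioCraft's actual implementation) prints which codebook token is produced for which audio frame at each decoding step, showing why T frames need only T + K - 1 steps instead of T * K sequential predictions:

```python
# Toy illustration of MusicGen's codebook delay pattern. With K codebooks
# offset by one step each, every decoding step emits one token per codebook
# in parallel, so T frames take T + K - 1 steps rather than T * K.
K = 4  # number of EnCodec codebooks
T = 6  # audio frames to generate
for step in range(T + K - 1):
    emitted = [(k, step - k) for k in range(K) if 0 <= step - k < T]
    print(f"step {step}: " + ", ".join(f"codebook {k} -> frame {t}" for k, t in emitted))
```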

music-gen

Maintainer: pollinations

Total Score: 14

music-gen is a text-to-music generation model developed by the team at pollinations. It is part of the Audiocraft library, a PyTorch-based library for deep learning research on audio generation. music-gen is a state-of-the-art controllable text-to-music model that can generate music from a given text prompt. It is similar to other music generation models like musicgen, audiogen, and musicgen-choral, but it offers a unique approach with its own strengths.

Model inputs and outputs

music-gen takes a text prompt and an optional duration as inputs, and generates an audio file as output. The text prompt can be used to specify the desired genre, mood, or other attributes of the generated music.

Inputs

  • Text: A text prompt that describes the desired music.
  • Duration: The duration of the generated music in seconds.

Outputs

  • Audio file: An audio file containing the generated music.

Capabilities

music-gen is capable of generating high-quality, controllable music from text prompts. It uses a single-stage auto-regressive Transformer model trained on a large dataset of licensed music, which allows it to generate diverse and coherent musical compositions. Unlike some other music generation models, music-gen does not require a self-supervised semantic representation, and it can generate all the necessary audio components (such as melody, harmony, and rhythm) in a single pass.

What can I use it for?

music-gen can be used for a variety of creative and practical applications, such as:

  • Generating background music for videos, games, or other multimedia projects
  • Composing music for specific moods or genres, such as relaxing ambient music or upbeat dance tracks
  • Experimenting with different musical styles and ideas by prompting the model with different text descriptions
  • Assisting composers and musicians in the creative process by providing inspiration or starting points for new compositions

Things to try

One interesting aspect of music-gen is its ability to generate music with a specified melody. By providing the model with a pre-existing melody, such as a fragment of a classical composition, you can prompt it to create new music that incorporates and builds upon that melody. This can be a powerful tool for exploring new musical ideas and variations on existing themes.
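
As a rough illustration, a call might look like the sketch below; the identifier pollinations/music-gen and the input keys text and duration are taken from the description above and may not match the live schema exactly.

```python
# Hypothetical call sketch; confirm the model identifier and input keys
# on the model's Replicate page before use.
import replicate

output = replicate.run(
    "pollinations/music-gen",  # version hash omitted
    input={
        "text": "relaxing ambient pads with slow arpeggios",
        "duration": 12,  # seconds
    },
)
print(output)
```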

musicgen-songstarter-v0.2

Maintainer: nateraw

Total Score: 3

musicgen-songstarter-v0.2 is a large, stereo MusicGen model fine-tuned by nateraw on a dataset of melody loops from their Splice sample library. It is intended to be a useful tool for music producers to generate song ideas. Compared to the previous version, musicgen-songstarter-v0.1, this model was trained on 3x more unique, manually curated samples and is double the size, using a larger transformer language model. Similar models include the original musicgen from Meta, which can generate music from a prompt or melody, as well as other fine-tuned versions like musicgen-fine-tuner and musicgen-stereo-chord.

Model inputs and outputs

musicgen-songstarter-v0.2 takes a variety of inputs to control the generated music, including a text prompt, an audio file, and various parameters to adjust the sampling and normalization. The model outputs stereo audio at 32kHz.

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An audio file that will influence the generated music.
  • Continuation: Whether the generated music should continue from the provided audio file or mimic its melody.
  • Continuation Start/End: The start and end times of the audio file to use for continuation.
  • Duration: The duration of the generated audio in seconds.
  • Sampling Parameters: Controls like top_k, top_p, temperature, and classifier_free_guidance to adjust the diversity of the output and the influence of the inputs.

Outputs

  • Audio: Stereo audio samples in the requested format (e.g. WAV).

Capabilities

musicgen-songstarter-v0.2 can generate a variety of musical styles and genres based on the provided prompt, including hip hop, soul, jazz, and more. It can also continue or mimic the melody of an existing audio file, making it useful for music producers looking to build on existing ideas.

What can I use it for?

musicgen-songstarter-v0.2 is a great tool for music producers looking to generate song ideas and sketches. By providing a text prompt and/or an existing audio file, you can get new musical ideas to use as a starting point for further development. The model's ability to generate in stereo and mimic existing melodies makes it particularly useful for quickly prototyping new songs.

Things to try

One interesting capability of musicgen-songstarter-v0.2 is its ability to generate music that adheres closely to the provided inputs, thanks to the classifier_free_guidance parameter. By increasing this value, you can produce outputs that are less diverse but more closely aligned with the desired style and melody; a hedged sketch of such a sweep follows below. This can be useful for quickly generating variations on a theme or refining a specific musical idea.
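
A hedged sketch of that experiment: sweep classifier_free_guidance and compare the results. The model identifier and parameter names are assumptions drawn from the description above; check the model's Replicate page for the real schema.

```python
# Hypothetical sweep over classifier_free_guidance: higher values should
# track the prompt more closely at the cost of diversity.
import replicate

for cfg in (1, 3, 6):
    output = replicate.run(
        "nateraw/musicgen-songstarter-v0.2",  # version hash omitted
        input={
            "prompt": "hip hop, soulful rhodes chords, boom bap drums",
            "duration": 8,
            "classifier_free_guidance": cfg,
        },
    )
    print(cfg, output)
```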

musicgen-remixer

Maintainer: sakemin

Total Score: 7

musicgen-remixer is a Cog implementation of the MusicGen Chord model, a modified version of Meta's MusicGen Melody model. It can generate music by remixing an input audio file into a different style based on a text prompt. This model was created by sakemin, who has also developed similar models like musicgen-fine-tuner and musicgen.

Model inputs and outputs

The musicgen-remixer model takes in an audio file and a text prompt describing the desired musical style. It then generates a remix of the input audio in the specified style. The model supports various configuration options, such as adjusting the sampling temperature, controlling the influence of the input, and selecting the output format.

Inputs

  • prompt: A text description of the desired musical style for the remix.
  • music_input: An audio file to be remixed.

Outputs

  • The remixed audio file in the requested style.

Capabilities

The musicgen-remixer model can transform input audio into a variety of musical styles based on a text prompt. For example, you could input a rock song and a prompt like "bossa nova" to generate a bossa nova-style remix of the original track.

What can I use it for?

The musicgen-remixer model could be useful for musicians, producers, or creators who want to experiment with remixing and transforming existing audio content. It could be used to create new, unique musical compositions, add variety to playlists, or generate backing tracks for live performances.

Things to try

Try inputting different types of audio, from vocals to full-band recordings, and see how the model handles the transformation. Experiment with various prompts, from specific genres to more abstract descriptors, to see the range of styles the model can produce. A rough usage sketch follows below.
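
As a rough illustration, a remix request might look like the following; the identifier sakemin/musicgen-remixer and the prompt / music_input keys come from the description above and should be confirmed against the live API.

```python
# Hypothetical remix sketch: turn a local rock track into a bossa nova remix.
import replicate

with open("rock_song.wav", "rb") as track:  # Replicate's client accepts file handles
    output = replicate.run(
        "sakemin/musicgen-remixer",  # version hash omitted
        input={
            "prompt": "bossa nova",
            "music_input": track,
        },
    )
print(output)
```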
