musicgen

Maintainer: meta

Last updated 9/16/2024

Property	Value
Run this model	Run on Replicate
API spec	View on Replicate
GitHub link	View on GitHub
Paper link	View on arXiv

Model overview

musicgen is a simple and controllable model for music generation developed by Meta. Unlike earlier methods such as MusicLM, musicgen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show that the codebooks can be predicted in parallel, which leaves only 50 auto-regressive steps per second of audio. musicgen was trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus music data from ShutterStock and Pond5.
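The delay between codebooks can be pictured with a small sketch. This is only an illustration of the idea, not the authors' code: it assumes codebook k lags k steps behind codebook 0, and it infers a rate of 50 frames per second from the "50 auto-regressive steps per second" figure above. The function names are hypothetical.

```python
def delay_pattern(num_frames, num_codebooks=4, pad=-1):
    """Return grid[k][s]: the frame index codebook k emits at decoding step s.

    `pad` marks steps where a codebook has nothing to emit yet (or anymore).
    Assumption for illustration: codebook k is delayed by exactly k steps.
    """
    num_steps = num_frames + num_codebooks - 1
    grid = []
    for k in range(num_codebooks):
        row = []
        for s in range(num_steps):
            t = s - k  # codebook k lags k steps behind codebook 0
            row.append(t if 0 <= t < num_frames else pad)
        grid.append(row)
    return grid


def steps_delayed(num_frames, num_codebooks=4):
    """Decoding steps when all codebooks are predicted in parallel with a delay."""
    return num_frames + num_codebooks - 1


def steps_flattened(num_frames, num_codebooks=4):
    """Decoding steps if every codebook entry were predicted one at a time."""
    return num_frames * num_codebooks


if __name__ == "__main__":
    for row in delay_pattern(num_frames=3):
        print(row)
    frames = 30 * 50  # 30 seconds at an assumed 50 frames per second
    print(steps_delayed(frames), "steps with delay vs", steps_flattened(frames), "flattened")
```

Under these assumptions the delay adds a one-off overhead of three extra steps, so the cost per second of audio stays close to the frame rate instead of four times that.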

Model inputs and outputs

musicgen takes in a text prompt or melody and generates corresponding music. The model's inputs include a description of the desired music, an optional input audio file to influence the generated output, and various parameters to control the generation process like temperature, top-k, and top-p sampling. The output is a generated audio file in WAV format.

Inputs

Prompt: A description of the music you want to generate.
Input Audio: An optional audio file that will influence the generated music. If "continuation" is set to true, the generated music will be a continuation of the input audio. Otherwise, it will mimic the input audio's melody.
Duration: The duration of the generated audio in seconds.
Continuation Start/End: The start and end times of the input audio to use for continuation.
Generation parameters: Sampling settings such as temperature, top-k, and top-p that control the diversity and quality of the generated output.
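To make the sampling settings in the list above more concrete, here is a minimal, standard-library sketch of how temperature, top-k, and top-p are commonly applied to a model's output logits. It is not taken from the musicgen code. The function names are hypothetical, and the rule that top-p takes over whenever it is non-zero is an assumption chosen for this example.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Convert logits to probabilities, then keep only the allowed tokens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales the logits: higher values flatten the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Nucleus sampling: smallest set whose cumulative probability >= top_p.
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        keep = order[:top_k]  # keep the k most likely tokens
    else:
        keep = order  # no filtering

    kept = set(keep)
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    norm = sum(filtered)
    return [p / norm for p in filtered]


def sample_token(logits, rng=None, **kwargs):
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    probs = filter_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperature and smaller top-k or top-p values concentrate probability on the most likely tokens, which tends to give safer but less varied output.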

Outputs

Generated Audio: A WAV file containing the generated music.

Capabilities

musicgen can generate a wide variety of music styles and genres based on the provided text prompt. For example, you could ask it to generate "tense, staccato strings with plucked dissonant strings, like a scary movie soundtrack" and it would produce corresponding music. The model can also continue or mimic the melody of an input audio file, allowing for more coherent and controlled music generation.
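As a rough sketch of how such a request might be assembled, the helper below builds an input payload from the fields described under Inputs. The key names (`prompt`, `input_audio`, `continuation`, `continuation_start`, `continuation_end`, `duration`, `temperature`, `top_k`, `top_p`), the default values, and the `meta/musicgen` model reference are assumptions for illustration, so check the API spec before relying on them. The `generate` function uses the third-party `replicate` Python client and needs an API token.

```python
def build_musicgen_input(
    prompt,
    duration=8,
    input_audio=None,
    continuation=False,
    continuation_start=None,
    continuation_end=None,
    temperature=1.0,
    top_k=250,
    top_p=0.0,
):
    """Assemble a request payload; all key names and defaults are assumed."""
    if not prompt and input_audio is None:
        raise ValueError("provide a prompt, an input audio file, or both")
    if continuation and input_audio is None:
        raise ValueError("continuation needs an input audio file")
    if (
        continuation_start is not None
        and continuation_end is not None
        and continuation_end <= continuation_start
    ):
        raise ValueError("continuation_end must be after continuation_start")

    payload = {
        "prompt": prompt,
        "duration": duration,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }
    if input_audio is not None:
        payload["input_audio"] = input_audio
        # True: continue the input audio. False: mimic its melody.
        payload["continuation"] = continuation
        if continuation_start is not None:
            payload["continuation_start"] = continuation_start
        if continuation_end is not None:
            payload["continuation_end"] = continuation_end
    return payload


def generate(payload, model="meta/musicgen"):
    """Send the payload to Replicate (model reference is an assumption).

    Requires `pip install replicate` and REPLICATE_API_TOKEN in the
    environment. Some client versions need an explicit ":version" suffix.
    """
    import replicate  # imported lazily so the builder works without it

    return replicate.run(model, input=payload)
```

For example, `build_musicgen_input("tense, staccato strings with plucked dissonant strings, like a scary movie soundtrack", duration=10)` returns a payload that could then be passed to `generate`.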

What can I use it for?

musicgen could be used for a variety of applications, such as:

Background music generation: Automatically generating custom music for videos, games, or other multimedia projects.
Music composition assistance: Helping musicians and composers come up with new musical ideas or sketches to build upon.
Audio creation for content creators: Allowing YouTubers, podcasters, and other content creators to easily add custom music to their projects.

Things to try

One interesting aspect of musicgen is that it predicts its codebooks in parallel, offset by a small delay, rather than one after another. This reduces the number of auto-regressive steps needed per second of audio. You could try experimenting with generation parameters such as duration, temperature, top-k, and top-p to find the right balance between generation speed, diversity, and quality for your use case.

Additionally, the model's ability to continue or mimic input audio opens up possibilities for interactive music creation workflows, where users could iterate on an initial seed melody or prompt to refine the generated output.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

musicgen

aussielabs

musicgen is a deployment of Meta's MusicGen model, a state-of-the-art controllable text-to-music generation system. It was developed by the team at aussielabs. musicgen can generate high-quality music from text prompts or continue and mimic existing audio. It is part of the broader AudioCraft library, which contains other impressive audio generation models like AudioGen and EnCodec.

Model inputs and outputs

Inputs

Prompt: A description of the music you want to generate.
Input Audio: An audio file that will influence the generated music. The generated music can either continue the audio file's melody or mimic its style.
Duration: The desired duration of the generated audio in seconds.
Continuation Start/End: The start and end times of the audio file to use for continuation.
Model Version: The specific MusicGen model to use, such as the "melody" version.
Output Format: The desired format for the generated audio, such as WAV.
Normalization Strategy: The strategy for normalizing the output audio.
Temperature: Controls the "conservativeness" of the sampling process.
Top K/P: Reduces the sampling to the most likely tokens.
Classifier Free Guidance: Increases the influence of the input on the output.

Outputs

Output: The generated audio file in the specified format.

Capabilities

musicgen can generate diverse and high-quality musical compositions from text prompts. It can also continue and mimic existing audio, allowing for creative remixing and mashups. The model is highly controllable, with options to adjust the generated music's style, duration, and other parameters.

What can I use it for?

musicgen can be used for a variety of applications, such as:

Generating custom background music for videos, games, or podcasts
Creating unique musical compositions for personal or commercial projects
Experimenting with remixing and mashups by continuing or mimicking existing tracks
Exploring new musical ideas and styles through text-based prompts

Things to try

One interesting capability of musicgen is its ability to continue and mimic existing audio. Try providing an audio file as input and experiment with the "continuation" and "melody" options to see how the model can extend or transform the original music. You can also try adjusting the temperature and guidance settings to generate more diverse or controlled outputs.

Audio-to-Audio

musicgen-songstarter-v0.2

nateraw

musicgen-songstarter-v0.2 is a large, stereo MusicGen model fine-tuned by nateraw on a dataset of melody loops from their Splice sample library. It is intended to be a useful tool for music producers to generate song ideas. Compared to the previous version, musicgen-songstarter-v0.1, this new model was trained on 3x more unique, manually-curated samples and is double the size, using a larger transformer language model. Similar models include the original musicgen from Meta, which can generate music from a prompt or melody, as well as other fine-tuned versions like musicgen-fine-tuner and musicgen-stereo-chord.

Model inputs and outputs

musicgen-songstarter-v0.2 takes a variety of inputs to control the generated music, including a text prompt, an audio file, and various parameters to adjust the sampling and normalization. The model outputs stereo audio at 32kHz.

Inputs

Prompt: A description of the music you want to generate
Input Audio: An audio file that will influence the generated music
Continuation: Whether the generated music should continue from the provided audio file or mimic its melody
Continuation Start/End: The start and end times of the audio file to use for continuation
Duration: The duration of the generated audio in seconds
Sampling Parameters: Controls like top_k, top_p, temperature, and classifier_free_guidance to adjust the diversity and influence of the inputs

Outputs

Audio: Stereo audio samples in the requested format (e.g. WAV)

Capabilities

musicgen-songstarter-v0.2 can generate a variety of musical styles and genres based on the provided prompt, including hip hop, soul, jazz, and more. It can also continue or mimic the melody of an existing audio file, making it useful for music producers looking to build on existing ideas.

What can I use it for?

musicgen-songstarter-v0.2 is a great tool for music producers looking to generate song ideas and sketches. By providing a textual prompt and/or an existing audio file, the model can produce new musical ideas that can be used as a starting point for further development. The model's ability to generate in stereo and mimic existing melodies makes it particularly useful for quickly prototyping new songs.

Things to try

One interesting capability of musicgen-songstarter-v0.2 is its ability to generate music that adheres closely to the provided inputs, thanks to the "classifier free guidance" parameter. By increasing this value, you can produce outputs that are less diverse but more closely aligned with the desired style and melody. This can be useful for quickly generating variations on a theme or refining a specific musical idea.
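The effect of the "classifier free guidance" value can be sketched with the formulation commonly used for this technique: the model is evaluated once with the conditioning inputs and once without, and the two sets of logits are combined. Whether this particular model applies exactly this formula is an assumption, and the function name is hypothetical.

```python
def apply_classifier_free_guidance(cond_logits, uncond_logits, guidance_scale):
    """Blend conditional and unconditional logits.

    guidance_scale = 0 ignores the inputs, 1 reproduces the conditional
    logits, and larger values push further toward what the inputs favor.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit lists must have the same length")
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]


if __name__ == "__main__":
    cond = [2.0, -1.0]    # logits given the prompt and/or melody
    uncond = [1.0, 0.0]   # logits with the conditioning dropped
    for scale in (0.0, 1.0, 3.0):
        print(scale, apply_classifier_free_guidance(cond, uncond, scale))
```

With a larger scale the gap between favored and disfavored tokens widens, which matches the description above of outputs that are less diverse but more closely aligned with the inputs.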

Audio-to-Audio

audiogen

sepal

audiogen is a model maintained by sepal that generates sounds from text prompts. It is similar to other audio-related models like musicgen from Meta, which generates music from prompts, and styletts2 from Adirik, which generates speech from text. audiogen can be used to create a wide variety of sounds, from ambient noise to sound effects, based on the text prompt provided.

Model inputs and outputs

audiogen takes a text prompt as the main input, along with several optional parameters to control the output, such as duration, temperature, and output format. The model then generates an audio file in the specified format that represents the sounds described by the prompt.

Inputs

Prompt: A text description of the sounds to be generated
Duration: The maximum duration of the generated audio (in seconds)
Temperature: Controls the "conservativeness" of the sampling process, with higher values producing more diverse outputs
Classifier Free Guidance: Increases the influence of the input prompt on the output
Output Format: The desired output format for the generated audio (e.g., WAV)

Outputs

Audio File: The generated audio file in the specified format

Capabilities

audiogen can create a wide range of sounds based on text prompts, from simple ambient noise to more complex sound effects. For example, you could use it to generate the sound of a babbling brook, a thunderstorm, or even the roar of a lion. The model's ability to generate diverse and realistic-sounding audio makes it a useful tool for tasks like audio production, sound design, and even voice user interface development.

What can I use it for?

audiogen could be used in a variety of projects that require audio generation, such as video game sound effects, background audio for podcasts or audiobooks, or sound design for augmented reality or virtual reality applications. The model's versatility and ease of use make it a valuable tool for creators and developers working in these and other audio-related fields.

Things to try

One interesting aspect of audiogen is its ability to generate sounds that are both realistic and evocative. By crafting prompts that tap into specific emotions or sensations, users can explore the model's potential to create immersive audio experiences. For example, you could try generating the sound of a cozy fireplace or the peaceful ambiance of a forest, and then incorporate these sounds into a multimedia project or relaxation app.

Text-to-Audio

musicgen-looper

andreasjansson

The musicgen-looper is a Cog implementation of the MusicGen model, a simple and controllable model for music generation developed by Facebook Research. Unlike existing music generation models like MusicLM, MusicGen does not require a self-supervised semantic representation and generates all four audio codebooks in a single pass. By introducing a small delay between the codebooks, MusicGen can predict them in parallel, reducing the number of auto-regressive steps per second of audio. The model was trained on 20,000 hours of licensed music data, including an internal dataset of 10,000 high-quality tracks as well as music from ShutterStock and Pond5.

The musicgen-looper model is similar to other music generation models like music-inpainting-bert, cantable-diffuguesion, and looptest in its ability to generate music from prompts. However, the key differentiator of musicgen-looper is its focus on generating fixed-BPM loops from text prompts.

Model inputs and outputs

The musicgen-looper model takes in a text prompt describing the desired music, as well as various parameters to control the generation process, such as tempo, seed, and sampling parameters. It outputs a WAV file containing the generated audio loop.

Inputs

Prompt: A description of the music you want to generate.
BPM: Tempo of the generated loop in beats per minute.
Seed: Seed for the random number generator. If not provided, a random seed will be used.
Top K: Reduces sampling to the k most likely tokens.
Top P: Reduces sampling to tokens with cumulative probability of p. When set to 0 (default), top_k sampling is used.
Temperature: Controls the "conservativeness" of the sampling process. Higher temperature means more diversity.
Classifier Free Guidance: Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
Max Duration: Maximum duration of the generated loop in seconds.
Variations: Number of variations to generate.
Model Version: Selects the model to use for generation.
Output Format: Specifies the output format for the generated audio (currently only WAV is supported).

Outputs

WAV file: The generated audio loop.

Capabilities

The musicgen-looper model can generate a wide variety of musical styles and textures from text prompts, including tense, dissonant strings, plucked strings, and more. By controlling parameters like tempo, sampling, and classifier free guidance, users can fine-tune the generated output to match their desired style and mood.

What can I use it for?

The musicgen-looper model could be useful for a variety of applications, such as:

Soundtrack generation: Generating background music or sound effects for videos, games, or other multimedia projects.
Music composition: Providing a starting point or inspiration for composers and musicians to build upon.
Audio manipulation: Experimenting with different prompts and parameters to create unique and interesting musical textures.

The model's ability to generate fixed-BPM loops makes it particularly well-suited for applications where a seamless, loopable audio track is required.

Things to try

One interesting aspect of the musicgen-looper model is its ability to generate variations on a given prompt. By adjusting the "Variations" parameter, users can explore how the model interprets and reinterprets a prompt in different ways. This could be a useful tool for composers and musicians looking to generate a diverse set of ideas or explore the model's creative boundaries.

Another interesting feature is the model's use of classifier free guidance, which helps the generated output adhere more closely to the input prompt. By experimenting with different levels of classifier free guidance, users can find the right balance between adhering to the prompt and introducing their own creative flair.
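The relationship between the BPM and Max Duration inputs is simple arithmetic, sketched below. How musicgen-looper actually chooses the loop length is not described here, so this only shows how long a loop of whole bars lasts at a given tempo. The function names and the 4/4 time signature default are assumptions.

```python
def loop_duration_seconds(bpm, bars=4, beats_per_bar=4):
    """Length in seconds of a loop with the given number of bars."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return bars * beats_per_bar * 60.0 / bpm


def max_bars_within(bpm, max_duration, beats_per_bar=4):
    """Largest whole number of bars that fits inside max_duration seconds."""
    bar_seconds = beats_per_bar * 60.0 / bpm
    return int(max_duration // bar_seconds)


if __name__ == "__main__":
    print(loop_duration_seconds(120))  # four bars of 4/4 at 120 BPM
    print(max_bars_within(140, 8))     # whole bars that fit in 8 seconds
```

Keeping a loop to a whole number of bars is what lets it repeat without a rhythmic jump at the seam.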

Text-to-Audio