Last updated 10/4/2024
Model overview

audiogen is a model developed by Sepal that can generate sounds from text prompts. It is similar to other audio-related models like musicgen from Meta, which generates music from prompts, and styletts2 from Adirik, which generates speech from text. audiogen can be used to create a wide variety of sounds, from ambient noise to sound effects, based on the text prompt provided.

Model inputs and outputs

audiogen takes a text prompt as the main input, along with several optional parameters to control the output, such as duration, temperature, and output format. The model then generates an audio file in the specified format that represents the sounds described by the prompt.


Inputs

  • Prompt: A text description of the sounds to be generated
  • Duration: The maximum duration of the generated audio (in seconds)
  • Temperature: Controls the randomness of the sampling process; lower values give more conservative outputs, higher values more diverse ones
  • Classifier-Free Guidance: Controls how strongly the input prompt influences the output; higher values keep the audio closer to the prompt
  • Output Format: The desired output format for the generated audio (e.g., WAV)
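
To make the Temperature and Classifier-Free Guidance settings more concrete, here is a minimal sketch of how these two controls commonly work in sampling-based generative models. It is a general illustration, not audiogen's confirmed internals: the function names, the toy scores, and the guidance formula shown are assumptions chosen for the example.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_guidance(conditional, unconditional, scale):
    """Common classifier-free guidance form: move scores away from the
    prompt-free prediction, toward the prompt-conditioned one."""
    return [u + scale * (c - u) for c, u in zip(conditional, unconditional)]

# Toy scores over three hypothetical audio tokens (illustrative values only).
with_prompt = [2.0, 1.0, 0.5]
without_prompt = [1.0, 1.0, 1.0]

cold = softmax_with_temperature(with_prompt, temperature=0.5)  # sharper distribution
hot = softmax_with_temperature(with_prompt, temperature=2.0)   # flatter distribution
guided = apply_guidance(with_prompt, without_prompt, scale=3.0)
```

In this sketch, the low-temperature distribution concentrates probability on the top-scoring token, while the high-temperature one spreads it out, which is why higher temperatures tend to give more varied results. A larger guidance scale widens the gap between tokens the prompt favors and tokens it does not.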


Outputs

  • Audio File: The generated audio file in the specified format
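
The inputs listed above could be collected into a single request before being sent to wherever the model is hosted. The sketch below is hypothetical: the class name `AudiogenRequest`, the field names, and all default values are assumptions made for illustration, and the real parameter names and ranges should be checked against the hosting service's documentation.

```python
from dataclasses import dataclass, asdict

@dataclass
class AudiogenRequest:
    """Hypothetical container for the inputs described above."""
    prompt: str
    duration: float = 5.0                  # seconds; default is an assumption
    temperature: float = 1.0               # default is an assumption
    classifier_free_guidance: float = 3.0  # default is an assumption
    output_format: str = "wav"             # default is an assumption

    def to_payload(self):
        """Validate the fields and return them as a plain dictionary."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration <= 0:
            raise ValueError("duration must be positive")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        return asdict(self)

# Build a payload for a short ambient clip.
payload = AudiogenRequest(prompt="a babbling brook", duration=8).to_payload()
```

Validating before sending keeps obvious mistakes, such as an empty prompt or a negative duration, from reaching the model at all.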


Capabilities

audiogen can create a wide range of sounds based on text prompts, from simple ambient noise to more complex sound effects. For example, you could use it to generate the sound of a babbling brook, a thunderstorm, or even the roar of a lion. The model's ability to generate diverse and realistic-sounding audio makes it a useful tool for tasks like audio production, sound design, and even voice user interface development.

What can I use it for?

audiogen could be used in a variety of projects that require audio generation, such as sound effects for video games, background ambience for podcasts or audiobooks, or sound design for augmented reality and virtual reality applications. Because it needs only a text prompt and a few optional settings, it is approachable for creators and developers working in these and other audio-related fields.

Things to try

One interesting aspect of audiogen is its ability to generate sounds that are both realistic and evocative. By crafting prompts that describe a specific mood or sensation, you can explore how well the model creates immersive audio experiences. For example, you could try generating the sound of a cozy fireplace or the peaceful ambiance of a forest, and then incorporate these sounds into a multimedia project or relaxation app.

