Declare-lab

Models by this creator

mustango

289

Mustango is a text-to-music model from the declare-lab team, designed for controllable music generation. It combines a Latent Diffusion Model (LDM), the Flan-T5 text encoder, and explicit musical features to generate music from text prompts. It belongs to the same family of text-to-music systems as MusicGen and MusicGen Remixer, but focuses on finer-grained control and better overall music quality.

Model inputs and outputs

Mustango takes a text prompt describing the desired music and generates an audio file in response. With the right prompt, it can produce a wide range of styles, from ambient to pop.

Inputs

- Prompt: A text description of the desired music, including details about the instrumentation, genre, tempo, and mood.

Outputs

- Audio file: A generated audio file containing music that matches the input prompt.

Capabilities

Mustango generates music that closely follows the text prompt. It captures details such as instrumentation, rhythm, and mood and turns them into coherent compositions. Compared with earlier text-to-music models, Mustango shows clear improvements in overall musical quality and coherence.

What can I use it for?

Mustango is useful for content creators, musicians, and hobbyists alike. It can generate custom background music for videos, podcasts, or video games. Composers can use it to prototype musical ideas quickly or to explore new creative directions, and advertisers and marketers may find it useful for jingles or campaign soundtracks.

Things to try

Mustango can generate music in many styles depending on the prompt. Experiment with different genres, moods, and levels of detail to see how wide a range of compositions it produces. The team has also released several pre-trained checkpoints, including Mustango Pretrained, which may suit specific use cases.
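The prompt-in, audio-file-out interface described above can be sketched in Python. This is a minimal sketch that does not call Mustango itself: `build_music_prompt` and `save_wav` are hypothetical helpers, not part of Mustango's API. The first joins the fields listed under Inputs into one descriptive prompt; the second writes a waveform to a 16-bit WAV file using Python's standard `wave` module. It assumes the model hands back a mono waveform as floats in [-1, 1] at 16 kHz; the sample rate is an assumption, so check the model card for the actual value.

```python
import struct
import wave
from typing import Iterable, List, Optional


def build_music_prompt(
    genre: str,
    instruments: List[str],
    tempo_bpm: Optional[int] = None,
    mood: Optional[str] = None,
) -> str:
    """Hypothetical helper: join the Inputs fields into one descriptive prompt."""
    if not instruments:
        raise ValueError("at least one instrument is required")
    if len(instruments) == 1:
        lineup = instruments[0]
    else:
        lineup = ", ".join(instruments[:-1]) + " and " + instruments[-1]
    sentences = [f"This is a {genre} piece featuring {lineup}."]
    if tempo_bpm is not None:
        sentences.append(f"The tempo is {tempo_bpm} bpm.")
    if mood:
        sentences.append(f"The mood is {mood}.")
    return " ".join(sentences)


def save_wav(samples: Iterable[float], path: str, sample_rate: int = 16000) -> int:
    """Hypothetical helper: write mono float samples in [-1, 1] as 16-bit PCM WAV.

    The 16 kHz default is an assumption; use the rate the model actually reports.
    Returns the number of frames written.
    """
    frames = bytearray()
    count = 0
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))  # little-endian int16
        count += 1
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return count
```

For example, `build_music_prompt("new age", ["flute", "electronic drums"], tempo_bpm=100, mood="playful")` returns "This is a new age piece featuring flute and electronic drums. The tempo is 100 bpm. The mood is playful." You would pass that string to the model and hand the waveform it returns to `save_wav`.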

Updated 9/18/2024

Text-to-Audio

flan-alpaca-xl

declare-lab

117

flan-alpaca-xl is a large language model developed by the declare-lab team. It combines the Flan and Alpaca instruction data: a 3-billion-parameter Flan-T5 model, already instruction-tuned on Flan, was further fine-tuned on the synthetic Alpaca dataset using a single NVIDIA A6000 GPU. Similar instruction-tuned models such as flan-t5-xl and flan-ul2 have shown strong performance on a variety of benchmarks, including reasoning and question answering. The declare-lab team has also studied language model safety with its Red-Eval framework, which found that even GPT-4 and ChatGPT can be "jailbroken" with concerning frequency.

Model inputs and outputs

Inputs

- Text: Natural language input, such as instructions, questions, or other prompts for the model to respond to.

Outputs

- Text: Natural language generated in response, such as answers to questions, completed instructions, or other relevant text.

Capabilities

flan-alpaca-xl performs well on a variety of language tasks, including problem-solving, reasoning, and question answering. The declare-lab team has also benchmarked it on the large-scale InstructEval benchmark, where it performs strongly compared with other open-source instruction-tuned models.

What can I use it for?

flan-alpaca-xl could be useful for a wide range of natural language processing tasks, such as:

- Question answering: Generating relevant, informative answers to questions on a variety of topics.
- Task completion: Following instructions or performing specific tasks, such as code generation, summarization, or translation.
- Conversational AI: Building more natural and engaging conversational systems on top of its language understanding and generation capabilities.

However, as noted in the declare-lab maintainer profile, these models should be used with caution, and their safety and fairness should be assessed carefully before deployment in real-world applications.

Things to try

One interesting aspect of flan-alpaca-xl is that it draws on instruction tuning from both human-curated (Flan) and machine-generated (Alpaca) data. declare-lab applied a related idea in Flacuna, which fine-tunes Vicuna on Flan and showed better problem-solving than the original Vicuna. Researchers and developers interested in language model safety and robustness may also find the Red-Eval framework and the team's work on "jailbreaking" large language models worth investigating.
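Fine-tuning a sequence-to-sequence model such as Flan-T5 on Alpaca starts by turning each Alpaca record (fields `instruction`, `input`, `output`) into a source/target text pair. The sketch below shows one way to do that conversion. The blank-line template joining instruction and input is an assumption for illustration, not necessarily the exact format declare-lab used, and `alpaca_to_pair` and `build_training_pairs` are hypothetical names.

```python
from typing import Dict, List, Tuple


def alpaca_to_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Turn one Alpaca-style record into a (source, target) pair.

    Assumed template: the instruction, followed by a blank line and the optional
    input when one is present. The target is the reference output.
    """
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    source = f"{instruction}\n\n{context}" if context else instruction
    return source, record["output"].strip()


def build_training_pairs(records: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Convert a list of records, skipping any with an empty instruction or output."""
    pairs: List[Tuple[str, str]] = []
    for record in records:
        if not record.get("instruction", "").strip():
            continue
        if not record.get("output", "").strip():
            continue
        pairs.append(alpaca_to_pair(record))
    return pairs
```

The resulting pairs would then be tokenized as encoder inputs and decoder labels; that step depends on the training framework and is left out here.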

Updated 5/28/2024

Text-to-Text

flan-alpaca-large

declare-lab

The flan-alpaca-large model is a large language model developed by the declare-lab team. It is instruction-tuned on two sources: the Flan collection, which covers over 1,000 diverse tasks, and the Alpaca dataset, which provides high-quality synthetic instructions. This hybrid approach aims to produce a model that handles both general language understanding and specific instructions well.

flan-alpaca-large is one of several variants released by declare-lab, ranging from a 220M-parameter base model to an 11B-parameter XXL model. The models are available on the Hugging Face platform for research and experimentation. Compared with similar models such as MBZUAI's LaMini-Flan-T5-783M and LaMini-Flan-T5-248M, flan-alpaca-large is trained on a different mix of instruction data (Flan plus Alpaca), which may give it an edge on some tasks.

Model inputs and outputs

Inputs

- Text prompts that instruct the model to perform a task, such as answering a question, generating text, or completing a specific instruction.

Outputs

- Text responses that complete the given prompts and instructions.

Capabilities

flan-alpaca-large is designed for a wide range of language tasks, from open-ended conversation to specific, goal-oriented instructions. Its capabilities include:

- General language understanding: The Flan training data gives the model strong performance on a diverse set of NLP tasks, including question answering, reading comprehension, and text generation.
- Instruction following: Alpaca fine-tuning helps the model understand and follow complex instructions, making it suitable for task planning, step-by-step guidance, and creative writing prompts.
- Multilingual support: The model can accept and generate text in languages other than English, such as Spanish and Japanese, although quality outside English may be lower.

What can I use it for?

flan-alpaca-large can be a valuable tool for applications including:

- Research and experimentation: Exploring few-shot learning, language model safety, and the development of more capable AI assistants.
- Prototyping and proof-of-concept: Quickly building and testing language-based applications such as chatbots, virtual assistants, and content generation tools.
- Education and learning: Supporting language learning, generating creative writing prompts, and exploring what large language models can do.

Things to try

Some interesting things to try with flan-alpaca-large:

- Explore its multilingual capabilities: Prompt the model in different languages and observe how well it understands and responds.
- Test its safety and robustness: Use declare-lab's Red-Eval tool to evaluate how the model holds up against jailbreaking attempts.
- Evaluate it on specific tasks: Benchmark it with the InstructEval framework, which provides a comprehensive set of evaluation tasks.
- Look at related text-to-audio work: declare-lab's Tango project uses FLAN-T5 as the text encoder for text-to-audio generation.
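Choosing among the variants listed above (220M to 11B parameters) often comes down to memory. A rough rule is parameters times bytes per parameter: 4 bytes in float32, 2 in float16 or bfloat16, so the 3B XL model in float16 needs about 3e9 × 2 = 6e9 bytes, or roughly 5.6 GiB, just for its weights. The sketch below applies that arithmetic to the parameter counts given in this section. It counts weights only (activations, generation caches, and framework overhead add more), and the helper names are hypothetical.

```python
from typing import Dict, Optional

# Approximate parameter counts for the flan-alpaca variants named in the text.
VARIANT_PARAMS: Dict[str, int] = {
    "flan-alpaca-base": 220_000_000,
    "flan-alpaca-large": 770_000_000,
    "flan-alpaca-xl": 3_000_000_000,
    "flan-alpaca-xxl": 11_000_000_000,
}

# Storage cost of one parameter for common dtypes.
BYTES_PER_PARAM: Dict[str, int] = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def largest_fitting_variant(budget_gib: float, dtype: str = "float32") -> Optional[str]:
    """Return the biggest variant whose weights fit in budget_gib, or None if none fit."""
    best: Optional[str] = None
    # Walk from smallest to largest so the last match is the biggest that fits.
    for name, params in sorted(VARIANT_PARAMS.items(), key=lambda item: item[1]):
        if weight_memory_gib(params, dtype) <= budget_gib:
            best = name
    return best
```

For example, `largest_fitting_variant(8, "float16")` returns `"flan-alpaca-xl"`, because the XL weights need about 5.6 GiB while the XXL weights need about 20.5 GiB. Leave headroom beyond this estimate for activations during generation.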

Updated 9/6/2024

Text-to-Text

flan-alpaca-gpt4-xl

declare-lab

flan-alpaca-gpt4-xl is an AI model developed by declare-lab that combines the instruction-tuning approaches of Flan and Alpaca. It is a 3-billion-parameter model that builds on the Flan collection of over 1,000 language tasks and is fine-tuned on synthetic Alpaca-style instructions; the "gpt4" in its name refers to Alpaca data generated with GPT-4. This lets it handle a wide variety of instruction-following tasks, from text generation to question answering and problem-solving. Related declare-lab models include Flan-Alpaca-Large and Flan-Alpaca-XL, with 770M and 3B parameters respectively. The team has also explored other instruction-tuned models such as Flacuna, which fine-tunes Vicuna-13B on the Flan dataset.

Model inputs and outputs

Inputs

- Natural language instructions or prompts for the model to follow.

Outputs

- Responses that complete the given instruction or task, such as generated text, answers to questions, or solutions to problems.

Capabilities

flan-alpaca-gpt4-xl is highly capable at understanding and executing a wide variety of natural language instructions. It can generate human-like text, answer questions, solve problems, and complete tasks across many domains. For example, it can write an email from the perspective of an alpaca who enjoys eating flan, or explain why a place like Barcelona is worth visiting.

What can I use it for?

The model suits applications that need natural language understanding and generation, such as chatbots, virtual assistants, content creation tools, and creative writing applications. Its strong instruction-following performance makes it useful for interactive AI systems that hold open-ended dialogue and complete complex, multi-step requests.

Things to try

Try giving the model prompts that require reasoning, analysis, or creativity. For instance, ask it to write a short story about an alpaca exploring a new city, or to brainstorm ideas for a sustainable business; its broad knowledge and language understanding should let it produce thoughtful, coherent responses. Another avenue is its multilingual ability, since its training data reportedly spans over 50 languages. Try instructions or prompts in different languages and see how it performs on translation, text generation, and other cross-language tasks.
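A minimal sketch of querying the model through the Hugging Face `transformers` `pipeline` helper, assuming `transformers` and a PyTorch backend are installed and that the checkpoint is published on the Hub as `declare-lab/flan-alpaca-gpt4-xl`. A text-to-text pipeline returns a list of dictionaries with a `generated_text` key. `extract_texts`, `_load_generator`, and `generate` are illustrative wrappers, not part of any declare-lab API; `transformers` is imported lazily inside the loader so the parsing helper works without it.

```python
from functools import lru_cache
from typing import Any, Dict, List

# Assumed Hub id for the checkpoint discussed above.
MODEL_ID = "declare-lab/flan-alpaca-gpt4-xl"


def extract_texts(outputs: List[Dict[str, Any]]) -> List[str]:
    """Pull the generated strings out of a text-to-text pipeline result."""
    return [item["generated_text"].strip() for item in outputs]


@lru_cache(maxsize=1)
def _load_generator():
    """Build the pipeline once and reuse it (the first call downloads ~3B weights)."""
    from transformers import pipeline  # lazy import: only needed when generating

    return pipeline(model=MODEL_ID)


def generate(prompt: str, max_length: int = 128, do_sample: bool = True) -> List[str]:
    """Send one instruction to the model and return the decoded response(s)."""
    generator = _load_generator()
    outputs = generator(prompt, max_length=max_length, do_sample=do_sample)
    return extract_texts(outputs)
```

A call such as `generate("Write an email about an alpaca that likes flan")` returns a one-element list with the model's reply. With `do_sample=True`, the text differs from run to run; set it to `False` for greedy, repeatable output.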

Updated 9/6/2024

Text-to-Text

tango

declare-lab

Tango is a latent diffusion model (LDM) for text-to-audio (TTA) generation. From textual prompts, it can generate realistic audio, including human sounds, animal sounds, natural and artificial sounds, and sound effects. It uses the frozen instruction-tuned language model Flan-T5 as its text encoder and trains a UNet-based diffusion model for audio generation. Tango performs comparably to current state-of-the-art TTA models on both objective and subjective metrics, despite being trained on a dataset 63 times smaller. The maintainer has released the model, training, and inference code to the research community.

Tango 2 is a follow-up to Tango, built on the same foundation but with additional alignment training using Direct Preference Optimization (DPO) on Audio-alpaca, a pairwise text-to-audio preference dataset. This alignment step helps Tango 2 produce higher-quality audio that matches prompts more closely.

Model inputs and outputs

Inputs

- Prompt: A textual description of the audio to generate.
- Steps: The number of diffusion steps used to generate the audio. More steps typically yield higher-quality results at the cost of longer inference time.
- Guidance: The guidance scale, which controls the trade-off between sample quality and sample diversity during generation.

Outputs

- Audio: The generated audio clip for the input prompt, in WAV format.

Capabilities

Tango and Tango 2 can generate a wide variety of realistic audio clips, including human sounds, animal sounds, natural and artificial sounds, and sound effects, such as an audience cheering and clapping, rolling thunder with lightning strikes, or a car engine revving.

What can I use it for?

Tango and Tango 2 can be used for applications such as:

- Audio content creation: Generating audio clips for videos, games, podcasts, and other multimedia projects.
- Sound design: Creating custom sound effects.
- Music composition: Generating musical elements or accompaniment for songwriting and composition.
- Accessibility: Producing sound cues that help convey scenes to visually impaired users.

Things to try

Try generating different kinds of audio with Tango and Tango 2 by varying the prompt, for example:

- Everyday sounds (a dog barking, water flowing, a car engine revving)
- Natural phenomena (thunderstorms, wind, rain)
- Musical instruments and soundscapes (a piano playing, a symphony orchestra)
- Human vocalizations (laughter, cheering, singing)
- Ambient and abstract sounds (a futuristic machine, alien landscapes)

Experiment with the number of steps and the guidance scale to find the right balance of sample quality, diversity, and generation time for your use case.
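The Steps and Guidance inputs correspond to two standard pieces of diffusion sampling. The sketch below assumes a 1,000-step training schedule sampled at evenly spaced timesteps (a common DDIM-style choice, not necessarily Tango's exact scheduler) and the usual classifier-free guidance rule, guided = uncond + scale × (cond - uncond), applied to the model's noise predictions. Plain Python lists stand in for tensors, and the function names are hypothetical.

```python
from typing import List, Sequence


def inference_timesteps(num_steps: int, train_steps: int = 1000) -> List[int]:
    """Evenly spaced timesteps, ordered from noisiest to cleanest.

    Assumes the model was trained with `train_steps` diffusion steps; sampling
    visits only `num_steps` of them, so fewer steps means faster generation.
    """
    if not 1 <= num_steps <= train_steps:
        raise ValueError("num_steps must be between 1 and train_steps")
    stride = train_steps // num_steps
    return [i * stride for i in range(num_steps)][::-1]


def apply_guidance(
    uncond: Sequence[float], cond: Sequence[float], guidance_scale: float
) -> List[float]:
    """Classifier-free guidance on noise predictions.

    guidance_scale = 1 gives the plain conditional prediction; larger values
    push the sample toward the prompt at the cost of diversity.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
```

At each timestep, the model is evaluated with and without the prompt (often batched together), and `apply_guidance` merges the two predictions before the scheduler takes its denoising step. This is why raising Steps lengthens inference, while Guidance changes how strongly the output follows the prompt.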

Updated 9/18/2024

Text-to-Audio