stable-diffusion-v1-5

Maintainer: runwayml

Last updated 5/28/2024

Property	Value
Model Link	View on HuggingFace
API Spec	View on HuggingFace
Github Link	No Github link provided
Paper Link	No paper link provided

Model overview

stable-diffusion-v1-5 is a latent text-to-image diffusion model developed by runwayml that can generate photo-realistic images from text prompts. It was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and then fine-tuned for 595k steps at 512x512 resolution on the "laion-aesthetics v2 5+" dataset. During fine-tuning, the text conditioning was dropped 10% of the time to improve classifier-free guidance sampling.
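The 10% text-conditioning dropping and its role at sampling time can be illustrated with a minimal sketch. It assumes that a dropped prompt is replaced by an empty string and that guidance combines the two noise predictions as `uncond + scale * (cond - uncond)`, the usual classifier-free guidance formulation. The function names and the default scale of 7.5 are illustrative and not taken from the model's training code.

```python
import random


def drop_text_conditioning(prompts, drop_prob=0.1, rng=None):
    """Replace each prompt with "" with probability drop_prob.

    Assumption: an empty prompt stands in for the unconditional case.
    """
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else p for p in prompts]


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=7.5):
    """Combine unconditional and conditional noise predictions element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Training side: roughly one prompt in ten is blanked out.
batch = drop_text_conditioning(["a photo of a cat"] * 10_000, rng=random.Random(0))
print(batch.count("") / len(batch))

# Sampling side: push the prediction away from the unconditional one.
print(classifier_free_guidance([0.0, 0.0], [1.0, 2.0]))  # [7.5, 15.0]
```

With a guidance scale of 1.0 the combination reduces to the conditional prediction, so larger values trade diversity for closer adherence to the prompt.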

Similar models include the Stable-Diffusion-v1-4 checkpoint, which was fine-tuned for 225k steps at 512x512 resolution on "laion-aesthetics v2 5+" with 10% text-conditioning dropping, and the coreml-stable-diffusion-v1-5 model, a version of stable-diffusion-v1-5 converted for use on Apple Silicon hardware.

Model inputs and outputs

Inputs

Text prompt: A textual description of the desired image to generate.

Outputs

Generated image: A photo-realistic image that matches the provided text prompt.
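The prompt-in, image-out interface can be sketched with the Diffusers library. This is a minimal example under stated assumptions: the repository id is inferred from the maintainer and model name and may have moved, the defaults for steps and guidance scale are illustrative, and a CUDA device is assumed unless `device` is changed.

```python
MODEL_ID = "runwayml/stable-diffusion-v1-5"  # assumed repo id: maintainer/model name


def generate_image(prompt, seed=0, steps=50, guidance_scale=7.5, device="cuda"):
    """Return a PIL image for `prompt` using Diffusers' StableDiffusionPipeline."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is a common choice on GPU; fall back to float32 elsewhere.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator makes the output reproducible for a given prompt.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    return result.images[0]


# Example call (downloads the weights on first use):
# generate_image("a photo of an astronaut riding a horse on mars").save("astronaut.png")
```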

Capabilities

The stable-diffusion-v1-5 model can generate a wide variety of photo-realistic images from text prompts. For example, it can create images of imaginary scenes, like "a photo of an astronaut riding a horse on mars", as well as everyday scenes, like "a photo of a yellow cat sitting on a park bench". The model can render details such as lighting, textures, and composition convincingly.

What can I use it for?

The stable-diffusion-v1-5 model is intended for research purposes only. Potential use cases include:

Generating artwork and creative content for design, education, or personal projects (using the Diffusers library)
Probing the limitations and biases of generative models
Developing safe deployment strategies for models with the potential to generate harmful content

The model should not be used to create content that is disturbing, offensive, or propagates harmful stereotypes. Excluded uses include generating demeaning representations, impersonating individuals without consent, or sharing copyrighted material.

Things to try

One interesting aspect of the stable-diffusion-v1-5 model is its ability to generate detailed images even for complex or fantastical prompts. Try experimenting with prompts that combine multiple elements, like "a photo of a robot unicorn fighting a giant mushroom in a cyberpunk city". The results can be surprisingly coherent and imaginative in their composition and lighting.

Another area to explore is the model's flexibility in handling different styles and artistic mediums. Try prompts that reference specific art movements, like "a Monet-style painting of a sunset over a lake" or "a cubist portrait of a person". The model's latent diffusion approach allows it to capture a wide range of visual styles and aesthetics.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

stable-diffusion-v1-4

CompVis

stable-diffusion-v1-4 is a latent text-to-image diffusion model developed by CompVis that is capable of generating photo-realistic images given any text input. It was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Model inputs and outputs

stable-diffusion-v1-4 is a text-to-image generation model. It takes text prompts as input and outputs corresponding images.

Inputs

Text prompts: The model generates images based on the provided text descriptions.

Outputs

Images: The model outputs photo-realistic images that match the provided text prompt.

Capabilities

stable-diffusion-v1-4 can generate a wide variety of images from text inputs, including scenes, objects, and even abstract concepts. The model excels at producing visually striking and detailed images that capture the essence of the textual prompt.

What can I use it for?

The stable-diffusion-v1-4 model can be used for a range of creative and artistic applications, such as generating illustrations, conceptual art, and product visualizations. Its text-to-image capabilities make it a powerful tool for designers, artists, and content creators looking to bring their ideas to life. However, it's important to use the model responsibly and avoid generating content that could be harmful or offensive.

Things to try

One interesting thing to try with stable-diffusion-v1-4 is experimenting with different text prompts to see the variety of images the model can produce. You could also try combining the model with other techniques, such as image editing or style transfer, to create unique and compelling visual content.

Text-to-Image

stable-diffusion-v1-1

CompVis

stable-diffusion-v1-1 is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. It was trained on 237,000 steps at resolution 256x256 on laion2B-en, followed by 194,000 steps at resolution 512x512 on laion-high-resolution. The model is intended to be used with the Diffusers library. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper. Similar models like stable-diffusion-v1-4 have been trained for longer and are usually better in terms of image generation quality. The stable-diffusion model provides an overview of the various Stable Diffusion model checkpoints.

Model inputs and outputs

Inputs

Text prompt: A text description of the desired image to generate.

Outputs

Generated image: A photo-realistic image matching the input text prompt.

Capabilities

stable-diffusion-v1-1 can generate a wide variety of images from text prompts, including realistic scenes, abstract art, and imaginative creations. For example, it can create images of "a photo of an astronaut riding a horse on mars", "a painting of a unicorn in a fantasy landscape", or "a surreal portrait of a robot musician".

What can I use it for?

The stable-diffusion-v1-1 model is intended for research purposes only. Possible use cases include:

Safe deployment of models that can generate potentially harmful content
Probing and understanding the limitations and biases of generative models
Generation of artworks and use in design and other creative processes
Applications in educational or creative tools
Research on generative models

The model should not be used to intentionally create or disseminate images that are disturbing, offensive, or propagate harmful stereotypes.

Things to try

Some interesting things to try with stable-diffusion-v1-1 include:

Experimenting with different text prompts to see the range of images the model can generate
Trying out different noise schedulers to see how they affect the output
Exploring the model's capabilities and limitations, such as its ability to render text or handle complex compositions
Investigating ways to mitigate potential biases and harmful outputs from the model
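The noise-scheduler experiment mentioned above can be sketched as follows. This is a minimal helper that relies on the Diffusers convention of building a scheduler from an existing scheduler's config via `from_config`; the helper name `swap_scheduler` is hypothetical, and the repository id in the commented example is assumed from the maintainer and model name.

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls built from the current scheduler's config."""
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Example with Diffusers installed (not run here):
# from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1")
# pipe = swap_scheduler(pipe, EulerDiscreteScheduler)
```

Generating the same prompt with the same seed under two schedulers gives a direct side-by-side comparison of how the sampler affects the output.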

Text-to-Image

coreml-stable-diffusion-v1-5

apple

The coreml-stable-diffusion-v1-5 model is a version of the Stable Diffusion v1-5 model that has been converted to Core ML format for use on Apple Silicon hardware. It was developed by Hugging Face using Apple's repository, which has an ASCL license. The Stable Diffusion v1-5 model is a latent text-to-image diffusion model capable of generating photo-realistic images from text prompts. This model was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned to improve classifier-free guidance sampling. There are four variants of the Core ML weights available, including different attention implementations and compilation options for Swift and Python inference.

Model inputs and outputs

Inputs

Text prompt: The text prompt describing the desired image to be generated.

Outputs

Generated image: The photo-realistic image generated based on the input text prompt.

Capabilities

The coreml-stable-diffusion-v1-5 model is capable of generating a wide variety of photo-realistic images from text prompts, ranging from landscapes and scenes to intricate illustrations and creative concepts. Like other Stable Diffusion models, it excels at rendering detailed, imaginative imagery, but may struggle with tasks involving more complex compositionality or generating legible text.

What can I use it for?

The coreml-stable-diffusion-v1-5 model is intended for research purposes, such as exploring the capabilities and limitations of generative models, generating artworks and creative content, and developing educational or creative tools. However, the model should not be used to intentionally create or disseminate images that could be harmful, disturbing, or offensive, or to impersonate individuals without their consent.

Things to try

One interesting aspect of the coreml-stable-diffusion-v1-5 model is the availability of different attention implementations and compilation options, which can affect the performance and memory usage of the model on Apple Silicon hardware. Developers may want to experiment with these variants to find the best balance of speed and efficiency for their specific use cases.

Text-to-Image

stable-diffusion-v-1-4-original

CompVis

stable-diffusion-v-1-4-original is a latent text-to-image diffusion model developed by CompVis that can generate photo-realistic images from text prompts. It is an improved version of the Stable-Diffusion-v1-2 model, with additional fine-tuning on the "laion-aesthetics v2 5+" dataset and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. This model is capable of generating a wide variety of images based on text descriptions, though it may struggle with more complex tasks involving compositionality or generating realistic human faces.

Model inputs and outputs

Inputs

Text prompt: A natural language description of the desired image to generate.

Outputs

Generated image: A photo-realistic image that matches the provided text prompt.

Capabilities

The stable-diffusion-v-1-4-original model is capable of generating a wide range of photo-realistic images from text prompts, including scenes, objects, and even some abstract concepts. For example, it can generate images of "a photo of an astronaut riding a horse on mars", "a vibrant oil painting of a hummingbird in a garden", or "a surreal landscape with floating islands and glowing mushrooms". However, the model may struggle with more complex tasks that require fine-grained control over the composition, such as rendering a "red cube on top of a blue sphere".

What can I use it for?

The stable-diffusion-v-1-4-original model is intended for research purposes only, and may have applications in areas such as safe deployment of AI systems, understanding model limitations and biases, generating artwork and design, and educational or creative tools. However, the model should not be used to intentionally create or disseminate images that are harmful, offensive, or propagate stereotypes.

Things to try

One interesting aspect of the stable-diffusion-v-1-4-original model is its ability to generate images with a wide range of artistic styles, from photorealistic to more abstract and surreal. You could try experimenting with different prompts to see the range of styles the model can produce, or explore how the model performs on tasks that require more complex compositional reasoning.

Text-to-Image