coreml-stable-diffusion-2-base

Maintainer: apple

Last updated 5/28/2024

Property	Value
Model Link	View on Hugging Face
API Spec	View on Hugging Face
GitHub Link	No GitHub link provided
Paper Link	No paper link provided

Model overview

The coreml-stable-diffusion-2-base model is a text-to-image generation model published by Apple. It is a version of the Stable Diffusion v2 base model that has been converted to Core ML for use on Apple Silicon hardware. It generates images from text prompts and can be run with the Swift and Python inference code in Apple's ml-stable-diffusion repository.

The model was trained on a filtered subset of the large-scale LAION-5B dataset, with a focus on images with high aesthetic quality and the removal of explicit pornographic content. It uses a Latent Diffusion Model architecture that combines an autoencoder with a diffusion model, along with a fixed, pretrained text encoder (OpenCLIP-ViT/H).
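To make the latent-diffusion idea concrete, the sketch below computes the size of the latent tensor that the diffusion model denoises instead of working on raw pixels. The downsampling factor of 8 and the 4 latent channels are the values commonly cited for Stable Diffusion autoencoders; they are not stated in this summary, so treat them as assumptions. The function names are hypothetical.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Return the (channels, height, width) of the latent for a given image size.

    downsample=8 and channels=4 are assumed defaults, not values from this page.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, channels=4):
    """Ratio of RGB pixel values to latent values for the same image."""
    c, h, w = latent_shape(height, width, downsample, channels)
    return (3 * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(512, 512))
    print(compression_ratio(512, 512))
```

Under these assumptions the diffusion model operates on far fewer values than the decoded image contains, which is the main reason latent diffusion is cheaper than pixel-space diffusion.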

Four variants of the Core ML weights are available, one for each combination of attention implementation ("original" or "split_einsum") and compilation target (Swift-based or Python-based inference). Users can choose the variant that best fits their runtime and hardware.
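The four variants can be enumerated as the product of two choices. The sketch below assumes the repository id `apple/coreml-stable-diffusion-2-base` and a folder layout of `<attention>/compiled` for Swift and `<attention>/packages` for Python; neither is stated on this page, so check the repository before relying on them. `download_variant` uses `snapshot_download` from the `huggingface_hub` package and needs network access; the other helpers are pure.

```python
from itertools import product
from pathlib import Path

REPO_ID = "apple/coreml-stable-diffusion-2-base"  # assumed repository id
ATTENTION = ("original", "split_einsum")
TARGET = {"swift": "compiled", "python": "packages"}  # assumed folder names


def variant_folder(attention, runtime):
    """Map an attention implementation and runtime to a repository subfolder."""
    if attention not in ATTENTION:
        raise ValueError(f"unknown attention implementation: {attention}")
    if runtime not in TARGET:
        raise ValueError(f"unknown runtime: {runtime}")
    return f"{attention}/{TARGET[runtime]}"


def all_variants():
    """List every attention/runtime combination as a subfolder path."""
    return [variant_folder(a, r) for a, r in product(ATTENTION, TARGET)]


def download_variant(attention, runtime, root="./models"):
    """Download a single variant instead of the whole repository."""
    from huggingface_hub import snapshot_download  # imported lazily

    folder = variant_folder(attention, runtime)
    local_dir = Path(root) / f"{REPO_ID.split('/')[-1]}_{folder.replace('/', '_')}"
    snapshot_download(REPO_ID, allow_patterns=f"{folder}/*", local_dir=local_dir)
    return local_dir


if __name__ == "__main__":
    for folder in all_variants():
        print(folder)
```

Restricting the download with `allow_patterns` avoids fetching all four sets of weights when only one is needed.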

Model inputs and outputs

Inputs

Text prompt: A natural language description of the desired image.

Outputs

Generated image: The model outputs a high-quality image that corresponds to the input text prompt.
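As a rough illustration of the prompt-in, image-out interface, the sketch below assembles a command line for the Python pipeline in Apple's ml-stable-diffusion repository. The module name and flags reflect that repository's README as best understood here and may differ between versions; the paths are hypothetical placeholders. The helper only builds the command and does not run it.

```python
import shlex

# Compute-unit names assumed to match Core ML Tools' ComputeUnit members.
COMPUTE_UNITS = {"ALL", "CPU_AND_GPU", "CPU_ONLY", "CPU_AND_NE"}


def build_command(prompt, model_dir, output_dir, compute_unit="ALL", seed=93):
    """Build (but do not execute) a text-to-image command as a list of arguments."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if compute_unit not in COMPUTE_UNITS:
        raise ValueError(f"unknown compute unit: {compute_unit}")
    return [
        "python", "-m", "python_coreml_stable_diffusion.pipeline",
        "--prompt", prompt,
        "-i", str(model_dir),    # directory holding the downloaded Core ML packages
        "-o", str(output_dir),   # directory where the generated image is written
        "--compute-unit", compute_unit,
        "--seed", str(seed),
        "--model-version", "stabilityai/stable-diffusion-2-base",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "a photo of an astronaut riding a horse on mars",
        "./models/original_packages",  # hypothetical path
        "./output",                    # hypothetical path
    )
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` would execute it on a machine where the package and weights are installed.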

Capabilities

The coreml-stable-diffusion-2-base model is capable of generating a wide variety of images from text prompts, including scenes, objects, and abstract concepts. It can produce photorealistic images, as well as more stylized or imaginative compositions. The model performs well on a range of prompts, though it may struggle with more complex or compositional tasks.

What can I use it for?

The coreml-stable-diffusion-2-base model is intended for research purposes only. Possible applications include:

Safe deployment of generative models: Researching techniques to safely deploy models that have the potential to generate harmful content.
Understanding model biases: Probing the limitations and biases of the model to improve future iterations.
Creative applications: Generating artwork, designs, and other creative content.
Educational tools: Developing interactive educational or creative applications.
Generative model research: Furthering the state of the art in text-to-image generation.

The model should not be used to create content that is harmful, offensive, or in violation of copyrights.

Things to try

One interesting aspect of the coreml-stable-diffusion-2-base model is the availability of different attention mechanisms and compilation targets. Users can experiment with the "original" and "split_einsum" attention variants to see how they perform on their specific use cases and hardware setups. Additionally, the model's ability to generate high-quality images at 512x512 resolution makes it a compelling tool for creative applications and research.
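One way to compare the attention variants on a given machine is a small timing harness. The sketch below is generic: each candidate is a hypothetical zero-argument callable that runs one image generation with a particular variant, and the harness reports the median wall-clock time after a warm-up run. How each callable loads and runs the model is left to the reader.

```python
import statistics
import time


def time_runs(generate, runs=3, warmup=1):
    """Return the median wall-clock seconds of `runs` calls after `warmup` calls."""
    for _ in range(warmup):
        generate()  # warm-up calls are not timed (model load, caches)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(candidates, runs=3):
    """Time each named callable and return (name, seconds) pairs, fastest first."""
    results = [(name, time_runs(fn, runs)) for name, fn in candidates.items()]
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Stand-in callables; replace with real generation functions per variant.
    ranking = compare({
        "original": lambda: time.sleep(0.01),
        "split_einsum": lambda: time.sleep(0.02),
    })
    for name, seconds in ranking:
        print(f"{name}: {seconds:.3f}s")
```

Using the median rather than the mean reduces the influence of a single slow run.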

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

coreml-stable-diffusion-v1-5

apple

The coreml-stable-diffusion-v1-5 model is a version of the Stable Diffusion v1-5 model that has been converted to Core ML format for use on Apple Silicon hardware. It was developed by Hugging Face using Apple's repository, which has an ASCL license. The Stable Diffusion v1-5 model is a latent text-to-image diffusion model capable of generating photo-realistic images from text prompts. This model was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned to improve classifier-free guidance sampling. There are four variants of the Core ML weights available, including different attention implementations and compilation options for Swift and Python inference.

Model inputs and outputs

Inputs

Text prompt: The text prompt describing the desired image to be generated.

Outputs

Generated image: The photo-realistic image generated based on the input text prompt.

Capabilities

The coreml-stable-diffusion-v1-5 model is capable of generating a wide variety of photo-realistic images from text prompts, ranging from landscapes and scenes to intricate illustrations and creative concepts. Like other Stable Diffusion models, it excels at rendering detailed, imaginative imagery, but may struggle with tasks involving more complex compositionality or generating legible text.

What can I use it for?

The coreml-stable-diffusion-v1-5 model is intended for research purposes, such as exploring the capabilities and limitations of generative models, generating artworks and creative content, and developing educational or creative tools. However, the model should not be used to intentionally create or disseminate images that could be harmful, disturbing, or offensive, or to impersonate individuals without their consent.

Things to try

One interesting aspect of the coreml-stable-diffusion-v1-5 model is the availability of different attention implementations and compilation options, which can affect the performance and memory usage of the model on Apple Silicon hardware. Developers may want to experiment with these variants to find the best balance of speed and efficiency for their specific use cases.

Text-to-Image

stable-diffusion-2-base

stabilityai

329

The stable-diffusion-2-base model is a diffusion-based text-to-image generation model developed by Stability AI. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). The model was trained from scratch on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier. This base model can be used to generate and modify images based on text prompts.

Similar models include the stable-diffusion-2-1-base and the stable-diffusion-2 models, which build upon this base model with additional training and modifications.

Model inputs and outputs

Inputs

Text prompt: A natural language description of the desired image.

Outputs

Image: The generated image based on the provided text prompt.

Capabilities

The stable-diffusion-2-base model can generate a wide range of photorealistic images from text prompts. For example, it can create images of landscapes, animals, people, and fantastical scenes. However, the model does have some limitations, such as difficulty rendering legible text and accurately depicting complex compositions.

What can I use it for?

The stable-diffusion-2-base model is intended for research purposes only. Potential use cases include the generation of artworks and designs, the creation of educational or creative tools, and the study of the limitations and biases of generative models. The model should not be used to intentionally create or disseminate images that are harmful or offensive.

Things to try

One interesting aspect of the stable-diffusion-2-base model is its ability to generate images at resolutions up to 512x512 pixels. Experimenting with different text prompts and exploring the model's capabilities at this resolution can yield some fascinating results. Additionally, comparing the outputs of this model to those of similar models, such as stable-diffusion-2-1-base and stable-diffusion-2, can provide insights into the unique strengths and limitations of each model.

Text-to-Image

coreml-stable-diffusion-xl-base

apple

The coreml-stable-diffusion-xl-base model is a text-to-image generation model developed by Apple. It is based on the Stable Diffusion XL (SDXL) model, which consists of an ensemble-of-experts pipeline for latent diffusion. The base model generates initial noisy latents, which are then further processed with a refinement model to produce the final denoised image. Alternatively, a two-stage pipeline can be used: the base model first generates latents, and a specialized high-resolution model is then applied to produce the final image.

Model inputs and outputs

The coreml-stable-diffusion-xl-base model takes text prompts as input and generates corresponding images as output. The text prompts can describe a wide variety of scenes, objects, and concepts, which the model then translates into visual form.

Inputs

Text prompt: A natural language description of the desired image, such as "a photo of an astronaut riding a horse on mars".

Outputs

Generated image: The model outputs a corresponding image based on the input text prompt.

Capabilities

The coreml-stable-diffusion-xl-base model is capable of generating high-quality, photorealistic images from text prompts. It can create a wide range of scenes, objects, and concepts, and performs significantly better than previous versions of Stable Diffusion. The model can also be used in a two-stage pipeline with a specialized high-resolution refinement model to further improve image quality.

What can I use it for?

The coreml-stable-diffusion-xl-base model is intended for research purposes, such as the generation of artworks, applications in educational or creative tools, and probing the limitations and biases of generative models. The model should not be used to create content that is harmful, offensive, or misrepresents people or events.

Things to try

Experiment with different text prompts to see the variety of images the model can generate. Try combining the base model with the stable-diffusion-xl-refiner-1.0 model to see if the additional refinement step improves the image quality. Explore the model's capabilities and limitations, and consider how it could be applied in creative or educational contexts.

Text-to-Image

stable-diffusion-2

stabilityai

1.8K

The stable-diffusion-2 model is a diffusion-based text-to-image generation model developed by Stability AI. It is an improved version of the original Stable Diffusion model, trained for 150k steps using a v-objective on the same dataset as the base model. The model is capable of generating high-resolution images (768x768) from text prompts, and can be used with the stablediffusion repository or the diffusers library.

Similar models include the SDXL-Turbo and Stable Cascade models, which are also developed by Stability AI. The SDXL-Turbo model is a distilled version of the SDXL 1.0 model, optimized for real-time synthesis, while the Stable Cascade model uses a novel multi-stage architecture to achieve high-quality image generation with a smaller latent space.

Model inputs and outputs

Inputs

Text prompt: A text description of the desired image, which the model uses to generate the corresponding image.

Outputs

Image: The generated image based on the input text prompt, with a resolution of 768x768 pixels.

Capabilities

The stable-diffusion-2 model can be used to generate a wide variety of images from text prompts, including photorealistic scenes, imaginative concepts, and abstract compositions. The model has been trained on a large and diverse dataset, allowing it to handle a broad range of subject matter and styles. Some example use cases for the model include:

Creating original artwork and illustrations
Generating concept art for games, films, or other media
Experimenting with different visual styles and aesthetics
Assisting with visual brainstorming and ideation

What can I use it for?

The stable-diffusion-2 model is intended for both non-commercial and commercial usage. For non-commercial or research purposes, you can use the model under the CreativeML Open RAIL++-M License. Possible research areas and tasks include:

Research on generative models
Research on the impact of real-time generative models
Probing and understanding the limitations and biases of generative models
Generation of artworks and use in design and other artistic processes
Applications in educational or creative tools

For commercial use, please refer to https://stability.ai/membership.

Things to try

One interesting aspect of the stable-diffusion-2 model is its ability to generate highly detailed and photorealistic images, even for complex scenes and concepts. Try experimenting with detailed prompts that describe intricate settings, characters, or objects, and see the model's ability to bring those visions to life. Additionally, you can explore the model's versatility by generating images in a variety of styles, from realism to surrealism, impressionism to expressionism. Experiment with different artistic styles and see how the model interprets and renders them.

Text-to-Image