ControlNet-Models-For-Core-ML

Maintainer: coreml-community

Total Score: 56

Last updated 5/28/2024


  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • GitHub link: No GitHub link provided
  • Paper link: No paper link provided


Model overview

The ControlNet-Models-For-Core-ML repository is a collection of ControlNet models converted to Apple's Core ML format by the coreml-community maintainer. ControlNet is a neural network structure that allows controlling pretrained large diffusion models like Stable Diffusion by adding extra conditioning inputs. These Core ML models are designed specifically for use with Swift apps such as Mochi Diffusion or with the Swift CLI, and are not compatible with Python-based Diffusers pipelines.

The models in this repository include both "Original" and "Split-Einsum" versions, all built for Stable Diffusion v1.5. They feature various conditioning inputs such as Canny edge detection, Midas depth estimation, HED edge detection, MLSD line detection, surface normal estimation, OpenPose pose detection, scribbles, and semantic segmentation. These conditioning inputs can be used to guide and control the image generation process.
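As a rough guide to how the two variants map onto Core ML compute units, the snippet below is a minimal sketch using the standard CoreML framework. The pairing of Split-Einsum with the Neural Engine and Original with the GPU follows Apple's general guidance for these conversions rather than anything specific to this repository.

```swift
import CoreML

// Split-Einsum conversions are structured to run on the Apple Neural Engine (ANE),
// while "Original" conversions generally perform best on the GPU.
let modelConfig = MLModelConfiguration()
modelConfig.computeUnits = .cpuAndNeuralEngine   // typical choice for Split-Einsum builds
// modelConfig.computeUnits = .cpuAndGPU         // typical choice for Original builds
```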

Model inputs and outputs

Inputs

  • Conditioning Image: An image that provides additional input information to guide the image generation process, such as edge maps, depth maps, poses, etc.
  • Text Prompt: A text description that specifies the desired output image.

Outputs

  • Generated Image: The final output image generated by the model, based on the provided text prompt and conditioning image.
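To make this flow concrete, here is a minimal sketch of passing a text prompt and a conditioning image to one of these models from Swift, assuming Apple's ml-stable-diffusion package is used to drive the Core ML resources. The `StableDiffusionPipeline` initializer and the `controlNet:`/`controlNetInputs` parameters come from that package and may differ between versions; the resource path, the model name, and `loadCannyEdgeMap()` are placeholders, not names from this repository.

```swift
import Foundation
import CoreML
import CoreGraphics
import StableDiffusion  // Apple's ml-stable-diffusion Swift package

// Placeholder path and names -- point these at the unzipped Core ML resources
// and the ControlNet model downloaded from this repository.
let resourceURL = URL(fileURLWithPath: "/path/to/coreml-stable-diffusion-v1-5")
let controlNetName = "Canny"                          // hypothetical model name
let conditioningImage: CGImage = loadCannyEdgeMap()   // hypothetical helper

let mlConfig = MLModelConfiguration()
mlConfig.computeUnits = .cpuAndGPU

// Build the pipeline with the ControlNet model attached, then load its resources.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    controlNet: [controlNetName],
    configuration: mlConfig,
    reduceMemory: false
)
try pipeline.loadResources()

// Text prompt + conditioning image in, generated image out.
var generation = StableDiffusionPipeline.Configuration(prompt: "a modern house at dusk")
generation.controlNetInputs = [conditioningImage]
generation.stepCount = 25
generation.seed = 42

let images = try pipeline.generateImages(configuration: generation) { _ in true }
```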

Capabilities

The ControlNet-Models-For-Core-ML models excel at generating images that adhere to specific visual constraints or guidelines, such as incorporating detailed edge information, depth cues, or semantic segmentation. This allows for more precise control over the generated imagery, enabling users to create images that closely match their desired visual characteristics.

What can I use it for?

These ControlNet models are well-suited for various creative and artistic applications, such as generating concept art, illustrations, or visualizations that require a high degree of control over the output. Developers of Swift apps focused on image generation or manipulation can leverage these models to offer users more advanced capabilities beyond standard text-to-image generation.

Things to try

Experiment with different conditioning inputs and prompts to see how the models respond. Try using edge maps, depth information, or pose data to guide the generation of specific types of images, such as architectural renderings, character designs, or product visualizations. Additionally, explore the differences between the "Original" and "Split-Einsum" versions to see how they impact the quality and performance of the generated outputs.
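If you want to try edge conditioning but don't have a Canny preprocessor at hand, Core Image's built-in CIEdges filter can produce a rough edge map directly in Swift. This is a simple gradient-based edge filter rather than a true Canny detector, so treat the sketch below as an approximation for quick experiments; the function name and default intensity are arbitrary choices, not part of this repository.

```swift
import CoreImage
import CoreGraphics

/// Produces a rough edge-map CGImage from a source CGImage using Core Image's
/// CIEdges filter -- an approximation of the Canny maps the Canny ControlNet expects.
func roughEdgeMap(from source: CGImage, intensity: Float = 4.0) -> CGImage? {
    let input = CIImage(cgImage: source)
    guard let filter = CIFilter(name: "CIEdges") else { return nil }
    filter.setValue(input, forKey: kCIInputImageKey)
    filter.setValue(intensity, forKey: kCIInputIntensityKey)
    guard let output = filter.outputImage else { return nil }
    return CIContext().createCGImage(output, from: output.extent)
}
```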




Related Models


ControlNet

lllyasviel

Total Score: 3.5K

ControlNet is a neural network structure developed by Lvmin Zhang and Maneesh Agrawala to control diffusion models by adding extra conditions. It allows large diffusion models like Stable Diffusion to be augmented with various types of conditional inputs like edge maps, segmentation maps, keypoints, and more. This can enrich the methods to control large diffusion models and facilitate related applications. The maintainer, lllyasviel, has released 14 different ControlNet checkpoints, each trained on Stable Diffusion v1-5 with a different type of conditioning. These include models for canny edge detection, depth estimation, line art generation, pose estimation, and more. The checkpoints allow users to guide the generation process with these auxiliary inputs, resulting in images that adhere to the specified conditions.

Model inputs and outputs

Inputs

  • Conditioning image: An image that provides additional guidance to the model, such as edges, depth, segmentation, poses, etc. The type of conditioning image depends on the specific ControlNet checkpoint being used.

Outputs

  • Generated image: The image generated by the diffusion model, guided by the provided conditioning image.

Capabilities

ControlNet enables fine-grained control over the output of large diffusion models like Stable Diffusion. By incorporating specific visual conditions, users can generate images that adhere to the desired constraints, such as having a particular edge structure, depth map, or pose arrangement. This can be useful for a variety of applications, from product design to creative art generation.

What can I use it for?

The ControlNet models can be used in a wide range of applications that require precise control over the generated imagery. Some potential use cases include:

  • Product design: Generating product renderings based on 3D models or sketches
  • Architectural visualization: Creating photorealistic architectural scenes from floor plans or massing models
  • Creative art generation: Producing unique artworks by combining diffusion with specific visual elements
  • Illustration and comics: Generating illustrations or comic panels with desired line art, poses, or color palettes
  • Educational tools: Creating custom training datasets or visualization aids for computer vision tasks

Things to try

One interesting aspect of ControlNet is the ability to combine multiple conditioning inputs to guide the generation process. For example, you could use a depth map and a segmentation map together to create a more detailed and coherent output. Additionally, experimenting with the conditioning scales and the balance between the text prompt and the visual input can lead to unique and unexpected results. Another area to explore is the potential of ControlNet to enable interactive, iterative image generation: by allowing users to gradually refine the conditioning images, the model can be guided towards a desired output in an incremental fashion, similar to how artists work.


controlnet-union-sdxl-1.0

xinsir

Total Score: 854

The controlnet-union-sdxl-1.0 model, developed by xinsir, is a powerful ControlNet model that supports 10+ control types for conditional text-to-image generation. It is based on the original ControlNet architecture and proposes two new modules: one extends the model to support different image conditions using the same network parameters, and the other supports multiple condition inputs without adding extra computation. This allows designers to edit images in detail using different conditions with a single model. The model achieves superior control ability and aesthetic scores compared to other state-of-the-art models.

Model inputs and outputs

Inputs

  • Image: A control image, which can be one of a variety of types such as OpenPose, Depth, Canny, HED, PIDI, or Lineart.
  • Prompt: The text prompt that describes the desired output image.

Outputs

  • Image: A high-resolution image that visually matches the provided prompt and control image.

Capabilities

The controlnet-union-sdxl-1.0 model can generate images that are visually comparable to Midjourney, demonstrating its impressive control abilities. It supports a wide range of control types, allowing fine-grained control over the generated images. Because the same network parameters handle different control types and multiple condition inputs, the model is efficient and user-friendly for designers and artists.

What can I use it for?

The controlnet-union-sdxl-1.0 model can be used for a variety of image generation and editing tasks, such as:

  • Conceptual art and illustrations: The model's strong control abilities let users translate creative visions into detailed, high-quality images.
  • Product design and visualization: The model can generate photorealistic images of products, packaging, or other design concepts.
  • Character design and animation: Support for control types like OpenPose and Lineart makes it well suited for creating detailed character designs and animating them.
  • Architectural visualization: The model can generate realistic renderings of buildings, interiors, and landscapes based on sketches or other control inputs.

Things to try

One key insight about the controlnet-union-sdxl-1.0 model is its ability to adapt to different control types and inputs without significantly increasing computational requirements. This makes it a versatile tool for designers and artists who need to iterate quickly and try different approaches. For example, you could start with a simple OpenPose control image and a high-level prompt, then progressively refine the control image with more detailed Canny or Lineart information to achieve the desired result. The model's efficiency lets you explore different variations and control types without lengthy processing times. Another interesting aspect to explore is combining multiple control inputs, such as using both Depth and Canny information to guide the image generation; this can lead to unique results that blend different visual elements in compelling ways.



ControlNet

furusu

Total Score: 91

ControlNet is a neural network structure developed by Lvmin Zhang and Maneesh Agrawala that can be used to control large pretrained diffusion models like Stable Diffusion. The model allows additional input conditions, such as edge maps, segmentation maps, and keypoints, to be incorporated into the text-to-image generation process, enriching the control and capabilities of the diffusion model. The maintainer, furusu, has provided several ControlNet checkpoints trained on the Waifu Diffusion 1.5 beta2 base model, including models for edge detection, depth estimation, pose estimation, and more. The models were trained on datasets ranging from 11,000 to 60,000 1-girl images, with 2 to 5 training epochs and batch sizes of 8 to 16.

Model inputs and outputs

Inputs

  • Control Image: An image that provides additional conditional information to guide the text-to-image generation process. This can be an edge map, depth map, pose keypoints, etc.

Outputs

  • Generated Image: The final output image that is generated using both the text prompt and the control image.

Capabilities

The ControlNet models enhance the capabilities of the base Stable Diffusion model by allowing more precise control over the generated images. For example, the edge detection model can be used to generate images with specific edge structures, while the pose estimation model can be used to create images with particular human poses.

What can I use it for?

The ControlNet models are particularly useful for tasks that require fine-grained control over the generated images, such as character design, product visualization, and architectural rendering. By incorporating additional input conditions, users can generate images that more closely match their specific requirements. The ability to control the diffusion process can also be leveraged for creative experimentation, allowing users to explore novel image generation possibilities.

Things to try

One interesting aspect of these ControlNet models is the ability to combine multiple input conditions. For example, you could use both the edge detection and pose estimation models to generate images with specific edge structures and human poses, leading to more complex and nuanced outputs. Another thing to try is using the ControlNet models with different base diffusion models, such as the more recent Stable Diffusion 2.1. Although the models were trained on Waifu Diffusion 1.5, they may still provide useful additional control when used with other diffusion models.



controlnet_qrcode-control_v1p_sd15

DionTimmer

Total Score: 211

The controlnet_qrcode-control_v1p_sd15 model is a ControlNet model trained to generate QR code-based artwork while maintaining the integral QR code shape. It was developed by DionTimmer and is tailored for Stable Diffusion 1.5; a separate version for Stable Diffusion 2.1 is also available. These ControlNet models were trained on a large dataset of 150,000 QR code + QR code artwork pairs, providing a solid foundation for generating QR code-based artwork that is aesthetically pleasing.

Model inputs and outputs

Inputs

  • Prompt: A text description of the desired image.
  • QR code image: An image containing a QR code that is used as a conditioning input to the model.
  • Initial image: An optional initial image that can be used as a starting point for the generation process.

Outputs

  • Generated image: An image generated based on the provided prompt and QR code conditioning.

Capabilities

The controlnet_qrcode-control_v1p_sd15 model excels at generating QR code-based artwork that maintains the integral QR code shape while also being visually appealing. It can be used to create a wide variety of QR code-themed artworks, such as billboards, logos, and patterns.

What can I use it for?

The controlnet_qrcode-control_v1p_sd15 model can be used for a variety of creative and commercial applications. Some ideas include:

  • Generating QR code-based artwork for promotional materials, product packaging, or advertising campaigns.
  • Creating unique and eye-catching QR code designs for branding and identity purposes.
  • Exploring the intersection of technology and art by generating QR code-inspired digital artworks.

Things to try

A key aspect of the controlnet_qrcode-control_v1p_sd15 model is balancing the QR code shape against the overall aesthetic of the generated artwork. By adjusting the guidance scale, ControlNet conditioning scale, and strength parameters, you can experiment with finding the right balance between maintaining the QR code structure and achieving a desired artistic style. You can also generate QR code-based artwork from different prompts and initial images to see the variety of outputs the model can produce, which is a fun and creative way to explore the model's capabilities and find new ways to incorporate QR codes into your designs.
