cafe_aesthetic

Maintainer: cafeai

Last updated 8/15/2024


Property	Value
Run this model	Run on HuggingFace
API spec	View on HuggingFace
Github link	No Github link provided
Paper link	No paper link provided


Model overview

The cafe_aesthetic model is an image classifier fine-tuned from microsoft/beit-base-patch16-384. Its purpose is to remove "aesthetically worthless" images from the dataset used to train Waifu Diffusion, a fine-tune of Stable Diffusion. It was trained on approximately 3.5k real-life and anime/manga images to classify each image as either "aesthetic" or "not_aesthetic". The related cafe-instagram-sd-1-5-v6 and waifu-diffusion-xl models are also connected to the Waifu Diffusion project.

Model inputs and outputs

The cafe_aesthetic model takes an image as input and classifies it as either "aesthetic" or "not_aesthetic". It was trained to err on the side of caution: it generally keeps images unless they are in a "manga-like" format, have messy lines or are sketches, or contain an unacceptable amount of text.

Inputs

Image: An image to be classified as aesthetic or not aesthetic.

Outputs

Classification: The model labels the input image as either "aesthetic" or "not_aesthetic".
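A minimal sketch of reading that classification, assuming the model is available on the Hugging Face Hub as `cafeai/cafe_aesthetic` and is loaded through the `transformers` image-classification pipeline, which returns a list of `{"label": ..., "score": ...}` dictionaries. The helper names `load_classifier` and `top_label` are hypothetical, and the label strings are taken from this page rather than checked against the model config.

```python
from typing import List

# Label names as described on this page (assumed to match the model config).
AESTHETIC = "aesthetic"
NOT_AESTHETIC = "not_aesthetic"


def load_classifier():
    """Load the classifier lazily (assumes `pip install transformers torch pillow`
    and Hub access; nothing is downloaded at import time)."""
    from transformers import pipeline

    return pipeline("image-classification", model="cafeai/cafe_aesthetic")


def top_label(predictions: List[dict]) -> str:
    """Return the highest-scoring label from pipeline-style output,
    i.e. a list of {"label": str, "score": float} dicts."""
    if not predictions:
        raise ValueError("empty prediction list")
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]


# Usage (requires the model download; "example.jpg" is a hypothetical file):
# classifier = load_classifier()
# print(top_label(classifier("example.jpg")))
```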

Capabilities

The cafe_aesthetic model is designed to assist in the dataset conditioning step for the Waifu Diffusion project by automatically removing images that are not aesthetically suitable for the final training dataset. Automating this step helps scale curation for a project with a "significantly large dataset" of around 15 million images.

What can I use it for?

The cafe_aesthetic model can be used to help filter large image datasets for projects like Waifu Diffusion, where manual curation of millions of images is not feasible. By automating the process of identifying and removing "aesthetically worthless" images, the model can save significant time and effort in preparing high-quality datasets for training AI models.
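Applied to a large collection, the classifier becomes one step in a filtering loop. The sketch below is not the project's actual pipeline: `partition_dataset`, `scorer_from_pipeline`, and the 0.5 default threshold are hypothetical choices. The scoring function is passed in so the filtering logic can be exercised without downloading the model.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# A scorer maps an image path to P(not_aesthetic) in [0, 1].
Scorer = Callable[[Path], float]


def partition_dataset(
    paths: Iterable[Path], p_not_aesthetic: Scorer, threshold: float = 0.5
) -> Tuple[List[Path], List[Path]]:
    """Split image paths into (kept, removed).

    An image is removed only when its not_aesthetic probability reaches
    `threshold`; everything else is kept.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    kept: List[Path] = []
    removed: List[Path] = []
    for path in paths:
        if p_not_aesthetic(path) >= threshold:
            removed.append(path)
        else:
            kept.append(path)
    return kept, removed


def scorer_from_pipeline(classifier) -> Scorer:
    """Adapt a transformers image-classification pipeline into a Scorer
    (hypothetical wiring). A missing label scores 0.0, so the image is kept."""
    def score(path: Path) -> float:
        preds = classifier(str(path))
        return next((p["score"] for p in preds if p["label"] == "not_aesthetic"), 0.0)

    return score
```

Injecting the scorer keeps model loading separate from the keep/remove decision, so the same loop works with any classifier that reports a `not_aesthetic` probability.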

Things to try

One notable aspect of the cafe_aesthetic model is its tendency to err on the side of caution when classifying images. This approach is meant to filter out fewer "good" images, at the cost of letting some "bad" images through. The trade-off matters when using the model: for a project like Waifu Diffusion, false positives (keeping bad images) may be less harmful than false negatives (removing good images).
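The trade-off can be made concrete by counting both error types at two removal thresholds. The samples below are invented for illustration and are not measurements of cafe_aesthetic; the "remove when P(not_aesthetic) reaches a threshold" rule is likewise an assumption about how one might apply the classifier cautiously.

```python
from typing import List, Tuple

# Each sample: (P(not_aesthetic) from a classifier, human judgement: True = truly bad).
Sample = Tuple[float, bool]


def error_counts(samples: List[Sample], threshold: float) -> Tuple[int, int]:
    """Return (bad_kept, good_removed) when images with
    P(not_aesthetic) >= threshold are removed.

    bad_kept:     bad images that slip through (false positives, treating
                  "aesthetic"/kept as the positive class)
    good_removed: good images wrongly discarded (false negatives)
    """
    bad_kept = sum(1 for p, bad in samples if bad and p < threshold)
    good_removed = sum(1 for p, bad in samples if not bad and p >= threshold)
    return bad_kept, good_removed


# Invented example data, not model output.
samples = [(0.1, False), (0.4, False), (0.6, False), (0.55, True), (0.8, True), (0.95, True)]
print(error_counts(samples, 0.5))  # (0, 1)
print(error_counts(samples, 0.9))  # (2, 0)
```

With these made-up samples, raising the threshold from 0.5 to 0.9 stops one good image from being removed but lets two bad images through, which mirrors the cautious behaviour described above.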

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


cafe-instagram-sd-1-5-v6

cafeai


The cafe-instagram-sd-1-5-v6 model is a test model created by the maintainer cafeai to assess the Waifu Diffusion training code. It is not intended to be a full-featured or official release. It was trained for approximately 1.6 epochs on 1.2 million images from various Instagram accounts, primarily Japanese. Because the model is undertrained, its performance is marginal, and the maintainer recommends mixing (merging) it with other models for better results. The training images were captioned with natural language descriptions (generated with BLIP), booru tags, and any Instagram hashtags. Training used multiple aspect ratios at a base resolution of 768x768 and the penultimate CLIP layer; accordingly, the maintainer recommends a CLIP skip of 2 and a resolution of 768x768 or higher for generation. Similar models include waifu-diffusion, waifu-diffusion-xl, Baka-Diffusion, and waifu-diffusion-v1-3, all of which are latent text-to-image diffusion models fine-tuned on high-quality anime images.

Model inputs and outputs

Inputs

Text prompt: A natural language description of the desired image, which can include tags such as "1girl, solo, masterpiece".

Outputs

Generated image: An image that represents the provided text prompt; a resolution of 768x768 or higher is recommended.

Capabilities

The cafe-instagram-sd-1-5-v6 model can generate images of anime-style characters and scenes, such as cute girls, idols, and fashion-themed imagery. Because it is undertrained, however, it may struggle with coherency and overall quality compared to more fully developed anime text-to-image models.

What can I use it for?

The cafe-instagram-sd-1-5-v6 model can be used for entertainment and generative art, such as creating anime-inspired illustrations and concepts. Although its output is less polished than that of more advanced models, it can still be useful for experimentation within the anime art style.

Things to try

To get the most out of cafe-instagram-sd-1-5-v6, experiment with different text prompts, focusing on specific tags or visual elements to see how the model responds. Trying different resolutions and CLIP skip values can also help find the best configuration for your use case. Beyond mixing the model with other models, as the maintainer suggests, pairing it with a negative textual inversion embedding such as EasyNegative may also improve output quality.
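The maintainer's settings (CLIP skip 2, 768x768 or higher) are phrased in AUTOMATIC1111 terms. A sketch of carrying them over to the `diffusers` library follows. In diffusers, `clip_skip=None` uses the final text-encoder layer and `clip_skip=1` uses the pre-final layer, so A1111's "CLIP skip 2" corresponds to `clip_skip=1`. The helper names are hypothetical, treating "or higher" as a pixel-area check is an assumption, and `generate` assumes a recent diffusers release with `clip_skip` support, a CUDA GPU, and that `cafeai/cafe-instagram-sd-1-5-v6` loads with `StableDiffusionPipeline`.

```python
import warnings
from typing import Any, Dict, Optional


def diffusers_clip_skip(a1111_clip_skip: int) -> Optional[int]:
    """Convert an A1111 "CLIP skip" value to diffusers' `clip_skip`.

    A1111: 1 = final layer, 2 = penultimate layer.
    diffusers: None = final layer, 1 = penultimate layer.
    """
    if a1111_clip_skip < 1:
        raise ValueError("A1111 CLIP skip starts at 1")
    return None if a1111_clip_skip == 1 else a1111_clip_skip - 1


def generation_kwargs(prompt: str, width: int = 768, height: int = 768,
                      a1111_clip_skip: int = 2) -> Dict[str, Any]:
    """Build pipeline arguments following the recommended settings."""
    if width % 8 or height % 8:
        raise ValueError("width and height must be multiples of 8")
    if width * height < 768 * 768:  # assumption: compare total pixel area
        warnings.warn("the maintainer recommends 768x768 or higher")
    return {"prompt": prompt, "width": width, "height": height,
            "clip_skip": diffusers_clip_skip(a1111_clip_skip)}


def generate(prompt: str, **overrides):
    """Hypothetical end-to-end call; downloads weights, so it is not run here."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "cafeai/cafe-instagram-sd-1-5-v6", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(**generation_kwargs(prompt, **overrides)).images[0]
```

Keeping the settings in a separate function makes the off-by-one CLIP skip conversion easy to check without loading a model.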


Text-to-Image


waifu-diffusion-xl

hakurei


waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning StabilityAI's SDXL 0.9 model. It was developed by the maintainer hakurei. The model generates anime-style images from textual descriptions, building on the capabilities of earlier waifu-diffusion models. Similar models include waifu-diffusion and waifu-diffusion-v1-3, which also focus on generating anime-style imagery. The Baka-Diffusion model by Hosioka is another related project, which aims to push the boundaries of SD1.x-based models.

Model inputs and outputs

Inputs

Text prompt: A textual description of the desired anime-style image, such as "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt".

Outputs

Generated image: An anime-style image that matches the input text prompt, produced through the diffusion process.

Capabilities

waifu-diffusion-xl can generate high-quality anime-inspired images from text prompts, drawing on its fine-tuning on a large dataset of anime images. The model can produce a wide variety of anime-style characters, scenes, and visual styles, with a focus on aesthetic appeal.

What can I use it for?

The waifu-diffusion-xl model can be used for various entertainment and creative purposes, such as generating anime-style artwork, character designs, and illustrations. It can serve as a generative art assistant, letting users explore and experiment with different visual concepts based on textual descriptions.

Things to try

One interesting aspect of waifu-diffusion-xl is its ability to capture the nuances of anime-style art, such as character expressions, clothing, and backgrounds. Try more detailed or specific prompts to see how the model handles different visual elements and styles. Combining waifu-diffusion-xl with other techniques, such as textual inversion or FreeU, can also further refine the generated images.
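The example prompts on this page are comma-separated booru-style tags. When prompts grow more detailed, a small helper can keep tags organised in groups and avoid accidental repeats. `build_tag_prompt` is a hypothetical convenience, not part of any model's API, and it makes no claim about whether tag order affects the output.

```python
from typing import Iterable, List


def build_tag_prompt(*groups: Iterable[str]) -> str:
    """Join booru-style tags into one comma-separated prompt.

    Tags are stripped, empty entries are dropped, and duplicates are removed
    while keeping first-seen order, so tags can be supplied in logical groups
    (subject, appearance, clothing, background) without repeating themselves.
    """
    seen = set()
    tags: List[str] = []
    for group in groups:
        for tag in group:
            tag = tag.strip()
            if tag and tag not in seen:
                seen.add(tag)
                tags.append(tag)
    return ", ".join(tags)


# Usage, regrouping tags from the example prompt above:
subject = ["1girl", "solo", "looking at viewer", "upper body"]
appearance = ["aqua eyes", "blonde hair", "short hair", "closed mouth"]
clothing = ["baseball cap", "hat", "yellow shirt", "shirt", "hoop earrings", "earrings", "jewelry"]
background = ["green background", "simple background", "solo"]  # "solo" repeats and is dropped
prompt = build_tag_prompt(subject, appearance, clothing, background)
```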


Text-to-Image


waifu-diffusion

hakurei


waifu-diffusion is a latent text-to-image diffusion model that has been fine-tuned on high-quality anime images. It was developed by the creator hakurei. Similar models include cog-a1111-ui, a collection of anime Stable Diffusion models; stable-diffusion-inpainting, for filling in masked parts of images; and masactrl-stable-diffusion-v1-4, for editing real or generated images.

Model inputs and outputs

The waifu-diffusion model takes textual prompts as input and generates corresponding anime-style images. The prompts can describe a wide range of subjects, characters, and scenes, and the model will attempt to render them in a distinctive anime aesthetic.

Inputs

Textual prompts describing the desired image

Outputs

Generated anime-style images corresponding to the input prompts

Capabilities

waifu-diffusion can generate a variety of anime-inspired images from text prompts. It can render detailed characters, scenes, and environments in a consistent anime art style. The model has been trained on a large dataset of high-quality anime images, allowing it to capture the nuances and visual conventions of the anime genre.

What can I use it for?

The waifu-diffusion model can be used for a variety of creative and entertainment purposes. It can serve as a generative art assistant, allowing users to create unique anime-style illustrations and artworks. It could also be used in the development of anime-themed games, animations, or other multimedia projects, or for personal hobbies and professional creative work involving anime-inspired visual content.

Things to try

With waifu-diffusion, you can experiment with a wide range of text prompts to generate diverse anime-style images. Try mixing and matching elements such as characters, settings, and moods to see the model's versatility. You can also explore its capabilities with more detailed or specific prompts, such as references to particular anime tropes or visual styles.
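Mixing and matching characters, settings, and moods amounts to enumerating combinations. A sketch using `itertools.product`; the function name and the sample elements are hypothetical, and the resulting strings would be passed to whatever text-to-image interface you use.

```python
from itertools import product
from typing import List, Sequence


def prompt_grid(characters: Sequence[str], settings: Sequence[str],
                moods: Sequence[str]) -> List[str]:
    """Return one comma-separated prompt per (character, setting, mood) combination.

    `product` varies the last sequence fastest, so prompts sharing a character
    and setting end up next to each other.
    """
    return [", ".join(parts) for parts in product(characters, settings, moods)]


grid = prompt_grid(["1girl", "1boy"], ["city street", "forest"], ["smile"])
for p in grid:
    print(p)
# 1girl, city street, smile
# 1girl, forest, smile
# 1boy, city street, smile
# 1boy, forest, smile
```

Generating each prompt in the grid with the same fixed seed makes it easier to attribute differences between images to the prompt rather than to random noise.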


Text-to-Image


waifu-diffusion-v1-3

hakurei


The waifu-diffusion-v1-3 model is a latent text-to-image diffusion model that has been fine-tuned on high-quality anime images. It was originally based on the Stable Diffusion 1.4 model, which was trained on the LAION2B-en dataset, and was further fine-tuned for 10 epochs on 680k anime-styled images. Similar models include waifu-diffusion, another release in the same model family, as well as Plat Diffusion, Baka-Diffusion, and EimisAnimeDiffusion_1.0v, all of which are anime-focused text-to-image diffusion models.

Model inputs and outputs

Inputs

Text prompts: The model takes text prompts that describe the desired image, such as "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt".

Outputs

Images: The model outputs high-quality, detailed images that match the provided text prompt, capturing the specified visual elements such as the character, clothing, and background.

Capabilities

The waifu-diffusion-v1-3 model excels at generating anime-styled images with high fidelity and intricate detail. It can produce a wide range of characters, scenes, and settings, from portraits of individual girls to complex fantasy landscapes. Its fine-tuning on a large dataset of anime art allows it to capture the distinctive stylistic elements of the anime aesthetic, such as vibrant colors, expressive facial features, and detailed clothing and accessories.

What can I use it for?

The waifu-diffusion-v1-3 model can be used for a variety of entertainment and creative applications, such as generating character designs, illustrations, and concept art for anime-inspired projects. It could be particularly useful for artists, designers, and content creators who want to produce high-quality anime-style visuals quickly.

Things to try

One interesting aspect of the waifu-diffusion-v1-3 model is its ability to generate detailed, cohesive scenes rather than only individual character portraits. Try prompts that incorporate complex backgrounds, environments, and storytelling elements to see what kinds of immersive, anime-inspired worlds the model can create. The model may also respond well to prompts that combine anime-style elements with other genres or themes, letting you explore the boundaries of the anime aesthetic.


Text-to-Image