Cafeai

Models by this creator

cafe-instagram-sd-1-5-v6

106

The cafe-instagram-sd-1-5-v6 model is a test model created by the maintainer cafeai to assess the Waifu Diffusion training code. It is not intended as a full-featured or official release. It was trained on 1.2 million images from various Instagram accounts, primarily Japanese, for approximately 1.6 epochs. Because the model is undertrained, its performance is marginal, and the maintainer recommends mixing the model for better performance.

The training captions combine natural language descriptions (generated with BLIP), booru tags, and any Instagram hashtags. The model was trained on various aspect ratios at a base resolution of 768x768, using the penultimate CLIP layer. For generation, the maintainer recommends a CLIP skip of 2 and a resolution of 768x768 or higher.

Similar models include waifu-diffusion, waifu-diffusion-xl, Baka-Diffusion, and waifu-diffusion-v1-3, all of which are latent text-to-image diffusion models conditioned on high-quality anime images through fine-tuning.

Model inputs and outputs

Inputs

- Text prompt: A natural language description of the desired image, which can include tags such as "1girl, solo, masterpiece".

Outputs

- Generated image: An image at 768x768 or higher resolution that represents the provided text prompt.

Capabilities

The cafe-instagram-sd-1-5-v6 model can generate images of anime-style characters and scenes, such as cute girls, idols, and fashion-themed imagery. Because it is undertrained, it may struggle with coherency and overall quality compared to more fully trained anime text-to-image models.

What can I use it for?

The cafe-instagram-sd-1-5-v6 model can be used for entertainment and generative art, such as creating anime-inspired illustrations and concepts. Although its output is less polished than that of more advanced models, it can still be a useful tool for experimentation and exploration within the anime art style.

Things to try

Experiment with different text prompts, focusing on specific tags or visual elements to see how the model responds. Trying different resolutions and CLIP skip values can also help find the best configuration for your use case. As the maintainer suggests, mixing the model may improve its output quality, and resources such as the EasyNegative negative embedding may help as well.
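The prompt format and resolution guidance above can be sketched in a few lines of Python. This is a minimal illustration, not the maintainer's tooling: the helper names `build_prompt` and `resolution_for_aspect` are hypothetical, snapping dimensions to multiples of 64 is an assumption, and so is writing hashtags with a leading `#`. The only values taken from the description are the 768x768 base resolution and the three caption sources (natural language, booru tags, hashtags).

```python
import math
from typing import List, Tuple

BASE_SIDE = 768  # base resolution named in the model description
STEP = 64        # assumption: snap each side to a multiple of 64


def resolution_for_aspect(aspect: float, base_side: int = BASE_SIDE,
                          step: int = STEP) -> Tuple[int, int]:
    """Return (width, height) close to the base pixel area for a width/height ratio."""
    if aspect <= 0:
        raise ValueError("aspect must be positive")
    area = base_side * base_side
    width = math.sqrt(area * aspect)
    height = width / aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple of `step`, never below one step.
        return max(step, int(round(value / step)) * step)

    return snap(width), snap(height)


def build_prompt(description: str, tags: List[str], hashtags: List[str]) -> str:
    """Join a natural language description, booru tags, and hashtags into one prompt."""
    parts = [description.strip()]
    parts += [t.strip() for t in tags]
    # Assumption: hashtags are written with a single leading '#'.
    parts += ["#" + h.strip().lstrip("#") for h in hashtags if h.strip("# ")]
    return ", ".join(p for p in parts if p)


print(resolution_for_aspect(1.0))      # square
print(resolution_for_aspect(3 / 2))    # landscape
print(build_prompt("a woman sitting in a cafe", ["1girl", "solo"], ["fashion"]))
```

The resulting width, height, and prompt would then be passed to whichever Stable Diffusion front end is in use, together with the recommended CLIP skip of 2. How that setting is named and numbered differs between tools, so check the documentation of the one you use.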

Updated 5/28/2024

Text-to-Image

cafe_aesthetic

cafeai

The cafe_aesthetic model is an image classifier fine-tuned from the microsoft/beit-base-patch16-384 model. Its purpose is to remove "aesthetically worthless" images from the dataset used to train the Waifu Diffusion project, a fine-tuning effort for Stable Diffusion. The model was trained on approximately 3.5k real-life and anime/manga images to classify images as either "aesthetic" or "not_aesthetic". Related models connected to the Waifu Diffusion project include cafe-instagram-sd-1-5-v6 and waifu-diffusion-xl.

Model inputs and outputs

The cafe_aesthetic model takes an image as input and outputs a classification of either "aesthetic" or "not_aesthetic". It was trained to err on the side of caution, generally keeping images unless they are in a "manga-like" format, have messy lines and/or are sketches, or include an unacceptable amount of text.

Inputs

- Image: An image to be classified as aesthetic or not aesthetic.

Outputs

- Classification: A label for the input image, either "aesthetic" or "not_aesthetic".

Capabilities

The cafe_aesthetic model is designed to assist in the dataset conditioning step for the Waifu Diffusion project by removing images that are not aesthetically suitable for the final training dataset. Automating this step helps scale dataset curation for a project with a "significantly large dataset" of around 15 million images.

What can I use it for?

The cafe_aesthetic model can help filter large image datasets for projects like Waifu Diffusion, where manual curation of millions of images is not feasible. Automatically identifying and removing "aesthetically worthless" images can save significant time and effort when preparing high-quality datasets for training AI models.

Things to try

One interesting aspect of the cafe_aesthetic model is its tendency to err on the side of caution when classifying images. This means fewer "good" images are filtered out, at the cost of some "bad" images making it through. The trade-off matters when using the model: for a project like Waifu Diffusion, false positives (keeping bad images) may be less harmful than false negatives (removing good images).
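The keep-unless-clearly-bad trade-off described above can be illustrated with a small filtering sketch. It assumes classifier scores arrive as a list of `{"label", "score"}` dictionaries, the shape typically returned by image-classification pipelines. The 0.9 cutoff, the function names, and the example scores are all made up for illustration; the source does not state any threshold.

```python
from typing import Dict, List

# Hypothetical cutoff: drop an image only when the classifier is very
# confident that it is "not_aesthetic". Not a value from the source.
DROP_THRESHOLD = 0.9


def keep_image(scores: List[dict], drop_threshold: float = DROP_THRESHOLD) -> bool:
    """Return True unless "not_aesthetic" scores at or above the threshold."""
    by_label = {item["label"]: item["score"] for item in scores}
    return by_label.get("not_aesthetic", 0.0) < drop_threshold


def filter_dataset(predictions: Dict[str, List[dict]]) -> List[str]:
    """Return the names of images that survive the conservative filter."""
    return [name for name, scores in predictions.items() if keep_image(scores)]


# Made-up scores for illustration; real scores would come from the model.
example = {
    "clean_illustration.png": [
        {"label": "aesthetic", "score": 0.97},
        {"label": "not_aesthetic", "score": 0.03},
    ],
    "borderline.png": [
        {"label": "aesthetic", "score": 0.40},
        {"label": "not_aesthetic", "score": 0.60},
    ],
    "rough_sketch.png": [
        {"label": "aesthetic", "score": 0.05},
        {"label": "not_aesthetic", "score": 0.95},
    ],
}

print(filter_dataset(example))
```

With a high cutoff, the borderline image is kept even though "not_aesthetic" is its top label, which mirrors the preference for false positives over false negatives. Lowering `drop_threshold` makes the filter stricter.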

Updated 8/15/2024

Image-to-Text