Ostris

Models by this creator


flux-dev-lora-trainer

ostris

Total Score

2.0K

The flux-dev-lora-trainer is an AI model developed by Ostris that allows users to fine-tune the FLUX.1-dev model using the AI-toolkit. This model is part of Ostris' research efforts and is designed to be a flexible and experimental platform for exploring different AI training techniques. Similar models created by Ostris include the ai-toolkit, flux-dev-lora, flux-dev-multi-lora, flux-dev-realism, and flux-schnell-lora, all of which focus on different aspects of the FLUX.1-dev and FLUX.1-schnell models.

Model inputs and outputs

The flux-dev-lora-trainer model is designed to fine-tune the FLUX.1-dev model using the AI-toolkit. It accepts a variety of inputs, including the prompt, seed, aspect ratio, and other parameters that control the generation process (a hedged usage sketch appears at the end of this entry).

Inputs

- **Prompt**: The text prompt that describes the desired image.
- **Seed**: The random seed used for generating the image.
- **Model**: The version of the FLUX.1 model to use, either the "dev" or "schnell" version.
- **Width and Height**: The desired width and height of the generated image.
- **Aspect Ratio**: The aspect ratio of the generated image, which can be set to a predefined value or "custom".
- **Number of Outputs**: The number of images to generate.
- **Lora Scale**: The strength of the LoRA (Low-Rank Adaptation) to be applied.
- **Guidance Scale**: The guidance scale for the diffusion process.
- **Number of Inference Steps**: The number of steps to take during the diffusion process.

Outputs

- **Generated Images**: The model outputs one or more images based on the provided inputs.

Capabilities

The flux-dev-lora-trainer model is designed to be a flexible and experimental platform for fine-tuning the FLUX.1-dev model. It allows users to experiment with different training techniques and settings, such as adjusting the LoRA scale, guidance scale, and number of inference steps. This can be useful for exploring how these parameters affect the quality and characteristics of the generated images.

What can I use it for?

The flux-dev-lora-trainer model can be used for a variety of research and development purposes, such as:

- Experimenting with different training techniques and settings for the FLUX.1-dev model
- Generating custom images based on specific prompts and requirements
- Exploring the capabilities and limitations of the FLUX.1-dev model
- Integrating the fine-tuned model into other applications or projects

Things to try

Some interesting things to try with the flux-dev-lora-trainer model include:

- Experimenting with different LoRA scales to see how they affect the generated images
- Adjusting the guidance scale to find the optimal balance between image quality and creativity
- Exploring the differences between the FLUX.1-dev and FLUX.1-schnell models and how they perform on various tasks
- Integrating the fine-tuned model into other applications or projects to see how it performs in real-world scenarios
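
As a rough illustration of the inputs listed above, here is a minimal sketch of calling the endpoint through the Replicate Python client. The model slug, version pinning, and exact input names are assumptions mirrored from the description rather than a verified schema, so check the model page before relying on them.

```python
# Hypothetical sketch only: the slug and input names below mirror the
# description above and are not a verified schema.
import replicate

output = replicate.run(
    "ostris/flux-dev-lora-trainer",    # assumed slug; fine-tunes typically expose their own endpoint
    input={
        "prompt": "a photo of a red fox in fresh snow",
        "seed": 42,                     # fix the seed for reproducible outputs
        "model": "dev",                 # "dev" or "schnell"
        "aspect_ratio": "1:1",          # or "custom" together with explicit width/height
        "num_outputs": 1,
        "lora_scale": 0.8,              # strength of the LoRA adaptation
        "guidance_scale": 3.5,
        "num_inference_steps": 28,
    },
)
for url in output:
    print(url)                          # each element points at a generated image
```

Lowering lora_scale weakens the effect of the fine-tuned weights, while raising num_inference_steps trades generation speed for detail.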


Updated 9/18/2024


ikea-instructions-lora-sdxl

ostris

Total Score

197

The ikea-instructions-lora-sdxl model is a LoRA (Low-Rank Adaptation) model trained on SDXL (Stable Diffusion XL) to generate images that follow step-by-step instructions. This model was created by ostris, who maintains it on Hugging Face. The model is able to generate images that depict specific steps or actions, such as assembling furniture, cooking a hamburger, or recreating scenes from movies. It can take simple prompts describing the desired outcome and generate the corresponding step-by-step visual instructions. Compared to similar models like the sdxl-wrong-lora and the Personal_Lora_collections, the ikea-instructions-lora-sdxl model is specifically focused on generating step-by-step visual instructions rather than character-focused or general image generation.

Model inputs and outputs

Inputs

- **Prompt**: A simple text description of the desired outcome, such as "hamburger" or "sleep".
- **Negative prompt** (optional): Words to avoid in the generated images, such as "blurry" or "low quality".

Outputs

- **Step-by-step images**: The model generates a series of images that visually depict the steps to achieve the desired outcome described in the prompt.

Capabilities

The ikea-instructions-lora-sdxl model excels at generating clear, step-by-step visual instructions for a wide variety of tasks and objects. It can take simple prompts and break them down into a series of instructional images, making it useful for tasks like assembling furniture, cooking recipes, or recreating scenes from movies or books. For example, with the prompt "hamburger, lettuce, mayo, lettuce, no tomato", the model generates a series of images showing the steps to assemble a hamburger with the specified toppings. Similarly, the prompt "barbie and ken" results in a series of images depicting a Barbie and Ken doll scene.

What can I use it for?

The ikea-instructions-lora-sdxl model could be useful for a variety of applications, such as:

- **Instructional content creation**: Generate step-by-step visual instructions for assembling products, cooking recipes, or completing other tasks.
- **Educational resources**: Create interactive learning materials that visually demonstrate concepts or processes.
- **Entertainment and media**: Generate visuals for storytelling, creative projects, or movie/TV show recreations.

ostris, the maintainer of the model, suggests that it can be useful for a wide range of prompts, and that the model is able to "figure out the steps" to create the desired images.

Things to try

One interesting aspect of the ikea-instructions-lora-sdxl model is its ability to take simple prompts and break them down into a series of instructional images. Try experimenting with different types of prompts, from everyday tasks like "make a sandwich" to more complex or creative prompts like "the dude, from the movie the big lebowski, drinking, rug wet, bowling ball". Additionally, you can explore the use of negative prompts to refine the generated images, such as avoiding "blurry" or "low quality" outputs. This can help the model generate cleaner, more polished instructional images. A hedged diffusers usage sketch follows this entry.
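
A minimal sketch of how the LoRA might be applied on top of the SDXL base model with the diffusers library. The loading call assumes the repository exposes a default weight file; pass an explicit weight_name argument if the model card names one.

```python
# Minimal sketch, assuming the LoRA loads onto the SDXL base model via diffusers.
# If the repository requires an explicit file, pass weight_name=... to load_lora_weights.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("ostris/ikea-instructions-lora-sdxl")

image = pipe(
    prompt="hamburger, lettuce, mayo, lettuce, no tomato",   # prompt from the example above
    negative_prompt="blurry, low quality",
    num_inference_steps=30,
).images[0]
image.save("hamburger_instructions.png")
```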


Updated 5/28/2024


ip-composition-adapter

ostris

Total Score

152

The ip-composition-adapter is a unique AI model designed to inject the general composition of an image into the Stable Diffusion 1.5 and SDXL models, while mostly ignoring the style and content. This means that an input image of a person waving their left hand can produce an output image of a completely different person waving their left hand. This sets it apart from control nets, which are more rigid and aim to spatially align the output image to the control image. The model was created by ostris, who gives full credit to POM and BANODOCO for the original idea. It can be used similarly to other IP+ adapters from the h94/IP-Adapter repository, requiring the CLIP vision encoder (CLIP-H).

Model inputs and outputs

Inputs

- **Prompt**: The text prompt describing the desired image
- **Control Image**: An image that provides the general composition for the output

Outputs

- **Generated Image**: A new image that matches the provided prompt and the general composition of the control image

Capabilities

The ip-composition-adapter allows for more flexible control over the composition of generated images compared to control nets. Rather than rigidly aligning the output to the control image, it uses the control image to influence the overall composition while still generating a unique image based on the input prompt.

What can I use it for?

The ip-composition-adapter could be useful for creative projects where you want to generate images that follow a specific composition, but with different subject matter. For example, you could use a portrait of a person waving as the control image, and generate a variety of different people waving in that same pose. This could be beneficial for designers, artists, or anyone looking to create a consistent visual style across a series of images.

Things to try

One interesting aspect of the ip-composition-adapter is its ability to generate images that maintain the overall composition but with completely different subject matter. You could experiment with using a wide variety of control images, from landscapes to abstract patterns, and see how the generated images reflect those underlying compositions. This could lead to some unexpected and creative results. A hedged loading sketch follows this entry.
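
A hedged sketch of wiring the adapter into a Stable Diffusion 1.5 pipeline using diffusers' IP-Adapter support. The weight file name and the way the CLIP-H image encoder is supplied are assumptions, so consult the repository for the exact loading instructions.

```python
# Hedged sketch: the weight file name and the separate CLIP-H image encoder are
# assumptions -- check the ostris/ip-composition-adapter repository for exact usage.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

# CLIP-H vision encoder, as required by IP+ style adapters
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_ip_adapter(
    "ostris/ip-composition-adapter",
    subfolder="",                                        # weights assumed to sit at the repo root
    weight_name="ip_plus_composition_sd15.safetensors",  # assumed file name
)
pipe.set_ip_adapter_scale(1.0)

composition = load_image("person_waving_left_hand.png")  # control image providing the composition
image = pipe(
    prompt="an astronaut waving, studio lighting",
    ip_adapter_image=composition,
    num_inference_steps=30,
).images[0]
image.save("composed.png")
```

Lowering the value passed to set_ip_adapter_scale reduces how strongly the control image's composition steers the result.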


Updated 5/28/2024

OpenFLUX.1

ostris

Total Score

78

OpenFLUX.1 is a work-in-progress model being developed by ostris. It is not ready for general use yet, but the goal is to create a non-distilled version of the impressive FLUX.1-schnell model, which was created by Black Forest Labs. The FLUX.1-schnell model is a 12 billion parameter rectified flow transformer capable of generating high-quality images from text descriptions. However, since FLUX.1-schnell is a distilled model, it cannot be fine-tuned with techniques like LoRAs, IP adapters, or control nets. The OpenFLUX.1 model aims to address this limitation by providing a non-distilled base that can be used to train these types of adapters, which can then be used with the FLUX.1-schnell model.

Model inputs and outputs

OpenFLUX.1 is a text-to-image generation model. It takes text prompts as input and generates corresponding images as output.

Inputs

- **Text prompts**: The model accepts natural language descriptions of the desired image as input.

Outputs

- **Generated images**: The model outputs images that attempt to visually represent the input text prompt.

Capabilities

The OpenFLUX.1 model is still in development, so its current capabilities are limited. Because it breaks the distillation of the FLUX.1-schnell model, it may not produce images of the same high quality. Additionally, the model currently lacks guidance embeddings, which can negatively impact image generation. However, the goal is for OpenFLUX.1 to serve as a base model for training adapters that can then be used with the FLUX.1-schnell model to enable fine-tuning and other advanced techniques.

What can I use it for?

At this stage, OpenFLUX.1 is primarily useful for researchers and developers interested in exploring the potential of training adapters on a non-distilled version of the FLUX.1-schnell model. While the generated images may not be of the highest quality, the model could be a valuable tool for experimenting with different fine-tuning approaches and techniques. Once the model is more mature, it may have broader applications in text-to-image generation, but for now, its primary use case is as a research and development platform.

Things to try

Since OpenFLUX.1 is a work in progress, the best thing to try is experimenting with different fine-tuning techniques and monitoring their impact on image quality and performance. Researchers and developers interested in advancing the field of text-to-image generation may find this model a useful starting point for their own work. A hedged sampling sketch follows this entry.
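
Assuming the work-in-progress checkpoint is published in a diffusers-compatible layout, a minimal sampling sketch might look like the following; the step count and guidance setting are guesses and may need adjusting as the model matures.

```python
# Minimal sketch, assuming the work-in-progress checkpoint loads with FluxPipeline.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("ostris/OpenFLUX.1", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()   # the 12B transformer is large; offload to fit a single GPU

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=20,   # the de-distilled model likely needs more steps than schnell's 1-4
    guidance_scale=3.5,       # a guess; the text notes guidance embeddings are still missing
).images[0]
image.save("openflux_sample.png")
```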


Updated 9/11/2024


vae-kl-f8-d16

ostris

Total Score

59

The vae-kl-f8-d16 is a 16-channel Variational Autoencoder (VAE) with an 8x downsampling factor, created by maintainer ostris. It was trained from scratch on a balanced dataset of photos, artistic works, text, cartoons, and vector images. Compared to other VAEs like the SD3 VAE, the vae-kl-f8-d16 is lighter weight with only 57,266,643 parameters, yet it scores quite similarly on real images in terms of PSNR and LPIPS metrics. It is released under the MIT license, allowing users to use it freely.

The vae-kl-f8-d16 can be used as a drop-in replacement for the VAE in the Stable Diffusion 1.5 pipeline. It provides a more efficient alternative to the larger VAEs used in Stable Diffusion models, while maintaining similar performance.

Model inputs and outputs

Inputs

- Latent representations of images

Outputs

- Reconstructed images from the provided latent representations

Capabilities

The vae-kl-f8-d16 VAE is capable of reconstructing a wide variety of image types, including photos, artwork, text, and vector graphics, with a high level of fidelity. Its lighter weight compared to larger VAEs makes it an attractive option for those looking to reduce the computational and memory requirements of their image generation pipelines, without sacrificing too much in terms of output quality.

What can I use it for?

The vae-kl-f8-d16 VAE can be used as a drop-in replacement for the VAE component in Stable Diffusion 1.5 pipelines, as demonstrated in the provided example code. This allows for faster and more efficient image generation, while maintaining the quality of the outputs. Additionally, the open-source nature of the model means that users can experiment with it, fine-tune it, or incorporate it into their own custom image generation models and workflows.

Things to try

One interesting thing to try with the vae-kl-f8-d16 VAE is to explore how its latent space and reconstruction capabilities differ from those of larger VAEs, such as the SD3 VAE. Comparing the outputs and performance on various types of images can provide insights into the tradeoffs between model size, efficiency, and output quality. Additionally, users may want to experiment with fine-tuning the VAE on specialized datasets to tailor its performance for their specific use cases. A hedged reconstruction sketch follows this entry.
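
A minimal sketch of the encode/decode round trip described above, assuming the weights are published in diffusers AutoencoderKL format; the input file name is a placeholder.

```python
# Minimal sketch of the encode/decode round trip, assuming the weights load as a
# diffusers AutoencoderKL; "photo.png" is a placeholder input image.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_pil_image, to_tensor

vae = AutoencoderKL.from_pretrained("ostris/vae-kl-f8-d16")

image = load_image("photo.png").resize((512, 512))
x = to_tensor(image).unsqueeze(0) * 2 - 1              # scale pixels to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()       # 16-channel latent, 8x smaller spatially
    recon = vae.decode(latents).sample                 # reconstructed image tensor

to_pil_image((recon[0].clamp(-1, 1) + 1) / 2).save("reconstruction.png")
```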


Updated 8/7/2024

FLUX.1-schnell-training-adapter

ostris

Total Score

56

FLUX.1-schnell-training-adapter is an adapter developed by ostris that allows you to train LoRAs directly on the FLUX.1-schnell model. The FLUX.1-schnell model is a 12 billion parameter rectified flow transformer that can generate high-quality images from text descriptions in just 1-4 steps. This adapter addresses the issue that FLUX.1-schnell is a distilled model, which makes it impossible to train on directly. By enabling this adapter during training, the breakdown of the distillation that otherwise occurs when training on the distilled model is avoided, allowing LoRAs to be trained more effectively.

Model inputs and outputs

Inputs

- This adapter does not have direct inputs. It is designed to be used with a training pipeline that supports it, such as ostris' ai-toolkit.

Outputs

- This adapter does not have direct outputs. It is designed to enhance the training process of LoRAs on the FLUX.1-schnell model.

Capabilities

The FLUX.1-schnell-training-adapter enables more effective training of LoRAs on the FLUX.1-schnell model. This allows for better compatibility between the LoRAs and the base FLUX.1-schnell model, as well as faster sampling speeds during the training process.

What can I use it for?

You can use this adapter to train LoRAs that can be used with the FLUX.1-schnell model for a variety of image generation tasks. The faster sampling speeds and improved compatibility can be beneficial for applications where rapid iteration and testing are important, such as product design, concept art, or rapid prototyping.

Things to try

Try training LoRAs on the FLUX.1-schnell model using this adapter and compare the results to LoRAs trained on the non-distilled OpenFLUX.1 model. Observe the differences in compatibility and sampling speed to see the benefits of this adapter. A conceptual loading sketch follows this entry.
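
The adapter is meant to be switched on inside a training pipeline such as ostris' ai-toolkit rather than called by hand, so the following is only a conceptual sketch; it assumes the adapter weights are distributed as a diffusers-compatible LoRA for FLUX.1-schnell, which should be verified against the model card.

```python
# Conceptual sketch only: the adapter is normally enabled inside a training
# pipeline (e.g. ostris' ai-toolkit), not used directly at inference time.
# Assumes the adapter weights load as a diffusers-compatible LoRA.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# The adapter would be active while a LoRA is being trained / previewed...
pipe.load_lora_weights("ostris/FLUX.1-schnell-training-adapter")
preview = pipe("training preview prompt", num_inference_steps=4, guidance_scale=0.0).images[0]

# ...and removed again for normal inference with the finished LoRA.
pipe.unload_lora_weights()
```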


Updated 9/18/2024