propainter

Maintainer: jd7h

Total Score: 2

Last updated: 9/20/2024
  • Run this model: Run on Replicate
  • API spec: View on Replicate
  • Github link: View on Github
  • Paper link: View on Arxiv

Model overview

ProPainter is an AI model developed by researchers at the S-Lab of Nanyang Technological University for object removal, video completion, and video outpainting. The model builds on prior video inpainting work such as xmem-propainter-inpainting and object-removal, improving both the propagation component (dual-domain propagation that combines image-level and feature-level warping) and the transformer component (a mask-guided sparse video transformer). ProPainter can fill in missing regions of a video, remove unwanted objects, and even extend frames beyond their original boundaries.

Model inputs and outputs

ProPainter takes in a video file and an optional mask file as inputs. The mask can be a static image or a video, and it specifies the regions to be inpainted or outpainted. The model outputs a completed or extended video, addressing the specified missing or unwanted regions.

Inputs

  • Video: The input video file to be processed.
  • Mask: An optional mask file (image or video) indicating the regions to be inpainted or outpainted.

Outputs

  • Completed/Extended Video: The output video with the specified regions filled in or extended.
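
Below is a minimal sketch of how the model might be invoked through the Replicate Python client. The input keys ("video", "mask") mirror the inputs listed above, but the exact field names, optional parameters, and version string are assumptions here and should be checked against the API spec on Replicate.

    # Hypothetical call to ProPainter via the Replicate Python client.
    # Input keys follow the inputs described above; verify the exact names and
    # any extra options (e.g. outpainting settings) in the model's API spec.
    import replicate

    output = replicate.run(
        "jd7h/propainter",  # model reference; pin a specific version hash in practice
        input={
            "video": open("clip_with_object.mp4", "rb"),  # video to process
            "mask": open("object_mask.mp4", "rb"),        # optional mask (image or video)
        },
    )
    print(output)  # URL(s) of the completed or extended video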

Capabilities

ProPainter excels at both object removal and video completion. For object removal, the model can cleanly erase unwanted objects from a video while preserving the surrounding context. For video completion, it can fill in missing regions caused by occlusions or artifacts, generating plausible content that blends seamlessly with the original footage.

What can I use it for?

The ProPainter model can be useful for a variety of video editing and post-production tasks. For example, you could use it to remove unwanted objects or logos from videos, fill in missing regions caused by camera obstructions, or even extend the boundaries of a video to create new content. These capabilities make ProPainter a valuable tool for filmmakers, video editors, and content creators who need to enhance the quality and appearance of their video footage.

Things to try

One interesting aspect of ProPainter is its ability to perform video outpainting, where the model can extend the video frames beyond their original boundaries. This could be useful for creating cinematic video expansions or generating new content to fit specific aspect ratios or dimensions. Additionally, the model's memory-efficient inference features, such as adjustable neighbor length and reference stride, make it possible to process longer videos without running into GPU memory constraints.
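
As a concrete example, here is a hedged sketch of memory-efficient local inference using the inference script from the GitHub repository. The flag names follow the parameters mentioned above (neighbor length, reference stride) plus a half-precision switch, but they are assumptions; confirm them against inference_propainter.py in the repo.

    # Hedged sketch: run ProPainter's inference script with memory-saving options.
    # Flag names are assumed from the parameters discussed above; check the
    # script's --help output for the authoritative list.
    import subprocess

    subprocess.run(
        [
            "python", "inference_propainter.py",
            "--video", "inputs/long_clip.mp4",
            "--mask", "inputs/long_clip_mask.mp4",
            "--neighbor_length", "5",  # fewer local neighbor frames, lower GPU memory
            "--ref_stride", "20",      # sample global reference frames more sparsely
            "--fp16",                  # half-precision inference to cut memory further
        ],
        check=True,
    )

Smaller neighbor lengths and larger reference strides trade a little temporal context for a smaller memory footprint, which is usually an acceptable compromise for long or high-resolution clips.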



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


xmem-propainter-inpainting

Maintainer: jd7h

Total Score: 1

The xmem-propainter-inpainting model is a generative AI pipeline that combines two models: XMem, a model for video object segmentation, and ProPainter, a model for video inpainting. The pipeline makes video inpainting easy by using XMem to generate a video mask from a source video and an annotated first frame, then using ProPainter to fill in the masked areas. It is similar to other inpainting models like GFPGAN, Stable Diffusion Inpainting, LaMa, SDXL Outpainting, and SDXL Inpainting, which all aim to fill in or remove elements from images and videos.

Model inputs and outputs

The xmem-propainter-inpainting model takes a source video and a segmentation mask for the first frame of that video as inputs. The mask should outline the object(s) you want to remove or inpaint. The model then generates a video mask using XMem and uses that mask for inpainting with ProPainter, producing an output video with the masked areas filled in.

Inputs

  • Video: The source video for object segmentation.
  • Mask: A segmentation mask for the first frame of the video, outlining the object(s) to be inpainted.
  • Mask Dilation: An optional parameter to add an extra border around the mask, in pixels.
  • Fp16: A boolean flag to use half-precision (fp16) processing for faster results.
  • Return Intermediate Outputs: A boolean flag to return the intermediate processing results.

Outputs

  • An array of URIs pointing to the output video(s) with the inpainted areas.

Capabilities

The xmem-propainter-inpainting model performs video inpainting by combining the strengths of XMem and ProPainter: XMem generates a video mask from a source video and an annotated first frame, and ProPainter uses that mask to fill in the masked areas. This makes video editing and object removal straightforward, which is useful for removing unwanted elements from videos, fixing damaged or occluded areas, or creating special effects.

What can I use it for?

The xmem-propainter-inpainting model can be useful for a variety of video editing and post-production tasks. For example, you could use it to remove unwanted objects or people from a video, fix damaged or occluded areas, or create special effects like object removal or replacement. Its ability to work with video data makes it well suited to video cleanup, VFX, and content creation, with potential use cases in film and TV production, social media content, and video tutorials or presentations.

Things to try

One interesting thing to try with the xmem-propainter-inpainting model is removing dynamic objects from a video, such as moving people or animals. By annotating the first frame to mask these objects, the model can generate a video mask that tracks their movement and inpaint the areas they occupied. This could be useful for creating clean background plates or isolating specific elements in a video. You can also experiment with different mask dilation and fp16 settings to find the best balance of quality and processing speed, as in the sketch below.
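
For illustration, a minimal sketch of calling this pipeline through the Replicate Python client might look like the following. The input keys mirror the inputs listed above and the file names are placeholders; the exact schema should be confirmed against the model's API spec.

    # Hypothetical call to the xmem-propainter-inpainting pipeline on Replicate.
    # Input keys mirror the inputs described above; names and defaults may
    # differ on the live model page.
    import replicate

    outputs = replicate.run(
        "jd7h/xmem-propainter-inpainting",  # pin a version hash in practice
        input={
            "video": open("park_scene.mp4", "rb"),                  # source video
            "mask": open("park_scene_first_frame_mask.png", "rb"),  # annotated first-frame mask
            "mask_dilation": 8,   # extra border around the mask, in pixels
            "fp16": True,         # half precision for faster processing
        },
    )
    for uri in outputs:  # the pipeline returns an array of output video URIs
        print(uri)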


repaint

Maintainer: cjwbw

Total Score: 3

repaint is an AI model for inpainting, or filling in missing parts of an image, using denoising diffusion probabilistic models. It was developed by cjwbw, who has created several other notable AI models like stable-diffusion-v2-inpainting, analog-diffusion, and pastel-mix. The repaint model can fill in missing regions of an image while keeping the known parts harmonized, and can handle a variety of mask shapes and sizes, including extreme cases like every other line or large upscaling.

Model inputs and outputs

The repaint model takes in an input image, a mask indicating which regions are missing, and a model to use (e.g. CelebA-HQ, ImageNet, Places2). It then generates a new image with the missing regions filled in, while maintaining the integrity of the known parts. The user can also adjust the number of inference steps to control the speed vs. quality tradeoff.

Inputs

  • Image: The input image, which is expected to be aligned for facial images.
  • Mask: The type of mask to apply to the image, such as random strokes, half the image, or a sparse pattern.
  • Model: The pre-trained model to use for inpainting, chosen based on the content of the input image.
  • Steps: The number of denoising steps to perform, which affects the speed and quality of the output.

Outputs

  • Mask: The mask used to generate the output image.
  • Masked Image: The input image with the mask applied.
  • Inpaint: The final output image with the missing regions filled in.

Capabilities

The repaint model can handle a wide variety of inpainting tasks, from filling in random strokes or half an image to more extreme cases like upscaling an image or inpainting every other line. It is able to generate meaningful and harmonious fillings, incorporating details like expressions, features, and logos into the missing regions. The model outperforms state-of-the-art autoregressive and GAN-based inpainting methods in user studies across multiple datasets and mask types.

What can I use it for?

The repaint model could be useful for a variety of image editing and content creation tasks, such as:

  • Repairing damaged or corrupted images
  • Removing unwanted elements from photos (e.g. power lines, obstructions)
  • Generating new image content to expand or modify existing images
  • Upscaling low-resolution images while maintaining visual coherence

By leveraging the power of denoising diffusion models, repaint can produce high-quality, realistic inpaintings that blend seamlessly with the known parts of the image.

Things to try

One interesting aspect of the repaint model is its ability to handle extreme inpainting cases, such as filling in every other line of an image or upscaling with a large mask. These challenging scenarios showcase the model's strength in generating coherent and meaningful fillings, even when a significant amount of information is missing. Another possibility is to experiment with the number of denoising steps, which lets you balance speed and quality: fewer steps give faster inference but may produce less harmonious fillings, while more steps improve visual quality at the cost of longer processing times. Overall, the repaint model is a powerful tool for image inpainting and manipulation, with the potential to unlock new creative possibilities for artists, designers, and content creators.


deoldify_video

Maintainer: arielreplicate

Total Score: 4

The deoldify_video model is a deep learning-based video colorization model developed by arielreplicate, the model's maintainer. It builds upon the open-source DeOldify project, which aims to colorize and restore old images and film footage. The deoldify_video model is specifically optimized for stable, consistent, and flicker-free video colorization.

The deoldify_video model is one of three DeOldify models available, along with the "artistic" and "stable" image colorization models. Each model has its own strengths and use cases; the video model prioritizes stability and consistency over maximum vibrance, making it well suited to colorizing old film footage.

Model inputs and outputs

Inputs

  • input_video: The path to a video file to be colorized.
  • render_factor: An integer that determines the resolution at which the color portion of the image is rendered. Lower values render faster but may produce less detailed colorization, while higher values can produce more vibrant colors but take longer to process.

Outputs

  • Output: The path to the colorized video output.

Capabilities

The deoldify_video model can add realistic color to old black-and-white video footage while maintaining a high degree of stability and consistency. Unlike the previous version of DeOldify, this model produces colorized videos with minimal flickering or artifacts, making it well suited to processing historical footage.

The model has been trained using a novel "NoGAN" technique, which combines the benefits of Generative Adversarial Network (GAN) training with more conventional methods to achieve high-quality results efficiently. This approach helps eliminate many of the common issues associated with GAN-based colorization, such as inconsistent coloration and visual artifacts.

What can I use it for?

The deoldify_video model can be used to breathe new life into old black-and-white films and footage, making them more engaging and accessible to modern audiences. This could be particularly useful for historical documentaries, educational materials, or personal archival projects.

By colorizing old video, the deoldify_video model can help preserve and showcase cultural heritage, enabling viewers to better connect with the people and events depicted. The consistent and stable colorization results make it suitable for professional-quality video productions.

Things to try

One interesting aspect of the DeOldify project is the way the models arrive at consistent colorization decisions, even for seemingly arbitrary details like clothing and special effects. This suggests the models are learning underlying rules about how to colorize based on subtle cues in the black-and-white footage.

When using the deoldify_video model, you can experiment with the render_factor parameter to find the sweet spot between speed and quality for your particular use case. Higher render factors can produce more detailed and vibrant results, but take longer to process. Additionally, the maintainer notes that using a ResNet101 backbone for the generator network, rather than the smaller ResNet34, can help improve the consistency of skin tones and other key details in the colorized output.


test

Maintainer: anhappdev

Total Score: 3

The test model is an image inpainting AI, which means it can fill in missing or damaged parts of an image based on the surrounding context. It is similar to other inpainting models like controlnet-inpaint-test, realisitic-vision-v3-inpainting, ad-inpaint, inpainting-xl, and xmem-propainter-inpainting, which can all be used to remove unwanted elements from images or fill in missing parts to create a more complete and cohesive image.

Model inputs and outputs

The test model takes in an image, a mask for the area to be inpainted, and a text prompt to guide the inpainting process. It outputs one or more inpainted images based on the input.

Inputs

  • Image: The image to be inpainted. Parts of the image will be masked out with the mask_image and repainted according to the prompt.
  • Mask Image: A black-and-white image to use as a mask for inpainting over the provided image. White pixels in the mask will be repainted, while black pixels will be preserved.
  • Prompt: The text prompt to guide the image generation. You can use ++ to emphasize and -- to de-emphasize parts of the sentence.
  • Negative Prompt: Things you don't want to see in the output.
  • Num Outputs: The number of images to output. Higher numbers may cause out-of-memory errors.
  • Guidance Scale: The scale for classifier-free guidance, which affects the strength of the text prompt.
  • Num Inference Steps: The number of denoising steps. More steps usually lead to higher quality but slower inference.
  • Seed: The random seed. Leave blank to randomize.
  • Preview Input Image: Include the input image with the mask overlay in the output.

Outputs

  • An array of one or more inpainted images.

Capabilities

The test model can remove unwanted elements from images or fill in missing parts based on the surrounding context and a text prompt. This is useful for tasks like object removal, background replacement, image restoration, and creative image generation.

What can I use it for?

You can use the test model to enhance or modify existing images in all kinds of creative ways. For example, you could remove unwanted distractions from a photo, replace a boring background with a more interesting one, or add fantastical elements to an image based on a creative prompt. The model's inpainting capabilities make it a versatile tool for digital artists, photographers, and anyone looking to get creative with their images.

Things to try

Try experimenting with different prompts and mask patterns to see how the model responds. You can also vary the guidance scale and the number of inference steps to find the right balance of speed and quality. Additionally, you can use the preview_input_image option to see how the model is interpreting the mask and input image.
