Towards Understanding Unsafe Video Generation

Read original: arXiv:2407.12581 - Published 7/18/2024 by Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang


Overview

The paper explores the potential for video generation models (VGMs) to produce unsafe content, such as violent or terrifying videos.
The researchers collect prompts likely to elicit unsafe content, generate videos from them with three state-of-the-art VGMs, and analyze the results to identify 5 unsafe video categories.
They then propose a new defense mechanism called Latent Variable Defense (LVD) to prevent the generation of unsafe videos.

Plain English Explanation

Video generation models (VGMs) are AI systems that can create high-quality videos based on text prompts. However, there is a concern that these models could be used to generate unsafe or harmful content, such as violent or disturbing videos.

To explore this issue, the researchers in this study chose a set of prompts that might lead to unsafe video generation, and used three different VGMs to create videos based on those prompts. They then analyzed the videos that were generated and identified 5 main categories of unsafe content: distorted/weird, terrifying, pornographic, violent/bloody, and political.

To prevent the generation of unsafe videos, the researchers developed a new technique called Latent Variable Defense (LVD). Rather than checking only the prompt or the finished video, it monitors the VGM's internal sampling process during generation. The reported defense accuracy is 0.90, with roughly 10x less time and computing resources when sampling a large number of unsafe prompts.

Technical Explanation

The researchers first confirmed that VGMs can generate unsafe videos by using prompts collected from 4chan and Lexica. They used three state-of-the-art open-source VGMs to generate an original pool of 5,607 videos, which they then filtered to remove duplicates and poorly generated content, leaving an initial set of 2,112 unsafe videos.

Through a process of clustering and thematic coding, the researchers identified 5 categories of unsafe video content: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. They then recruited 403 online participants to help label the videos, resulting in a dataset of 937 unsafe videos along with their corresponding prompts.
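The filtering funnel described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this summary (5,607 generated, 2,112 after filtering, 937 labeled unsafe); the helper name `retention` is illustrative and does not come from the paper.

```python
# Counts reported in the summary above.
GENERATED = 5607        # videos produced by the three VGMs
AFTER_FILTERING = 2112  # left after removing duplicates and low-quality clips
LABELED_UNSAFE = 937    # identified as unsafe from participant annotations


def retention(kept: int, total: int) -> float:
    """Return the fraction of `total` that survives a stage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total


if __name__ == "__main__":
    print(f"filtering kept  {retention(AFTER_FILTERING, GENERATED):.1%}")
    print(f"annotation kept {retention(LABELED_UNSAFE, AFTER_FILTERING):.1%}")
    print(f"end to end      {retention(LABELED_UNSAFE, GENERATED):.1%}")
```

Roughly 38% of the generated videos survive filtering, about 44% of those are labeled unsafe by annotators, and about 17% of the original pool ends up in the final dataset.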

To prevent unsafe video generation, the researchers propose a new approach called Latent Variable Defense (LVD). Existing defenses, developed for image generation, filter either the input prompt or the output result; LVD instead operates within the model's internal sampling process. The paper reports that LVD achieves 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts.
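To make the idea of defending inside the sampling loop concrete, here is a minimal sketch. It is not the authors' implementation: the toy `denoise_step`, the linear-probe detector `unsafe_score`, the threshold, and the choice of which steps to check are all illustrative assumptions. It shows only the control flow, in which a detector inspects intermediate latents and sampling stops as soon as one is flagged, so the remaining steps are never computed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Latent = List[float]


def denoise_step(latent: Latent, step: int, total_steps: int) -> Latent:
    """Toy stand-in for one denoising step (assumption, not a real VGM).

    It shrinks the latent slightly toward zero at every step.
    """
    scale = 1.0 - 1.0 / (total_steps + 1)
    return [x * scale for x in latent]


def unsafe_score(latent: Latent, weights: Sequence[float]) -> float:
    """Hypothetical detector: logistic score in (0, 1) from a linear probe."""
    z = sum(w * x for w, x in zip(weights, latent))
    return 1.0 / (1.0 + math.exp(-z))


def sample_with_latent_check(
    latent: Latent,
    weights: Sequence[float],
    total_steps: int = 50,
    check_steps: Sequence[int] = (2, 5, 10),
    threshold: float = 0.5,
) -> Tuple[Optional[Latent], int]:
    """Run the sampling loop, aborting once a checked latent looks unsafe.

    Returns (final latent, or None if blocked; denoising steps executed).
    """
    for step in range(1, total_steps + 1):
        latent = denoise_step(latent, step, total_steps)
        # Only inspect the latent at the chosen early steps.
        if step in check_steps and unsafe_score(latent, weights) >= threshold:
            return None, step  # blocked: later steps are skipped
    return latent, total_steps


if __name__ == "__main__":
    probe = [1.0, 1.0, 1.0, 1.0]  # illustrative detector weights
    print(sample_with_latent_check([2.0] * 4, probe)[1])   # flagged early
    print(sample_with_latent_check([-2.0] * 4, probe)[1])  # runs to completion
```

In this toy run, the flagged input costs 2 of the 50 steps while the benign one runs to completion. The savings reported in the paper depend on the authors' actual detector and settings, which this summary does not describe.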

Critical Analysis

The researchers acknowledge that their study is limited to a specific set of VGMs and prompts, and that further research is needed to understand the full scope of unsafe video generation capabilities across different models and prompts. Additionally, the dataset of unsafe videos generated in this study may not be representative of the full range of potential unsafe content.

While the Latent Variable Defense (LVD) approach shows promising results, it is still an early-stage defense mechanism that requires further testing and refinement. Blocking unsafe generations is also not the only way to improve the safety and robustness of VGMs; related work such as SafeGen and "To Generate or Not" explores other techniques.

Moreover, the ethical and societal implications of unsafe video generation, including its potential for misuse, warrant careful consideration and ongoing research. Frameworks like Safe-CLIP may offer additional approaches for the responsible development and deployment of video generation technologies.

Conclusion

This study provides a comprehensive understanding of the potential for video generation models to produce unsafe content, such as violent or terrifying videos. By identifying specific categories of unsafe video generation and proposing a new defense mechanism called Latent Variable Defense, the researchers have taken an important step towards addressing this emerging challenge.

However, further research is needed to fully understand and mitigate the risks of unsafe video generation, as well as to explore broader approaches for ensuring the responsible development and use of these powerful technologies. Continued collaboration between researchers, policymakers, and the public will be crucial in shaping the future of video generation in a way that promotes societal wellbeing.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Towards Understanding Unsafe Video Generation

Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang

Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe content generation prompts collected from 4chan and Lexica, and three open-source SOTA VGMs to generate unsafe videos. After filtering out duplicates and poorly generated content, we created an initial set of 2112 unsafe videos from an original pool of 5607 videos. Through clustering and thematic coding analysis of these generated videos, we identify 5 unsafe video categories: Distorted/Weird, Terrifying, Pornographic, Violent/Bloody, and Political. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by 403 participants, we identified 937 unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we created the first dataset of unsafe videos generated by VGMs. We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either input prompt or output results. We propose a new approach called Latent Variable Defense (LVD), which works within the model's internal sampling process. LVD can achieve 0.90 defense accuracy while reducing time and computing resources by 10x when sampling a large number of unsafe prompts.

7/18/2024


Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models

Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu

Online user generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.

8/13/2024

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao

The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus on the quality of video generation. While some evaluations of text-to-image models have considered safety, they cover fewer aspects and do not address the unique temporal risk inherent in video generation. To bridge this research gap, we introduce T2VSafetyBench, a new benchmark designed for conducting safety-critical assessments of text-to-video models. We define 12 critical aspects of video generation safety and construct a malicious prompt dataset including real-world prompts, LLM-generated prompts and jailbreak attack-based prompts. Based on our evaluation results, we draw several important findings, including: 1) no single model excels in all aspects, with different models showing various strengths; 2) the correlation between GPT-4 assessments and manual reviews is generally high; 3) there is a trade-off between the usability and safety of text-to-video generative models. This indicates that as the field of video generation rapidly advances, safety risks are set to surge, highlighting the urgency of prioritizing video safety. We hope that T2VSafetyBench can provide insights for better understanding the safety of video generation in the era of generative AI.

9/10/2024

SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexually explicit scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block sexually explicit content (e.g., naked) but may still be vulnerable to adversarial prompts -- inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate sexual content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate explicit visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since such unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets and large-scale user studies demonstrate SafeGen's effectiveness in mitigating sexually explicit content generation while preserving the high-fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.4% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.

9/17/2024