SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Read original: arXiv:2406.14477 - Published 6/21/2024 by Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang


Overview

Introduces the SafeSora dataset to promote research on aligning text-to-video generation with human values
The dataset captures human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness
Includes 14,711 unique prompts, 57,333 unique videos generated by 4 distinct LVMs, and 51,691 pairs of human preference annotations
Demonstrates the dataset's utility through applications like training text-video moderation models and aligning large vision models with human preferences

Plain English Explanation

The researchers have created the SafeSora dataset to support the development of large vision models (LVMs) that generate videos from text prompts in ways that are helpful and harmless to people. The dataset contains text prompts, the videos that several LVMs generated in response to those prompts, and human annotations that compare pairs of videos on how helpful and how harmless they are.

The researchers broke helpfulness and harmlessness down into finer-grained categories: 4 sub-dimensions of helpfulness and 12 sub-categories of harm. This gives annotators a structured way to reason through their judgments, so the feedback they provide captures human preferences in more depth.

The researchers then show how the dataset can be used to train models that moderate text-to-video content, and to align LVMs more closely with human preferences for helpful and safe outputs by fine-tuning either a prompt augmentation module or the diffusion model itself. This lays the groundwork for future research on LVMs whose text-to-video outputs are more beneficial to people.

Technical Explanation

The SafeSora dataset was created to promote research on aligning text-to-video generation with human values. It encompasses human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness.

To capture these preferences in more detail, the researchers subdivided helpfulness into 4 sub-dimensions and harmlessness into 12 sub-categories. These sub-categories gave crowdworkers a structured framework for annotation. The resulting dataset contains 14,711 unique prompts, 57,333 unique videos generated by 4 distinct LVMs, and 51,691 pairs of preference annotations.
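To make the shape of this kind of preference data concrete, here is a minimal sketch of how a single annotated pair might be represented. All names (`Video`, `PreferencePair`, `harm_flags`, `more_helpful`, `more_harmless`, `disagreement_rate`) are hypothetical and do not reflect SafeSora's actual file format. The rule that a video counts as harmless only when none of the 12 sub-categories is flagged is likewise an assumption for illustration. Keeping the helpfulness and harmlessness preferences as separate fields makes it easy to measure how often the two dimensions favor different videos.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Literal

# The count of harm sub-categories comes from the paper; all names are hypothetical.
NUM_HARM_CATEGORIES = 12

Choice = Literal["A", "B"]


@dataclass(frozen=True)
class Video:
    video_id: str
    model_name: str  # which LVM generated the video
    # Indices (0..11) of harm sub-categories an annotator flagged.
    harm_flags: FrozenSet[int] = frozenset()

    def __post_init__(self) -> None:
        if any(not 0 <= i < NUM_HARM_CATEGORIES for i in self.harm_flags):
            raise ValueError("harm category index out of range")

    def is_harmless(self) -> bool:
        # Assumption: a video is harmless only if no sub-category was flagged.
        return not self.harm_flags


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    video_a: Video
    video_b: Video
    more_helpful: Choice   # annotator's helpfulness preference
    more_harmless: Choice  # annotator's harmlessness preference, recorded separately

    def dimensions_disagree(self) -> bool:
        # True when the more helpful video is not the more harmless one.
        return self.more_helpful != self.more_harmless


def disagreement_rate(pairs: List[PreferencePair]) -> float:
    """Fraction of pairs whose two preference labels pick different videos."""
    if not pairs:
        return 0.0
    return sum(p.dimensions_disagree() for p in pairs) / len(pairs)


a = Video("v1", "model-1")
b = Video("v2", "model-2", frozenset({3}))
pairs = [
    PreferencePair("a cat surfing a wave", a, b, more_helpful="B", more_harmless="A"),
    PreferencePair("a sunset over a lake", a, b, more_helpful="A", more_harmless="A"),
]
print(a.is_harmless(), b.is_harmless())  # True False
print(disagreement_rate(pairs))          # 0.5
```

Storing the two preferences independently, rather than a single "better" label, preserves cases where the more helpful video is also the less safe one, which is exactly the tension a helpfulness/harmlessness split is meant to expose.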

The researchers demonstrate the utility of the SafeSora dataset through several applications. These include training a text-video moderation model and aligning LVMs with human preferences by fine-tuning either a prompt augmentation module or the diffusion model. These applications highlight the dataset's potential as a foundation for text-to-video alignment research, such as human preference modeling and the development and validation of alignment algorithms.
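One common way to use pairwise annotations like these for human preference modeling is a Bradley-Terry model: a scoring function assigns each (prompt, video) pair a scalar, and the probability that annotators prefer one video over the other is the logistic function of the score difference. The sketch below shows only that probability and its negative log-likelihood loss. The scorer itself is assumed, the helper names are hypothetical, and the paper may use a different objective. With separate helpfulness and harmlessness labels, one could in principle fit one such scorer per dimension.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_prob(score_preferred: float, score_other: float) -> float:
    """Bradley-Terry probability that the first video is preferred."""
    return sigmoid(score_preferred - score_other)


def pairwise_loss(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean of -log sigmoid(s_chosen - s_rejected) over annotated pairs.

    Each tuple holds the scalar scores that a (hypothetical) preference model
    assigns to the annotator-preferred video and to the other video.
    """
    if not score_pairs:
        raise ValueError("need at least one pair")
    # -log sigmoid(m) == softplus(-m), with m = s_chosen - s_rejected
    return sum(softplus(r - c) for c, r in score_pairs) / len(score_pairs)


print(preference_prob(1.0, 1.0))    # 0.5
print(pairwise_loss([(0.0, 0.0)]))  # log 2, about 0.6931
print(pairwise_loss([(2.0, 0.0)]) < pairwise_loss([(0.0, 2.0)]))  # True
```

Minimizing `pairwise_loss` pushes the preferred video's score above the other's. Writing the loss as `softplus` of the negative margin avoids overflow when score gaps are large.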

Critical Analysis

The paper provides a comprehensive dataset and demonstrates its utility, but it also acknowledges some limitations. For example, the annotations are based on crowdsourced human preferences, which may not capture the full nuance and complexity of safety and helpfulness. Additionally, the dataset focuses on text-to-video generation, but the insights may not directly translate to other modalities like text-to-image or audio-to-video.

Further research could explore ways to refine the annotation process, potentially incorporating expert feedback or more contextual information. Expanding the dataset to cover a wider range of modalities and use cases could also help broaden its applicability and impact.

Conclusion

The SafeSora dataset is a significant step toward large vision models that generate video from text in line with human values. By capturing detailed human preferences and demonstrating the dataset's utility through several applications, the researchers have laid the groundwork for future research on text-to-video alignment and on more responsible and beneficial large vision models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values. This dataset encompasses human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness. To capture in-depth human preferences and facilitate structured reasoning by crowdworkers, we subdivide helpfulness into 4 sub-dimensions and harmlessness into 12 sub-categories, serving as the basis for pilot annotations. The SafeSora dataset includes 14,711 unique prompts, 57,333 unique videos generated by 4 distinct LVMs, and 51,691 pairs of preference annotations labeled by humans. We further demonstrate the utility of the SafeSora dataset through several applications, including training the text-video moderation model and aligning LVMs with human preference by fine-tuning a prompt augmentation module or the diffusion model. These applications highlight its potential as the foundation for text-to-video alignment research, such as human preference modeling and the development and validation of alignment algorithms.

6/21/2024

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao

The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus on the quality of video generation. While some evaluations of text-to-image models have considered safety, they cover fewer aspects and do not address the unique temporal risk inherent in video generation. To bridge this research gap, we introduce T2VSafetyBench, a new benchmark designed for conducting safety-critical assessments of text-to-video models. We define 12 critical aspects of video generation safety and construct a malicious prompt dataset including real-world prompts, LLM-generated prompts and jailbreak attack-based prompts. Based on our evaluation results, we draw several important findings, including: 1) no single model excels in all aspects, with different models showing various strengths; 2) the correlation between GPT-4 assessments and manual reviews is generally high; 3) there is a trade-off between the usability and safety of text-to-video generative models. This indicates that as the field of video generation rapidly advances, safety risks are set to surge, highlighting the urgency of prioritizing video safety. We hope that T2VSafetyBench can provide insights for better understanding the safety of video generation in the era of generative AI.

9/10/2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To address these limitations, we propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL. In terms of breadth, SPA-VL covers 6 harmfulness domains, 13 categories, and 53 subcategories, and contains 100,788 samples of the quadruple (question, image, chosen response, rejected response). In terms of depth, the responses are collected from 12 open- (e.g., QwenVL) and closed-source (e.g., Gemini) VLMs to ensure diversity. The experimental results indicate that models trained with alignment techniques on the SPA-VL dataset exhibit substantial improvements in harmlessness and helpfulness while maintaining core capabilities. SPA-VL, as a large-scale, high-quality, and diverse dataset, represents a significant milestone in ensuring that VLMs achieve both harmlessness and helpfulness. We have made our code https://github.com/EchoseChen/SPA-VL-RLHF and SPA-VL dataset url https://huggingface.co/datasets/sqrti/SPA-VL publicly available.

6/19/2024

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs). As a sibling project to SafeRLHF and BeaverTails, we separate annotations of helpfulness and harmlessness for question-answering pairs, providing distinct perspectives on these coupled attributes. Overall, we provide 44.6k refined prompts and 265k question-answer pairs with safety meta-labels for 19 harm categories and three severity levels ranging from minor to severe, with answers generated by Llama-family models. Based on this, we collected 166.8k preference data, including dual-preference (helpfulness and harmlessness decoupled) and single-preference data (trade-off the helpfulness and harmlessness from scratch), respectively. Using the large-scale annotation data, we further train severity-sensitive moderation for the risk control of LLMs and safety-centric RLHF algorithms for the safety alignment of LLMs. We believe this dataset will be a valuable resource for the community, aiding in the safe deployment of LLMs.

6/26/2024