Factors Influencing User Willingness To Use SORA

2405.03986

Published 5/8/2024 by Gustave Florentin Nkoulou Mvondo, Ben Niu

Abstract

Sora promises to redefine the way visual content is created. Despite its numerous forecasted benefits, the drivers of user willingness to use the text-to-video (T2V) model are unknown. This study extends the extended unified theory of acceptance and use of technology (UTAUT2) with perceived realism and novelty value. Using a purposive sampling method, we collected data from 940 respondents in the US and analyzed the sample using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships are supported, with perceived realism emerging as the most influential driver, followed by novelty value. Moreover, fsQCA identifies five configurations leading to high and low willingness to use, and the model demonstrates high predictive validity, contributing to theory advancement. Our study provides valuable insights for developers and marketers, offering guidance for strategic decisions to promote the widespread adoption of T2V models.

Overview

The study examines the factors that influence user willingness to use Sora, a text-to-video (T2V) model that promises to redefine how visual content is created.
It extends the UTAUT2 framework by incorporating perceived realism and novelty value.
The researchers collected data from 940 respondents in the US and analyzed it using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA).

Plain English Explanation

The study investigates what motivates people to use a new technology called "text-to-video" (T2V) that can automatically generate video content from text. T2V has the potential to dramatically change how videos are created, but the researchers wanted to understand what factors would make people actually want to use this new tool.

To do this, the researchers built on a well-known framework called UTAUT2 that looks at things like how easy a technology is to use and how much it improves someone's life. They also added two new factors: how "realistic" the T2V videos seem, and how novel or unique the technology is perceived to be.

The researchers surveyed 940 people in the US and analyzed the results using two statistical techniques, structural equation modeling and fuzzy set qualitative comparative analysis. They found that both the perceived realism of the T2V videos and the novelty of the technology were key drivers of people's willingness to use it. In fact, realism was the most important factor of all.

The researchers also identified five "paths", or combinations of factors, that lead to high or low willingness to use T2V. This gives developers and marketers insight into how to promote and position the technology to encourage widespread adoption.

Technical Explanation

The study used the extended unified theory of acceptance and use of technology (UTAUT2) as a starting point, adding the constructs of perceived realism and novelty value to explore their influence on user willingness to use T2V models.

Data was collected from 940 respondents in the US using a purposive sampling method. Covariance-based structural equation modeling was employed to test the hypothesized relationships, while fuzzy set qualitative comparative analysis (fsQCA) was used to identify different configurations of factors leading to high and low willingness to use.

The results showed that all the hypothesized relationships were supported, with perceived realism emerging as the most influential driver, followed by novelty value. The fsQCA analysis identified five configurations of factors leading to high and low willingness to use T2V. The authors also report that the model has high predictive validity.
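
To make the fsQCA step more concrete, the sketch below shows the standard set-theoretic quantities that this kind of analysis typically relies on: calibrating raw scores into fuzzy membership, combining conditions with a logical AND (the minimum), and scoring a configuration by its consistency and coverage. It is an illustration only, not the authors' procedure. The 7-point scale anchors, the function names, and the five toy respondents are all assumptions, since the summary does not report the paper's calibration thresholds or data.

```python
import math


def calibrate(value, full_non, crossover, full_in):
    """Direct-method calibration: map a raw score to fuzzy membership in (0, 1).

    The three anchors are hypothetical; they correspond to log-odds of -3, 0, +3.
    """
    if value >= crossover:
        log_odds = 3.0 * (value - crossover) / (full_in - crossover)
    else:
        log_odds = 3.0 * (value - crossover) / (crossover - full_non)
    return 1.0 / (1.0 + math.exp(-log_odds))


def conjunction(*conditions):
    """Fuzzy AND: membership in a configuration is the minimum across its conditions."""
    return [min(values) for values in zip(*conditions)]


def consistency(x, y):
    """Degree to which membership in X is a subset of membership in outcome Y."""
    overlap = sum(min(a, b) for a, b in zip(x, y))
    return overlap / sum(x)


def coverage(x, y):
    """Share of outcome Y that is accounted for by configuration X."""
    overlap = sum(min(a, b) for a, b in zip(x, y))
    return overlap / sum(y)


# Toy, already-calibrated memberships for five hypothetical respondents.
realism = [0.9, 0.8, 0.3, 0.6, 0.1]
novelty = [0.7, 0.9, 0.4, 0.2, 0.3]
willingness = [0.8, 0.9, 0.5, 0.4, 0.2]

# Configuration "high realism AND high novelty".
config = conjunction(realism, novelty)

print("consistency:", round(consistency(config, willingness), 3))
print("coverage:", round(coverage(config, willingness), 3))
print("membership for a raw score of 6 on a 1-7 scale:", round(calibrate(6, 1, 4, 7), 3))
```

A configuration is usually read as sufficient for the outcome when its consistency is high, while coverage indicates how much of the outcome that configuration explains. Dedicated fsQCA software additionally builds a truth table and minimizes it, which this sketch does not attempt.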

Critical Analysis

The study provides valuable insights into the factors that drive user acceptance of T2V technology, which can help guide development and marketing strategies to promote wider adoption. However, the research is limited to the US context, and the generalizability to other cultural settings is unclear.

Additionally, the study focuses on user perceptions and intentions, but does not examine actual usage behavior over time. Further research is needed to understand how these factors play out in real-world usage scenarios.

The researchers also acknowledge that their model does not capture all the potential drivers of T2V acceptance, and there may be other important factors not considered, such as social and cultural influences or the need for transparent and explainable AI systems.

Conclusion

This study takes an important step in understanding the key factors that shape user willingness to adopt T2V technology, which has the potential to revolutionize visual content creation. The findings highlight the critical roles of perceived realism and novelty value, providing valuable guidance for developers and marketers to foster widespread acceptance of this innovative technology.

As T2V models continue to advance, further research will be needed to address the limitations of this study and explore the longer-term behavioral and societal implications of this transformative technology.

Related Papers

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this world simulator. Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

4/19/2024

cs.CV cs.AI cs.LG

Gauging Public Acceptance of Conditionally Automated Vehicles in the United States

Antonios Saravanos (New York University), Eleftheria K. Pissadaki (New York University), Wayne S. Singh (New York University), Donatella Delfino (New York University)

Public acceptance of conditionally automated vehicles is a crucial step in the realization of smart cities. Prior research in Europe has shown that the factors of hedonic motivation, social influence, and performance expectancy, in decreasing order of importance, influence acceptance. Moreover, a generally positive acceptance of the technology was reported. However, there is a lack of information regarding the public acceptance of conditionally automated vehicles in the United States. In this study, we carried out a web-based experiment where participants were provided information regarding the technology and then completed a questionnaire on their perceptions. The collected data was analyzed using PLS-SEM to examine the factors that may lead to public acceptance of the technology in the United States. Our findings showed that social influence, performance expectancy, effort expectancy, hedonic motivation, and facilitating conditions determine conditionally automated vehicle acceptance. Additionally, certain factors were found to influence the perception of how useful the technology is, the effort required to use it, and the facilitating conditions for its use. By integrating the insights gained from this study, stakeholders can better facilitate the adoption of autonomous vehicle technology, contributing to safer, more efficient, and user-friendly transportation systems in the future that help realize the vision of the smart city.

4/15/2024

cs.CY cs.AI

From Sora What We Can See: A Survey of Text-to-Video Generation

Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan

With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities can be considered as a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark from the perspective of disassembling Sora in text-to-video generation, and conducting a comprehensive review of literature, trying to answer the question, "From Sora What We Can See". Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized from three mutually perpendicular dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but more importantly, we identify several challenges and open problems in this domain and propose potential future directions for research and development.

5/20/2024

cs.CV cs.AI

Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models

Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, Kui Ren

The rapid advancement in text-to-video (T2V) generative models has enabled the synthesis of high-fidelity video content guided by textual descriptions. Despite this significant progress, these models are often susceptible to hallucination, generating contents that contradict the input text, which poses a challenge to their reliability and practical deployment. To address this critical issue, we introduce the SoraDetector, a novel unified framework designed to detect hallucinations across diverse large T2V models, including the cutting-edge Sora model. Our framework is built upon a comprehensive analysis of hallucination phenomena, categorizing them based on their manifestation in the video content. Leveraging the state-of-the-art keyframe extraction techniques and multimodal large language models, SoraDetector first evaluates the consistency between extracted video content summary and textual prompts, then constructs static and dynamic knowledge graphs (KGs) from frames to detect hallucination both in single frames and across frames. Sora Detector provides a robust and quantifiable measure of consistency, static and dynamic hallucination. In addition, we have developed the Sora Detector Agent to automate the hallucination detection process and generate a complete video quality report for each input video. Lastly, we present a novel meta-evaluation benchmark, T2VHaluBench, meticulously crafted to facilitate the evaluation of advancements in T2V hallucination detection. Through extensive experiments on videos generated by Sora and other large T2V models, we demonstrate the efficacy of our approach in accurately detecting hallucinations. The code and dataset can be accessed via GitHub.

5/8/2024

cs.LG