Yixin Liu (Lehigh University), Kai Zhang (Lehigh University), Yuan Li, Zhiling Yan (Lehigh University), Chujie Gao (Lehigh University), Ruoxi Chen (Lehigh University), Zhengqing Yuan, Yue Huang (Lehigh University), Hanchi Sun (Lehigh University), Jianfeng Gao (Lehigh University; Microsoft Research), Lifang He (Lehigh University), Lichao Sun (Lehigh University). 2024.
This paper provides a comprehensive review of Sora, a text-to-video generative AI model released by OpenAI in February 2024. It covers Sora's development, its underlying technology, its applications across various industries, and the challenges and limitations it faces. Key features of Sora include its ability to generate videos up to one minute long with high visual quality from text instructions, and its use of a pre-trained diffusion transformer that processes video data efficiently using latent spacetime patches. The paper discusses potential applications of Sora, including film-making, education, and marketing. It also notes limitations: issues with physical realism, difficulty handling spatial and temporal complexity, and the current restriction to one-minute videos. The paper closes by emphasizing future opportunities for AI-driven video generation and Sora's role in enhancing user creativity and productivity.
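As a rough illustration of what "spacetime patches" means, the sketch below splits a toy video into non-overlapping blocks that each span a few frames and a small spatial window, then flattens each block into one token. It is not taken from the paper or from OpenAI: the function name `spacetime_patches`, the patch sizes, and the scalar toy input are all assumptions made for illustration, and the summary above describes patching applied to a latent representation rather than to raw values.

```python
from itertools import product


def spacetime_patches(video, pt, ph, pw):
    """Split a video (nested lists indexed [t][y][x]) into flattened,
    non-overlapping spacetime patches of size pt x ph x pw.

    Hypothetical helper for illustration only; patch sizes and the
    scalar input are assumptions, not details from the reviewed model.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    patches = []
    # Walk the patch grid in (time, row, column) order.
    for t0, y0, x0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        # Flatten one pt x ph x pw block into a single token vector.
        patch = [
            video[t][y][x]
            for t in range(t0, t0 + pt)
            for y in range(y0, y0 + ph)
            for x in range(x0, x0 + pw)
        ]
        patches.append(patch)
    return patches


# Toy "latent video": 4 frames of 4x4 scalars; each value encodes its (t, y, x).
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
tokens = spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(tokens), len(tokens[0]))
```

With these toy sizes the number of tokens is (4/2) x (4/2) x (4/2), and each token holds 2 x 2 x 2 values, so the sequence length scales with both the duration and the resolution of the input.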