Release Information
Temporary access to OpenAI's video generation model Sora (turbo) was provided by the HF repo PR-Puppet-Sora, on November 26th. After a few hours, OpenAI revoked the API key used by the repo and removed access to the generated videos. In anticipation of that event, the publicly displayed videos and their prompts were archived.
This release contains 87 archived videos (~702 MB) and 83 of their prompts, and is dedicated to the public domain (CC0 1.0 Universal).
The generation parameters may be found in the app.py of the original repo here. An archive of this script is available here.
User prompts are often "augmented" (changed by some LLM) before generating videos, and this may be true for these videos as well.
The Sora backend that was used for generation was https://sora.openai.com/backend/video_gen
Contrary to claims online, the generations were not uncensored. User prompts, as well as the generated videos, passed through OpenAI's content moderation normally.
This is partly the reason why none of the videos in this archive are NSFW, or similar, despite a few brave attempts in the prompts.
It is also incorrect that "Sora leaked", since the model itself (its model parameters) had not been acquired by outsiders.
The only thing that "leaked" was previewer/beta tester access to Sora video generation, via a single HF repo that kept its API keys secret.
Archive versions
All videos are .mp4, of varying resolutions, and a framerate of 30 FPS.
Not all of the generated videos could be archived, due to HF server load issues.
The prompts used for four videos are not known, and these are denoted as [unknown_n].
Hugging Face performs File Security Scans of uploaded files, and you can click on the icon next to each file to see the result of this.
sora-turbo-vids.zip
This is the original archive, containing both videos and their prompts. Some users experienced encoding/compatibility issues with it.
Consider using the more recent "separated" uploads if you encounter similar issues.
The filenames in the short_prompts directory are the full prompts used for each video generation request.
The filenames in the long_prompts directory are shortened versions of the long prompts (above 256 chars), and their full versions are found in full_long_prompts.txt.
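Since the filenames in short_prompts carry the prompts themselves, a prompt can be read back by stripping the file extension. The following is a minimal sketch, assuming each file is named `<prompt>.mp4`; the helper names and the directory path passed in are hypothetical and not part of the release. It does not handle long_prompts, whose full text lives in full_long_prompts.txt.

```python
from pathlib import Path


def prompt_from_filename(filename):
    """Recover a prompt from a filename by dropping only the final extension.

    Assumption: files are named "<prompt>.mp4". Path.stem removes just the
    last suffix, so periods inside the prompt text are preserved.
    """
    return Path(filename).stem


def prompts_in_directory(directory):
    """Map each .mp4 filename in `directory` to the prompt it encodes."""
    return {
        path.name: prompt_from_filename(path.name)
        for path in sorted(Path(directory).glob("*.mp4"))
    }


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(prompt_from_filename("A cat. Running through tall grass.mp4"))
```

Filenames cannot hold every character a prompt may contain, so the recovered text may differ slightly from what was originally submitted.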
videos_only.zip & videos_only.7z
These identical archives (in different compression formats) contain only the original videos, with names such as video_24.mp4.
The video_24 part is the video ID, and the prompt used for a specific video ID is listed in the separate CSV and JSONL files (video_id, prompt).
Both files can be viewed in a text editor, and they are easy to import and process in most programming languages.
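As an illustration of importing these files, here is a minimal Python sketch. It assumes the CSV has a header row with the columns video_id and prompt, and that each JSONL line is an object with the same two keys; the function names and sample rows are hypothetical. Each reader accepts an open file or any iterable of lines.

```python
import csv
import json


def read_prompts_csv(lines):
    """Return {video_id: prompt} from CSV lines with a video_id,prompt header."""
    return {row["video_id"]: row["prompt"] for row in csv.DictReader(lines)}


def read_prompts_jsonl(lines):
    """Return {video_id: prompt} from JSONL lines (one JSON object per line)."""
    prompts = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        prompts[record["video_id"]] = record["prompt"]
    return prompts


def video_filename(video_id):
    """Map a video ID such as 'video_24' to its file name in the archive."""
    return f"{video_id}.mp4"


if __name__ == "__main__":
    # Made-up sample rows; with real data, pass an open file instead, e.g.
    # open(path, newline="", encoding="utf-8") for the CSV.
    sample_csv = ["video_id,prompt\n", 'video_24,"a cat, running"\n']
    for video_id, prompt in read_prompts_csv(sample_csv).items():
        print(video_filename(video_id), "<-", prompt)
```

Using csv.DictReader rather than splitting on commas matters here, because prompts often contain commas and quotes.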
YouTube Versions
You can watch the videos from this dataset on YouTube:
Even though this is a dataset upload, I went with a model repo because a) the URL is shorter, and b) the original upload wasn't compatible with the HF dataset viewer.
~ desuAnon
PUBLIC DOMAIN: CC0 1.0 Universal
This public release of content produced by generative ML is intended for educational, artistic, and research purposes.
Sora is a pending trademark of OpenAI, Inc., and is used for descriptive purposes only.
The original videos were watermarked by OpenAI to reflect the origin of the generated content.
This work is not endorsed by, or affiliated with, OpenAI.