---
thumbnail: "https://user-images.githubusercontent.com/54370274/243292723-fa703668-a931-41e1-8bcf-19c72203980b.png"
tags:
- TextTovideo
- Text2Video
---

# Potat 1️⃣
First Open-Source 1024x576 Text To Video Model 🥳

### Info
Prototype Model
Trained with https://lambdalabs.com ❤ 1xA100 (40GB)
2197 clips, 68388 frames tagged with [Salesforce/blip2-opt-6.7b-coco](https://huggingface.co/Salesforce/blip2-opt-6.7b-coco)
train_steps: 10000

### Dataset & Config
https://huggingface.co/camenduru/potat1_dataset/tree/main

### Finetuning
https://github.com/Breakthrough/PySceneDetect
https://github.com/ExponentialML/Video-BLIP2-Preprocessor
https://github.com/ExponentialML/Text-To-Video-Finetuning
https://github.com/camenduru/Text-To-Video-Finetuning-colab

### Base Model
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis
https://www.modelscope.cn/models/damo/text-to-video-synthesis
Thanks to [damo-vilab](https://damo.alibaba.com/) ❤ [ExponentialML](https://github.com/ExponentialML) ❤ [kabachuha](https://github.com/kabachuha) ❤ [@DiffusersLib](https://twitter.com/DiffusersLib) ❤ [@LambdaAPI](https://twitter.com/LambdaAPI) ❤ [@cerspense](https://twitter.com/cerspense) ❤ [@CiaraRowles1](https://twitter.com/CiaraRowles1) ❤ [@p1atdev_art](https://twitter.com/p1atdev_art) ❤
Please try it 🐣
https://github.com/camenduru/text-to-video-synthesis-colab
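
If you would rather run it locally than in the colab, here is a minimal sketch using the `diffusers` library. It rests on several assumptions that this card does not confirm: that the weights are published in diffusers format under the repo id `camenduru/potat1`, that `torch`, `diffusers` and `accelerate` are installed, and that a CUDA GPU is available. The helper names `generation_settings` and `generate`, the frame count and the step count are illustrative choices, not the settings used by the author.

```python
# Assumption: the model is hosted under this Hugging Face repo id.
MODEL_ID = "camenduru/potat1"


def generation_settings(prompt, num_frames=24, num_inference_steps=25):
    """Collect keyword arguments for a 1024x576 text-to-video call.

    num_frames and num_inference_steps are illustrative defaults.
    """
    return {
        "prompt": prompt,
        "width": 1024,
        "height": 576,
        "num_frames": num_frames,
        "num_inference_steps": num_inference_steps,
    }


def generate(prompt, out_path="potat1.mp4"):
    """Render one clip and return the path of the written video file."""
    # Heavy imports stay inside the function so the module loads without a GPU.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # Offloading idle submodules to CPU lowers peak VRAM use (needs accelerate).
    pipe.enable_model_cpu_offload()

    result = pipe(**generation_settings(prompt))
    # Recent diffusers releases return a batch of videos, so take the first one.
    # Older releases return the frame list directly; drop the [0] there.
    frames = result.frames[0]
    return export_to_video(frames, out_path)


if __name__ == "__main__":
    print(generate("a potato dancing in the kitchen"))
```

Lower `num_frames` if you run out of memory; 1024x576 frames are much heavier than the 256x256 output of the base model.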

Potat 2️⃣ is in the oven ♨