WenhaoWang
committed on
Commit
•
98c3828
1
Parent(s):
c8b1297
Update README.md
README.md
CHANGED
@@ -1,3 +1,88 @@
---
license: cc-by-nc-4.0
datasets:
- WenhaoWang/VidProM
language:
- en
pipeline_tag: text-generation
tags:
- text-to-video generation
- VidProM
- Automatical text-to-video prompt
---

# The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts.

# Details

The model is fine-tuned from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [VidProM](https://huggingface.co/datasets/WenhaoWang/VidProM) dataset using 8 A100 80G GPUs.

# Usage

## Download the model
```python
from transformers import pipeline

# Downloads the weights from the Hugging Face Hub on first use, then builds a text-generation pipeline.
pipe = pipeline("text-generation", model="WenhaoWang/Meta-Llama-3-8B-AutoT2VPrompt")
```

## Set the Parameters
```python
input = "An underwater world"  # The input text to complete into text-to-video prompts.
max_length = 50  # The maximum total length in tokens, counting the input as well as the generated text.
temperature = 1.2  # Controls the randomness of the generation. Higher values lead to more random outputs.
top_k = 8  # Restricts sampling at each step to the k most likely next tokens.
num_return_sequences = 10  # The number of different text-to-video prompts to generate from the same input.
```
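
To make the roles of `temperature` and `top_k` concrete, here is a minimal pure-Python sketch of a single sampling step. It is an illustration only, not the code path `transformers` runs internally: the function name `top_k_probs` and the toy scores are hypothetical, and the real pipeline operates on tensors over the model's full vocabulary.

```python
import math
import random

def top_k_probs(logits, temperature, top_k):
    """Return sampling probabilities after temperature scaling and top-k filtering.

    Hypothetical helper for illustration; `logits` maps token -> raw score.
    """
    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1) the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Keep only the top_k highest-scoring tokens; all others get probability zero.
    kept = sorted(scaled, key=scaled.get, reverse=True)[:top_k]
    # Softmax over the kept tokens (subtract the max for numerical stability).
    peak = max(scaled[tok] for tok in kept)
    weights = {tok: math.exp(scaled[tok] - peak) for tok in kept}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Toy next-token scores, made up for this example.
toy_logits = {"ocean": 3.0, "reef": 2.5, "fish": 2.0, "castle": 0.5, "desk": -1.0}

probs = top_k_probs(toy_logits, temperature=1.2, top_k=3)
tokens = list(probs)
# Draw one token according to the filtered, renormalized probabilities.
next_token = random.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
print(probs)
print(next_token)
```

Lowering `temperature` or `top_k` concentrates probability on the most likely tokens and gives more conservative completions; raising them gives more varied prompts.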

## Generation
```python
# Sample num_return_sequences completions of the input text.
all_prompts = pipe(input, max_length=max_length, do_sample=True, temperature=temperature, top_k=top_k, num_return_sequences=num_return_sequences)

def process(text):
    # Turn line breaks into sentence breaks and drop the space before a period.
    text = text.replace('\n', '.')
    text = text.replace(' .', '.')
    # Cut the unfinished sentence after the last period, if there is one.
    end = text.rfind('.')
    if end != -1:
        text = text[:end]
    return text + '.'

for result in all_prompts:
    print(process(result['generated_text']))
```
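
The post-processing step can be tried without loading the model. The sketch below restates `process` as a standalone function (with a guard for text that contains no period) and runs it on a made-up raw generation; the sample string is invented for illustration and is not model output. It also adds `process_compact`, a hypothetical variant that is not part of this README, which collapses the doubled periods that can arise when a line break directly follows a period.

```python
import re

def process(text):
    # Same steps as above, restated so this block runs on its own.
    text = text.replace('\n', '.')
    text = text.replace(' .', '.')
    end = text.rfind('.')
    if end != -1:
        text = text[:end]
    return text + '.'

def process_compact(text):
    # Hypothetical variant: additionally collapse runs of periods into one.
    return re.sub(r'\.{2,}', '.', process(text))

# Invented raw generation: a line break after a period, then an unfinished sentence.
raw = "An underwater world.\n A diver swims past a reef. The camera slowly"

cleaned = process(raw)
compact = process_compact(raw)
print(cleaned)
print(compact)
```

Note that `process_compact` would also shorten a deliberate ellipsis (`...`) to a single period, so whether to use it depends on how the prompts are consumed downstream.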

You will get 10 text-to-video prompts; pick the one you like most. Because generation is sampled, your outputs will differ from the examples below.

```
An underwater world, 25 ye boy, with aqua-green eyes, dk sandy blond hair, from the back, and on his back a fish, 23 ye old, weing glasses,ctoon chacte.
An underwater world, the video should capture the essence of tranquility and the beauty of nature.. a woman with short hair weing a green dress sitting at the desk.
An underwater world, the ocean is full of discded items, the water flows, and the light penetrating through the water.
An underwater world.. a woman with red eyes and red lips is looking forwd.
An underwater world.. an old man sitting in a chair, smoking a pipe, a little smoke coming out of the chair, a man is drinking a glass.
An underwater world. The ocean is filled with bioluminess as the water reflects a soft glow from a bioluminescent phosphorescent light source. The camera slowly moves away and zooms in..
An underwater world. the girl looks at the camera and smiles with happiness..
An underwater world, 1960s horror film..
An underwater world.. 4 men in 1940s style clothes walk ound a gothic castle. night, fe. A girl is running, and there e some flowers along the river.
An underwater world, -camera pan up . A girl is playing with her cat on a sunny day in the pk. A man is running and then falling down and dying.
```

# License

The model is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).

# Citation
```
@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}
```

# Acknowledgment

[Yaowei Zheng](https://github.com/hiyouga) helped with the fine-tuning process.

# Contact

If you have any questions, feel free to contact [Wenhao Wang](https://wangwenhao0716.github.io) ([email protected]).