---
license: cc-by-nc-4.0
datasets:
- WenhaoWang/VidProM
language:
- en
pipeline_tag: text-generation
tags:
- text-to-video generation
- VidProM
- Automatical text-to-video prompt
---

# The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts.

# Details

The model is [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) fine-tuned on the [VidProM](https://huggingface.co/datasets/WenhaoWang/VidProM) dataset using 8 A100 GPUs.

# Usage

## Download the model
```
from transformers import pipeline
import torch
pipe = pipeline("text-generation", model="WenhaoWang/AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0")
```

## Set the Parameters
```
input = "An underwater world"      # The input text to complete into text-to-video prompts.
max_length = 50                    # The maximum length of the generated text, in tokens (including the input).
temperature = 1.2                  # Controls the randomness of the generation. Higher values lead to more random outputs.
top_k = 8                          # Limits sampling at each step to the k most likely tokens.
num_return_sequences = 10          # The number of different text-to-video prompts to generate from the same input.
```
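
To build intuition for `temperature` and `top_k`, here is a minimal sketch of top-k filtering followed by temperature scaling on a toy set of scores. It is illustrative only: the function name `top_k_temperature_probs` and the example scores are hypothetical, and this is not the code that `transformers` runs internally.

```python
import math

def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling.

    logits: dict mapping token -> raw score (hypothetical toy values).
    """
    # Keep only the top_k highest-scoring tokens.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Divide scores by the temperature: values above 1 flatten the distribution.
    scaled = [(tok, score / temperature) for tok, score in kept]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = max(s for _, s in scaled)
    weights = [(tok, math.exp(s - peak)) for tok, s in scaled]
    total = sum(w for _, w in weights)
    return {tok: w / total for tok, w in weights}

# Hypothetical scores for four candidate next tokens.
toy_logits = {"ocean": 3.0, "reef": 2.0, "fish": 1.0, "castle": -1.0}
cool = top_k_temperature_probs(toy_logits, top_k=3, temperature=0.5)
warm = top_k_temperature_probs(toy_logits, top_k=3, temperature=1.2)
print(cool)
print(warm)
```

With `top_k=3`, the lowest-scoring token is never sampled. The higher temperature moves probability away from the top token toward the other kept tokens, which is why larger values tend to give more varied prompts.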

## Generation
```
all_prompts = pipe(input, max_length=max_length, do_sample=True, temperature=temperature, top_k=top_k, num_return_sequences=num_return_sequences)

def process(text):
    # Turn line breaks into sentence boundaries and tidy stray spaces before periods.
    text = text.replace('\n', '.')
    text = text.replace('  .', '.')
    # Cut the text at the last period so it ends on a complete sentence.
    end = text.rfind('.')
    if end != -1:
        text = text[:end]
    return text + '.'

for prompt in all_prompts:
    print(process(prompt['generated_text']))
```
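
The cleanup step can be tried without loading the model. The following is a standalone sketch of the same steps applied to a made-up string. The name `tidy_prompt` and the sample text are hypothetical, and the guard for text that contains no period is an assumption about the desired behavior, not something the model card specifies.

```python
def tidy_prompt(text):
    """Flatten newlines and trim a generated prompt to its last full sentence."""
    text = text.replace("\n", ".")   # treat line breaks as sentence boundaries
    text = text.replace("  .", ".")  # drop stray double spaces before a period
    last_period = text.rfind(".")
    if last_period == -1:
        # No sentence boundary found: keep the text and just close it.
        return text.rstrip() + "."
    # Discard the unfinished fragment after the last period.
    return text[:last_period] + "."

# Hypothetical raw output that stops mid-sentence because of the length limit.
raw = "An underwater world, the water flows\na diver swims past and then the"
print(tidy_prompt(raw))
```

Because generation stops at `max_length` tokens, the last sentence is often cut off; trimming to the final period discards that fragment.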

This produces 10 text-to-video prompts, from which you can pick the one you like most.

```
An underwater world, 25 ye boy, with aqua-green eyes, dk sandy blond hair, from the back, and on his back a fish, 23 ye old, weing glasses,ctoon chacte.
An underwater world, the video should capture the essence of tranquility and the beauty of nature.. a woman with short hair weing a green dress sitting at the desk.
An underwater world, the ocean is full of discded items, the water flows, and the light penetrating through the water.
An underwater world.. a woman with red eyes and red lips  is looking forwd.
An underwater world.. an old man sitting in a chair, smoking a pipe, a little smoke coming out of the chair, a man is drinking a glass.
An underwater world. The ocean is filled with bioluminess as the water reflects a soft glow from a bioluminescent phosphorescent light source. The camera slowly moves away and zooms in..
An underwater world. the girl looks at the camera and smiles with happiness..
An underwater world, 1960s horror film..
An underwater world.. 4 men in 1940s style clothes walk ound a gothic castle. night, fe. A girl is running, and there e some flowers along the river.
An underwater world,  -camera pan up . A girl is playing with her cat on a sunny day in the pk. A man is running and then falling down and dying.
```

# License

The model is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).

# Citation
```
@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}
```

# Acknowledgment

[Yaowei Zheng](https://github.com/hiyouga) helped with the fine-tuning process.

# Contact

If you have any questions, feel free to contact [Wenhao Wang](https://wangwenhao0716.github.io).