osanseviero committed on
Commit 0d82822
1 parent: 1c62eb5

Change license to non commercial


As per the discussion on [the paper page](https://huggingface.co/papers/2104.00650) for WebVid, one of the datasets used for training, the dataset may be used for non-commercial purposes only, primarily for research. It is therefore my (personal) belief that derived works, such as this repository, should also carry a non-commercial license. Hence, I'm opening this Pull Request to reflect that.

Files changed (1): README.md (+5 −4)
README.md CHANGED
@@ -1,10 +1,9 @@
 ---
-license: apache-2.0
+license: cc-by-nc-4.0
 pipeline_tag: text-to-video
 ---
 
-The original repo is [here](https://modelscope.cn/models/damo/text-to-video-synthesis/summary).
-
+The original repo is [here](https://modelscope.cn/models/damo/text-to-video-synthesis/summary).
 
 We Are Hiring! (Based on Beijing / Hangzhou, China.)
 
@@ -18,6 +17,8 @@ This model is based on a multi-stage text-to-video generation diffusion model, w
 
 The text-to-video generation diffusion model consists of three sub-networks: text feature extraction, text feature-to-video latent space diffusion model, and video latent space to video visual space. The overall model parameters are about 1.7 billion. Support English input. The diffusion model adopts the Unet3D structure, and realizes the function of video generation through the iterative denoising process from the pure Gaussian noise video.
 
+**This model is meant for research purposes. Please look at the [model limitations and biases](#model-limitations-and-biases) and [misuse, malicious use and excessive use](#misuse-malicious-use-and-excessive-use) sections.**
+
 **How to expect the model to be used and where it is applicable**
 
 This model has a wide range of applications and can reason and generate videos based on arbitrary English text descriptions.
@@ -77,4 +78,4 @@ The output mp4 file can be viewed by [VLC media player](https://www.videolan.org
 
 ## Training data
 
-The training data includes LAION5B, ImageNet, Webvid and other public datasets. Image and video filtering is performed after pre-training such as aesthetic score, watermark score, and deduplication.
+The training data includes [LAION5B](https://huggingface.co/datasets/laion/laion2B-en), [ImageNet](https://www.image-net.org/), [Webvid](https://m-bain.github.io/webvid-dataset/) and other public datasets. Image and video filtering is performed after pre-training such as aesthetic score, watermark score, and deduplication.
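The README paragraph in the diff describes generation as an iterative denoising process that starts from a pure Gaussian noise video. As a rough illustration only — not the model's actual API, architecture, or weights — here is a toy NumPy sketch of such a reverse-diffusion loop; the `predicted_noise` line is a hypothetical stand-in for the learned Unet3D noise predictor:

```python
import numpy as np

def denoise_video_latents(steps=50, shape=(16, 4, 8, 8), seed=0):
    """Toy sketch of iterative denoising: start from pure Gaussian
    noise and repeatedly subtract a predicted noise component.
    The real model uses a learned Unet3D predictor; this stand-in
    simply shrinks the latent toward zero at each step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # pure Gaussian noise video latent
    for t in range(steps, 0, -1):
        predicted_noise = x * (t / steps)  # hypothetical noise estimate
        x = x - predicted_noise / steps    # one reverse-diffusion step
    return x

latents = denoise_video_latents()
print(latents.shape)  # (16, 4, 8, 8)
```

In the actual model, the latent produced by such a loop would then be decoded by the third sub-network (video latent space to video visual space) into frames.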