Unconditional Image Generation
Fairseq
zyx123 committed
Commit 5c4c40a • 1 Parent(s): 0e1109c

update readme

Files changed (1)
  1. README.md +18 -24
README.md CHANGED
@@ -81,7 +81,7 @@ We present **DiGIT**, an auto-regressive generative model performing next-token
 | MIM | MaskGIT | 227M | 300 | 6.18 | 182.1 |
 | MIM | **DiGIT (+MaskGIT)** | 219M | 200 | **4.62** | **146.19** |
 | AR | VQGAN | 227M | 300 | 18.65 | 80.4 |
-| AR | **DiGIT (+VQGAN)** | 219M | 200 | **4.79** | **142.87** |
+| AR | **DiGIT (+VQGAN)** | 219M | 400 | **4.79** | **142.87** |
 | AR | **DiGIT (+VQGAN)** | 732M | 200 | **3.39** | **205.96** |
 
 *: VAR is trained with classifier-free guidance while all the other models are not.
@@ -93,25 +93,24 @@ The K-Means npy file and model checkpoints can be downloaded from:
 | Model | Link |
 |:----------:|:-----:|
 | HF weights🤗 | [Huggingface](https://huggingface.co/DAMO-NLP-SG/DiGIT) |
-| Google Drive | [Google Drive](https://drive.google.com/drive/folders/1QWc51HhnZ2G4xI7TkKRanaqXuo8WxUSI?usp=share_link) |
 
 For the base model we use [DINOv2-base](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_reg4_pretrain.pth), and for the large model we use [DINOv2-large](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_reg4_pretrain.pth). The VQGAN we use is the same as [MAGE](https://drive.google.com/file/d/13S_unB87n6KKuuMdyMnyExW0G1kplTbP/view?usp=sharing).
 
 ```
 DiGIT
 └── data/
-    ├── ILSVRC2012
-    ├── dinov2_base_short_224_l3
-    ├── km_8k.npy
-    ├── dinov2_large_short_224_l3
-    ├── km_16k.npy
+    ├── ILSVRC2012
+    ├── dinov2_base_short_224_l3
+    ├── km_8k.npy
+    ├── dinov2_large_short_224_l3
+    ├── km_16k.npy
 └── outputs/
-    ├── base_8k_stage1
-    ├── ...
+    ├── base_8k_stage1
+    ├── ...
 └── models/
-    ├── vqgan_jax_strongaug.ckpt
-    ├── dinov2_vitb14_reg4_pretrain.pth
-    ├── dinov2_vitl14_reg4_pretrain.pth
+    ├── vqgan_jax_strongaug.ckpt
+    ├── dinov2_vitb14_reg4_pretrain.pth
+    ├── dinov2_vitl14_reg4_pretrain.pth
 ```
 
 The training and inference code can be found at our github repo https://github.com/DAMO-NLP-SG/DiGIT
@@ -122,17 +121,12 @@ The training and inference code can be found at our github repo https://github.c
 If you find our project useful, we hope you will star our repo and cite our work as follows.
 
 ```bibtex
-
-
-@misc
-
-{zhu2024stabilizelatentspaceimage,
-      title={Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective},
-      author={Yongxin Zhu and Bocheng Li and Hang Zhang and Xin Li and Linli Xu and Lidong Bing},
-      year={2024},
-      eprint={2410.12490},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2410.12490},
+@misc{zhu2024stabilize,
+      title={Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective},
+      author={Yongxin Zhu and Bocheng Li and Hang Zhang and Xin Li and Linli Xu and Lidong Bing},
+      year={2024},
+      eprint={2410.12490},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
 }
 ```
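
As a practical note on the download table and directory tree in the diff above: a minimal, untested sketch of fetching the released files into that layout with `huggingface_hub`. It assumes the Hugging Face repo `DAMO-NLP-SG/DiGIT` mirrors the tree shown; adjust paths if it does not.

```python
# Sketch: download the released checkpoints and K-Means .npy files.
# Assumption: the repo's file layout mirrors the DiGIT/ tree above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DAMO-NLP-SG/DiGIT",
    local_dir="DiGIT",  # files land under DiGIT/ per the tree above
)
```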
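The `dinov2_vitb14_reg4_pretrain.pth` and `dinov2_vitl14_reg4_pretrain.pth` checkpoints linked in the diff are the register variants of DINOv2. A sketch of loading the same backbones through `torch.hub`; the `_reg` entrypoints should correspond to the reg4 checkpoints, though downloading the `.pth` files directly into `models/` as shown in the tree also works.

```python
# Sketch: load the DINOv2 backbones via torch.hub. The *_reg entrypoints
# should correspond to the *_reg4_pretrain.pth weights linked above.
import torch

dinov2_base = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14_reg")
dinov2_large = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14_reg")
dinov2_base.eval()  # feature extraction only; no gradient updates needed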
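On how the K-Means files fit in: DiGIT tokenizes images by assigning DINOv2 features to their nearest K-Means centroid. The sketch below illustrates that nearest-centroid assignment, assuming `km_8k.npy` stores a `(K, D)` centroid matrix; the file format is an assumption here, and the training code in the GitHub repo is authoritative.

```python
# Sketch of nearest-centroid token assignment. Assumption: km_8k.npy holds
# a (K, D) array of K-Means centroids over DINOv2 patch features.
import numpy as np

centroids = np.load("DiGIT/data/km_8k.npy")        # (K, D), assumed layout
feats = np.random.randn(256, centroids.shape[1])   # stand-in for (N, D) features

# argmin of ||f - c||^2 equals argmin of -2 f.c + ||c||^2, so the ||f||^2
# term can be dropped from the distance computation.
dists = -2.0 * feats @ centroids.T + (centroids**2).sum(axis=1)
tokens = dists.argmin(axis=1)                      # (N,) discrete token ids
```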