update readme
README.md
CHANGED
@@ -81,7 +81,7 @@ We present **DiGIT**, an auto-regressive generative model performing next-token
 | MIM | MaskGIT | 227M | 300 | 6.18 | 182.1 |
 | MIM | **DiGIT (+MaskGIT)** | 219M | 200 | **4.62** | **146.19** |
 | AR | VQGAN | 227M | 300 | 18.65 | 80.4 |
-| AR | **DiGIT (+VQGAN)** | 219M |
+| AR | **DiGIT (+VQGAN)** | 219M | 400 | **4.79** | **142.87** |
 | AR | **DiGIT (+VQGAN)** | 732M | 200 | **3.39** | **205.96** |

 *: VAR is trained with classifier-free guidance while all the other models are not.
@@ -93,25 +93,24 @@ The K-Means npy file and model checkpoints can be downloaded from:
 | Model | Link |
 |:----------:|:-----:|
 | HF weights🤗 | [Huggingface](https://huggingface.co/DAMO-NLP-SG/DiGIT) |
-| Google Drive | [Google Drive](https://drive.google.com/drive/folders/1QWc51HhnZ2G4xI7TkKRanaqXuo8WxUSI?usp=share_link) |

 For the base model we use [DINOv2-base](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_reg4_pretrain.pth), and [DINOv2-large](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_reg4_pretrain.pth) for the large-size model. The VQGAN we use is the same as [MAGE](https://drive.google.com/file/d/13S_unB87n6KKuuMdyMnyExW0G1kplTbP/view?usp=sharing).

 ```
 DiGIT
 ├── data/
+│   ├── ILSVRC2012
+│   ├── dinov2_base_short_224_l3
+│   │   └── km_8k.npy
+│   ├── dinov2_large_short_224_l3
+│   │   └── km_16k.npy
 ├── outputs/
+│   ├── base_8k_stage1
+│   ├── ...
 ├── models/
+│   ├── vqgan_jax_strongaug.ckpt
+│   ├── dinov2_vitb14_reg4_pretrain.pth
+│   ├── dinov2_vitl14_reg4_pretrain.pth
 ```

 The training and inference code can be found at our github repo https://github.com/DAMO-NLP-SG/DiGIT
@@ -122,17 +121,12 @@ The training and inference code can be found at our github repo https://github.com/DAMO-NLP-SG/DiGIT
 If you find our project useful, we hope you can star our repo and cite our work as follows.

 ```bibtex
-      year={2024},
-      eprint={2410.12490},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2410.12490},
+@misc{zhu2024stabilize,
+      title={Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective},
+      author={Yongxin Zhu and Bocheng Li and Hang Zhang and Xin Li and Linli Xu and Lidong Bing},
+      year={2024},
+      eprint={2410.12490},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
 }
 ```