update readme
README.md
CHANGED
@@ -81,7 +81,7 @@ We present **DiGIT**, an auto-regressive generative model performing next-token
 | MIM | MaskGIT | 227M | 300 | 6.18 | 182.1 |
 | MIM | **DiGIT (+MaskGIT)** | 219M | 200 | **4.62** | **146.19** |
 | AR | VQGAN | 227M | 300 | 18.65 | 80.4 |
-| AR | **DiGIT (+VQGAN)** | 219M |
+| AR | **DiGIT (+VQGAN)** | 219M | 400 | **4.79** | **142.87** |
 | AR | **DiGIT (+VQGAN)** | 732M | 200 | **3.39** | **205.96** |

 *: VAR is trained with classifier-free guidance while all the other models are not.
@@ -93,25 +93,24 @@ The K-Means npy file and model checkpoints can be downloaded from:
 | Model | Link |
 |:----------:|:-----:|
 | HF weights🤗 | [Huggingface](https://huggingface.co/DAMO-NLP-SG/DiGIT) |
-| Google Drive | [Google Drive](https://drive.google.com/drive/folders/1QWc51HhnZ2G4xI7TkKRanaqXuo8WxUSI?usp=share_link) |

 For the base model we use [DINOv2-base](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_reg4_pretrain.pth), and [DINOv2-large](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_reg4_pretrain.pth) for the large-size model. The VQGAN we use is the same as [MAGE](https://drive.google.com/file/d/13S_unB87n6KKuuMdyMnyExW0G1kplTbP/view?usp=sharing).

 ```
 DiGIT
 ├── data/
+│   ├── ILSVRC2012
+│   ├── dinov2_base_short_224_l3
+│   │   └── km_8k.npy
+│   ├── dinov2_large_short_224_l3
+│   │   └── km_16k.npy
 ├── outputs/
+│   ├── base_8k_stage1
+│   ├── ...
 ├── models/
+│   ├── vqgan_jax_strongaug.ckpt
+│   ├── dinov2_vitb14_reg4_pretrain.pth
+│   ├── dinov2_vitl14_reg4_pretrain.pth
 ```

 The training and inference code can be found at our github repo https://github.com/DAMO-NLP-SG/DiGIT
@@ -122,17 +121,12 @@ The training and inference code can be found at our github repo https://github.com/DAMO-NLP-SG/DiGIT
 If you find our project useful, we hope you can star our repo and cite our work as follows.

 ```bibtex
-      year={2024},
-      eprint={2410.12490},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2410.12490},
+@misc{zhu2024stabilize,
+      title={Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective},
+      author={Yongxin Zhu and Bocheng Li and Hang Zhang and Xin Li and Linli Xu and Lidong Bing},
+      year={2024},
+      eprint={2410.12490},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
 }
 ```