# SEED Multimodal
Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.
## Usage
### Dependencies
- Python >= 3.8 (Anaconda is recommended)
- PyTorch >= 1.11.0
- NVIDIA GPU + CUDA
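
As a quick sanity check of the requirements above, the following snippet verifies the Python, PyTorch, and CUDA setup (a minimal sketch, not part of the repo):

```python
# Minimal environment check for the dependency list above (not part of the repo).
import sys
import torch

assert sys.version_info >= (3, 8), "Python >= 3.8 is required"
# Compare only the numeric part of the version string (e.g. "2.1.0+cu118").
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 11), "PyTorch >= 1.11.0 is required"
assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA is required"
print(f"OK: PyTorch {torch.__version__} with {torch.cuda.device_count()} GPU(s)")
```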
### Installation
Clone the repo and install the required packages:

```bash
git clone https://github.com/AILab-CVC/SEED.git
cd SEED
pip install -r requirements.txt
```
### Model Weights
We release the pretrained SEED tokenizer and de-tokenizer, along with the pre-trained and instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B, in [SEED Hugging Face](https://huggingface.co/AILab-CVC/SEED). Please download the checkpoints and save them under the folder `./pretrained`.

You can also download them separately:
- Check the SEED tokenizer weights in [AILab-CVC/seed-tokenizer-2](https://huggingface.co/AILab-CVC/seed-tokenizer-2)
- Check the SEED-LLaMA (pre-trained, 8B) weights in [AILab-CVC/seed-llama-8b-pretrain](https://huggingface.co/AILab-CVC/seed-llama-8b-pretrain)
- Check the SEED-LLaMA (SFT, 8B) weights in [AILab-CVC/seed-llama-8b-sft](https://huggingface.co/AILab-CVC/seed-llama-8b-sft)
- Check the SEED-LLaMA (pre-trained, 14B) weights in [AILab-CVC/seed-llama-14b-pretrain](https://huggingface.co/AILab-CVC/seed-llama-14b-pretrain)
- Check the SEED-LLaMA (SFT, 14B) weights in [AILab-CVC/seed-llama-14b-sft](https://huggingface.co/AILab-CVC/seed-llama-14b-sft)
```bash
cd pretrained  # SEED/pretrained
git lfs install
git clone https://huggingface.co/AILab-CVC/SEED
mv SEED/* ./
```
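
Alternatively, if you prefer not to use git-lfs, the same checkpoints can be fetched with the `huggingface_hub` Python client. This is a sketch assuming the `AILab-CVC/SEED` repo shown above (install with `pip install huggingface_hub`):

```python
# Download the full SEED checkpoint repo into ./pretrained without git-lfs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="AILab-CVC/SEED",  # the Hugging Face repo cloned above
    local_dir="pretrained",    # mirrors the ./pretrained layout used by the scripts
)
```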
To reconstruct images from SEED visual codes using the unCLIP SD-UNet, please download the pretrained unCLIP SD checkpoint, rename the checkpoint directory to `diffusion_model`, and move it (or soft-link it) into the `pretrained/seed_tokenizer` directory:
```bash
# SEED/pretrained
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip
mv stable-diffusion-2-1-unclip seed_tokenizer/diffusion_model
```
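
The downloaded checkpoint is a standard `diffusers` pipeline, so you can sanity-check that it still loads from its new location. The sketch below assumes the `StableUnCLIPImg2ImgPipeline` class used on the stable-diffusion-2-1-unclip model card, which is not necessarily how SEED loads it internally (requires `pip install diffusers`):

```python
# Verify the relocated unCLIP SD checkpoint loads as a diffusers pipeline.
import torch
from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "pretrained/seed_tokenizer/diffusion_model",  # path created by the mv above
    torch_dtype=torch.float16,
).to("cuda")
print("Loaded:", pipe.__class__.__name__)
```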
### Inference for visual tokenization and de-tokenization
To discretize an image into 1D visual codes with causal dependency, and then reconstruct the image from those codes using the off-the-shelf unCLIP SD-UNet, run:

```bash
cd ..  # SEED/
python scripts/seed_tokenizer_inference.py
```
### Launching a Gradio Demo of SEED-LLaMA-14B Locally
Building the local demo of SEED-LLaMA-14B currently requires two GPUs with 32 GB of memory each (2×32 GB).
```bash
# SEED/
# in the first terminal
sh scripts/start_backend.sh

# in the second terminal
sh scripts/start_frontend.sh
```
The demo can then be accessed at http://127.0.0.1:80.
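
Once both scripts are running, you can confirm the frontend is serving with a trivial check (assumes `pip install requests`; not part of the repo):

```python
# Ping the locally served demo frontend.
import requests

resp = requests.get("http://127.0.0.1:80", timeout=10)
print(resp.status_code)  # expect 200 once the demo is up
```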
## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a SEED of Vision in Large Language Model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}
```
The project is still in progress. Stay tuned for more updates!
## License

SEED is released under the Apache License Version 2.0.

SEED-LLaMA is released under the original license of LLaMA2.