
This is the model release for the paper

**Elucidating the design space of language models for image generation**

See the paper on arXiv and the code on GitHub.

We provide four Binary-Autoencoder (BAE) tokenizers, following Binary Latent Diffusion, with code dimensions 16 (two variants), 20, and 24; each is trained for 1,000,000 iterations with a batch size of 256.

| Code Dim | Bernoulli Sampling | Link | Size  |
|----------|--------------------|------|-------|
| 16       |                    | link | 332MB |
| 16       |                    | link | 332MB |
| 20       |                    | link | 332MB |
| 24       |                    | link | 332MB |
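
For reference, below is a minimal sketch of the binary quantization step such tokenizers use, following Binary Latent Diffusion: encoder logits are passed through a sigmoid and either Bernoulli-sampled or hard-thresholded into {0, 1} codes, with a straight-through estimator so the step stays differentiable during training. The function name and tensor shapes are illustrative assumptions, not the released API.

```python
import torch

def binary_quantize(logits: torch.Tensor, bernoulli: bool = True) -> torch.Tensor:
    """Quantize encoder logits into binary codes in {0, 1}.

    With bernoulli=True, codes are sampled from Bernoulli(sigmoid(logits));
    otherwise they are deterministically thresholded at 0.5.
    """
    probs = torch.sigmoid(logits)
    hard = torch.bernoulli(probs) if bernoulli else (probs > 0.5).float()
    # Straight-through estimator: the forward pass emits the hard codes,
    # while gradients flow through the soft probabilities.
    return hard + probs - probs.detach()

# Illustrative shapes: a 16x16 latent grid with code dimension 16.
logits = torch.randn(1, 16, 16, 16)
codes = binary_quantize(logits)  # binary tensor, same shape as logits
```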

The generation model architecture is adapted from Llama-2, following LlamaGen.

| Model   | Link                          | Size          |
|---------|-------------------------------|---------------|
| AR-L    | [1-16] [2-8] [2-10] [2-12]    | 1.25GB~1.77GB |
| AR-XL   | [1-16] [2-8] [2-10] [2-12]    | 2.95GB~3.6GB  |
| AR-XXL  | [1-16] [2-10] [2-12]          | 5.49GB~6.25GB |
| AR-2B   | [2-12]                        | 7.64GB        |
| MLM-L   | [1-16]                        | 1.51GB        |
| MLM-XL  | [1-16]                        | 3.27GB        |
| MLM-XXL | [1-16]                        | 5.86GB        |
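
The bracketed tags appear to encode how each binary code is split into tokens: [k-d] denotes k tokens of d bits each, so [1-16] is one token over a 2^16-entry vocabulary, while [2-8] splits the same 16-bit code into two 8-bit tokens over a 256-entry vocabulary ([2-10] and [2-12] would correspond to the dimension-20 and dimension-24 tokenizers). A hypothetical sketch of that conversion, with names of our own choosing:

```python
import torch

def split_code(code_bits: torch.Tensor, k: int, d: int) -> torch.Tensor:
    """Split each (k*d)-bit binary code into k integer token ids of d bits.

    code_bits: (..., k*d) tensor of 0/1 values.
    Returns:   (..., k) tensor of token ids in [0, 2**d).
    """
    assert code_bits.shape[-1] == k * d
    groups = code_bits.reshape(*code_bits.shape[:-1], k, d).long()
    weights = 2 ** torch.arange(d)  # bit weights for base-2 decoding
    return (groups * weights).sum(dim=-1)

# A random 16-bit code under the [2-8] scheme -> two tokens, vocab size 256.
code = torch.randint(0, 2, (16,))
tokens = split_code(code, k=2, d=8)  # shape (2,), values in [0, 256)
```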