EMNLP 2024

This repository contains the official checkpoint for PixelGPT, as presented in the paper Autoregressive Pre-Training on Pixels and Texts (EMNLP 2024). For detailed instructions on how to use the model, please visit our GitHub page.

Model Description

PixelGPT is an autoregressive language model pre-trained exclusively on pixel data with a next patch prediction objective. Documents are rendered as images and split into a sequence of patches, and the model learns to predict each subsequent patch from the preceding ones. Because the input is raw pixels rather than tokens, this tokenization-free approach lets PixelGPT read text rendered as images and handle visually rich tasks without a text tokenizer.
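
To make the objective concrete, below is a minimal sketch of next patch prediction in PyTorch. Everything in it (the patch size, model dimensions, the `NextPatchPredictor` class, and the MSE regression loss) is an illustrative assumption rather than the released implementation; see the GitHub page for the official code.

```python
import torch
import torch.nn as nn

# Hypothetical values chosen for illustration only.
PATCH = 16   # patch side length in pixels
DIM = 512    # hidden size of the toy backbone

def patchify(images):
    """Split rendered-document images into a sequence of flat patches.

    images: (B, C, H, W) with H and W divisible by PATCH.
    returns: (B, N, C * PATCH * PATCH) with N = (H / PATCH) * (W / PATCH).
    """
    B, C, H, W = images.shape
    patches = images.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * PATCH * PATCH)
    return patches

class NextPatchPredictor(nn.Module):
    """Toy decoder-only model trained to regress the next image patch."""

    def __init__(self, patch_dim, dim=DIM, layers=4, heads=8, max_len=1024):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)          # patch pixels -> embedding
        self.pos = nn.Embedding(max_len, dim)           # learned positions
        block = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, patch_dim)           # predicts next patch's pixels

    def forward(self, patches):
        B, N, _ = patches.shape
        x = self.embed(patches) + self.pos(torch.arange(N, device=patches.device))
        # Causal mask keeps each position from attending to future patches.
        mask = nn.Transformer.generate_square_subsequent_mask(N).to(patches.device)
        x = self.backbone(x, mask=mask)
        return self.head(x)

# One training step: predict patch t+1 from patches 0..t, scored with MSE.
images = torch.rand(2, 3, 256, 256)          # stand-in for rendered documents
patches = patchify(images)
model = NextPatchPredictor(patch_dim=patches.shape[-1])
pred = model(patches[:, :-1])                # inputs: all patches but the last
loss = nn.functional.mse_loss(pred, patches[:, 1:])  # targets: shifted by one
```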

Citation

@misc{chai2024autoregressivepretrainingpixelstexts,
  title         = {Autoregressive Pre-Training on Pixels and Texts},
  author        = {Chai, Yekun and Liu, Qingyi and Xiao, Jingwu and Wang, Shuohuan and Sun, Yu and Wu, Hua},
  year          = {2024},
  eprint        = {2404.10710},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2404.10710},
}
Checkpoint Details

The released checkpoint contains 350M parameters, stored as F32 tensors in the Safetensors format.
