GIT: A Generative Image-to-text Transformer for Vision and Language
Paper: arXiv 2205.14100
GIT (Generative Image-to-text Transformer) is a model for vision-language tasks such as image and video captioning and question answering.
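GIT pairs an image encoder with a text decoder. The image tokens and text tokens are concatenated into one sequence, and text is generated autoregressively. As we understand the paper's design, the decoder uses a seq2seq-style attention mask: image tokens attend to each other but not to text, and each text token attends to all image tokens and to the text tokens before it. The sketch below builds that visibility pattern in plain Python. The function name `git_attention_mask` is hypothetical and not part of any library. Treat this as an illustration of the masking idea, not as GIT's actual implementation.

```python
from typing import List


def git_attention_mask(num_image_tokens: int, num_text_tokens: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True if position i may attend to position j.

    Positions 0..num_image_tokens-1 are image tokens; the remaining positions
    are text tokens, in generation order.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < num_image_tokens:
                # Every token, image or text, may see every image token.
                mask[i][j] = True
            elif i >= num_image_tokens:
                # Text tokens see earlier text tokens and themselves (causal).
                mask[i][j] = j <= i
            # Image tokens never see text tokens, so those entries stay False.
    return mask


if __name__ == "__main__":
    # Print the pattern for 2 image tokens and 3 text tokens ("1" = may attend).
    for row in git_attention_mask(2, 3):
        print("".join("1" if allowed else "." for allowed in row))
```

With two image tokens and three text tokens, the printed pattern has a fully visible 2×2 image block in the top-left corner. The image rows have no visibility into the text columns, and the text block is lower-triangular, which is what lets the decoder caption an image one token at a time.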