LongVA

🌐 Blog | 📃 Paper | 🤗 Hugging Face | 🎥 Demo

Long-context capability can transfer zero-shot from language to vision.

LongVA can process 2000 frames or over 200K visual tokens. It achieves state-of-the-art performance on Video-MME among 7B models.
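
The relationship between frame count and visual token count can be sketched with simple arithmetic. The snippet below is only an illustration: the helper `visual_token_count` is hypothetical, and the per-frame token count of 144 is an assumption that this README does not state.

```python
def visual_token_count(num_frames: int, tokens_per_frame: int) -> int:
    """Total visual tokens, assuming every frame yields the same number of tokens."""
    if num_frames < 0 or tokens_per_frame < 0:
        raise ValueError("counts must be non-negative")
    return num_frames * tokens_per_frame


# ASSUMPTION: 144 tokens per frame is an illustrative value, not taken from this README.
ASSUMED_TOKENS_PER_FRAME = 144

total = visual_token_count(2000, ASSUMED_TOKENS_PER_FRAME)
print(f"{total:,} visual tokens for 2000 frames")
```

Under that assumption, 2000 frames come to well over 200K visual tokens, which is consistent with the figure quoted above.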