A collection of Audio, Video and Visual LLMs.
GOT-OCR (from: UCAS, Beijing)
VLMEvalKit Eval Results on video understanding benchmarks