h2ovl-mississippi-benchmarks / filtered_opencompass.csv
Shanshan Wang
Category,Models,Type,Params (B),Language Model,Vision Model,Avg. Score (8 single image benchmarks),MMBench V1.1_TEST,MMStar,MMMU,Math Vista,Hallusion Bench Avg,AI2D_TEST,OCR Bench,MMVet
Similar score models,Qwen2-VL-2B,Open,2.1,Qwen2-1.5B,ViT-600M,57.3,72.2,47.5,42.2,47.8,42.4,74.7,79.7,51.5
Similar score models,H2O-Mississippi-2B,Open,2.1,Danube2 1.8B,InternViT-300M,54.5,64.8,49.6,35.2,56.8,36.4,69.9,78.2,44.7
Similar score models,InternVL2-2B,Open,2.1,InternLM2-1.8B,InternViT-300M,54.0,69.6,49.8,36.3,46.0,38.0,74.1,78.1,39.7
Similar score models,Phi-3-Vision - Microsoft,Open,4.2,Phi-3,CLIP ViT-L/14,53.6,65.2,47.7,46.1,44.6,39.0,78.4,63.7,44.1
Similar score models,Claude3-Opus - Anthropic,Closed,Unknown,,,54.4,59.1,45.7,54.9,45.8,37.8,70.6,69.4,51.7
Similar score models,Claude3-Sonnet - Anthropic,Closed,Unknown,,,53.5,63.9,44.2,47.4,45.0,41.3,69.9,64.6,51.7
Similar score models,Cambrian-13B,Open,13,Vicuna-v1.5-13B,CLIP ViT-L/14,53.3,67.5,47.1,41.6,47.4,39.4,73.6,61.0,48.9
Similar score models,Qwen-VL-Plus - Alibaba,Closed,Unknown,,,52.2,66.2,39.7,39.8,37.6,40.6,65.7,72.6,55.7
Similar size models,Qwen2-VL-2B,Open,2.1,Qwen2-1.5B,ViT-600M,57.3,72.2,47.5,42.2,47.8,42.4,74.7,79.7,51.5
Similar size models,H2O-Mississippi-2B,Open,2.1,Danube2 1.8B,InternViT-300M,54.5,64.8,49.6,35.2,56.8,36.4,69.9,78.2,44.7
Similar size models,InternVL2-2B,Open,2.1,InternLM2-1.8B,InternViT-300M,54.0,69.6,49.8,36.3,46.0,38.0,74.1,78.1,39.7
Similar size models,Phi-3-Vision - Microsoft,Open,4.2,Phi-3,CLIP ViT-L/14,53.6,65.2,47.7,46.1,44.6,39.0,78.4,63.7,44.1
Similar size models,MiniCPM-V-2,Open,2.8,MiniCPM-2.4B,SigLip-400M,47.9,65.8,39.1,38.2,39.8,36.1,62.9,60.5,41.0
Similar size models,PaliGemma-3B-mix-448,Open,3,Gemma-2B,SigLip-400M,46.6,65.6,48.3,34.9,28.7,32.2,68.3,61.4,33.1
Similar size models,DeepSeek-VL-1.3B,Open,2,DeepSeek-1B,SAM-B & SigLIP-L,39.6,63.8,39.9,33.8,29.8,27.6,51.5,41.3,29.2