Model Zoo

Pretraining

For $\text{InternVideo2}_{s2}$, we initialize from the $\text{InternVideo2}_{s1}$ models and further pretrain them on multimodal datasets.

For $\text{InternVideo2}_{clip}$, we initialize from the $\text{InternVideo2}_{s2}$ models.

| Model | Setting | Model | Pretraining Script |
| --- | --- | --- | --- |
| $\text{InternVideo2}_{s2}$-1B | IV-25.5M | :hugs: HF link | script |
| $\text{InternVideo2}_{clip}$-1B | IV-25.5M | TBD | script |
| $\text{InternVideo2}_{s2}$-6B | IV-400M | TBD | script |
| $\text{InternVideo2}_{clip}$-6B | IV-400M | TBD | script |

Zero-shot Evaluation

Zero-Shot Video-Text Retrieval

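| Model | Dataset | T2V | V2T | Evaluation Script |
| --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s2}$-1B | MSRVTT | 51.9 | 50.9 | script |
| | LSMDC | 32.0 | 27.3 | script |
| | DiDeMo | 57.0 | 54.3 | script |
| | MSVD | 58.1 | 83.3 | script |
| | ANet | 60.4 | 54.8 | script |
| | VATEX | 70.4 | 85.4 | script |
| $\text{InternVideo2}_{s2}$-6B | MSRVTT | 55.9 | 53.7 | TBD |
| | LSMDC | 33.8 | 30.1 | TBD |
| | DiDeMo | 57.9 | 57.1 | TBD |
| | MSVD | 59.3 | 83.1 | TBD |
| | ANet | 63.2 | 56.5 | TBD |
| | VATEX | 71.5 | 85.3 | TBD |

| Model | Dataset | T2V | V2T | Evaluation Script |
| --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{clip}$-1B | MSRVTT | 50.0 | 48.4 | script |
| | LSMDC | 26.4 | 23.1 | script |
| | DiDeMo | 47.8 | 46.4 | script |
| | ANet | 49.4 | 46.2 | script |
| | VATEX_en | 63.5 | 81.2 | script |
| | VATEX_ch | 54.9 | 76.4 | script |
| $\text{InternVideo2}_{clip}$-6B | MSRVTT | 50.9 | 50.6 | script |
| | LSMDC | 29.4 | 26.3 | script |
| | DiDeMo | 50.5 | 46.8 | script |
| | ANet | 50.2 | 47.5 | script |
| | VATEX_en | 64.1 | 82.6 | script |
| | VATEX_ch | 54.6 | 76.9 | script |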
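
For readers unfamiliar with the two retrieval directions, the sketch below shows one common way such numbers are computed from a text-video similarity matrix. It assumes the T2V and V2T columns report recall@1 in percent, which this page does not state, and that each text is paired with exactly one video at the same index. The names `recall_at_k` and `transpose` and the toy matrix are hypothetical illustrations, not part of the evaluation scripts linked above.

```python
def recall_at_k(sim, k=1):
    """Percentage of queries (rows) whose paired item (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the ground-truth item = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim):
    """Swap the query and candidate axes of a similarity matrix."""
    return [list(col) for col in zip(*sim)]


# Toy similarity matrix (assumed values): rows are text queries, columns are videos.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.7, 0.3, 0.6],
]

t2v = recall_at_k(sim, k=1)             # text-to-video: each text retrieves a video
v2t = recall_at_k(transpose(sim), k=1)  # video-to-text: each video retrieves a text
print(f"T2V R@1 = {t2v:.1f}, V2T R@1 = {v2t:.1f}")
```

The two directions can differ on the same matrix because ranking along rows and along columns is not symmetric. Datasets with several captions per video would need a many-to-one ground-truth mapping rather than the index pairing assumed here.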

Zero-Shot Action Recognition

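| Model | Dataset | top-1 | AVG | Script |
| --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{clip}$-1B | K400 | 73.1 | 82.4 | script |
| | K600 | 72.8 | 81.8 | script |
| | K700 | 64.9 | 75.2 | script |
| | UCF101 | 88.8 | - | script |
| | HMDB51 | 53.9 | - | script |
| | MiT | 31.6 | - | script |
| | SSv2-MC | 61.5 | - | script |
| $\text{InternVideo2}_{clip}$-6B | K400 | 72.7 | 82.2 | script |
| | K600 | 71.7 | 81.2 | script |
| | K700 | 64.2 | 75.2 | script |
| | UCF101 | 89.5 | - | script |
| | HMDB51 | 56.7 | - | script |
| | MiT | 32.9 | - | script |
| | SSv2-MC | 63.5 | - | script |

| Model | Dataset | mAP | Script |
| --- | --- | --- | --- |
| $\text{InternVideo2}_{clip}$-1B | Charades | 32.9 | script |
| $\text{InternVideo2}_{clip}$-6B | Charades | 34.6 | script |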