ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video (ECCV 2024)
This repo hosts the official model checkpoints for "ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video" (ECCV 2024).
Models
We provide the checkpoints before reparameterization. To reparameterize the weights, refer to tools\weight_reparam.py in our code.
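For readers unfamiliar with the idea, here is a minimal sketch of what reparameterizing a linear adapter into a frozen layer can look like. It is not the repo's script: the function name `merge_linear_adapter`, the `down`/`up` matrices, the residual adapter form, and the toy shapes are all assumptions made for illustration, and NumPy stands in for the actual framework. The sketch only shows that a purely linear adapter placed after a linear layer can be folded into that layer's weight and bias without changing the output.

```python
import numpy as np


def merge_linear_adapter(weight, bias, down, up):
    """Fold a residual linear adapter into the preceding linear layer.

    Hypothetical setup (not taken from the repo):
      layer:    y = weight @ x + bias
      adapter:  z = y + up @ (down @ y)   # no nonlinearity in between
      merged:   z = merged_w @ x + merged_b
    """
    # The adapter is the linear map (I + up @ down) applied to y.
    transform = np.eye(weight.shape[0]) + up @ down
    return transform @ weight, transform @ bias


# Toy shapes chosen only for the demonstration.
rng = np.random.default_rng(0)
d_in, d_out, rank = 6, 4, 2
weight = rng.standard_normal((d_out, d_in))
bias = rng.standard_normal(d_out)
down = rng.standard_normal((rank, d_out))
up = rng.standard_normal((d_out, rank))
x = rng.standard_normal(d_in)

# Output with the adapter kept as a separate module.
y = weight @ x + bias
before = y + up @ (down @ y)

# Output after merging the adapter into the layer.
merged_w, merged_b = merge_linear_adapter(weight, bias, down, up)
after = merged_w @ x + merged_b

max_abs_diff = float(np.max(np.abs(before - after)))
```

After merging, the layer keeps its original shape, so the adapter adds no parameters at inference time, which is consistent with the "New Param (M)" column reading 0 in the tables below.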
Kinetics 400
| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views |
|---|---|---|---|---|---|---|
| ViT-B/16 | CLIP | 422 | 86 | 0 | 83.0 | 8x1x3 |
| ViT-L/14 | CLIP | 1946 | 304 | 0 | 86.3 | 8x1x3 |
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 87.2 | 32x1x3 |
Something Something V2
| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views |
|---|---|---|---|---|---|---|
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 72.2 | 32x3x1 |
If you find our work useful in your research, please cite:
@article{li2023zeroi2v,
title={ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video},
author={Li, Xinhao and Zhu, Yuhan and Wang, Limin},
journal={arXiv preprint arXiv:2310.01324},
year={2023}
}