Installation

Requirements

We mainly follow UMT to prepare the environment.

pip install -r requirements.txt

In addition, to support InternVideo2-6B pre-training, you also need to install FlashAttention and DeepSpeed.
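Before launching a long pre-training job, it can help to confirm that both extra packages are visible to the Python interpreter you will train with. The snippet below is a minimal sketch, assuming the import names flash_attn and deepspeed (the usual names for the flash-attn and deepspeed pip packages). It only locates the packages with importlib.util.find_spec and does not import them, so a broken CUDA build can still pass this check.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be located on the current Python path.

    importlib.util.find_spec only locates a top-level package; it does not
    import it, so a package with a broken compiled extension still counts as found.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the flash-attn and deepspeed pip packages.
    missing = missing_packages(["flash_attn", "deepspeed"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("FlashAttention and DeepSpeed were both found.")
```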

To run InternVideo2 pre-training, you must prepare the weights of the InternVL-6B visual encoder and set your_model_path in internvl_clip_vision.py.
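A wrong your_model_path typically only surfaces once the model is being built. The sketch below fails fast by checking that the path points to a non-empty file. The helper name check_weights and the placeholder path are hypothetical and are not part of internvl_clip_vision.py; substitute whatever value you actually assign to your_model_path.

```python
from pathlib import Path


def check_weights(path):
    """Hypothetical helper: raise if `path` is not a non-empty file, else return it as a Path."""
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(f"InternVL-6B weights not found at {p}")
    if p.stat().st_size == 0:
        raise ValueError(f"InternVL-6B weights file is empty: {p}")
    return p


if __name__ == "__main__":
    # Placeholder path: use the same value you set for your_model_path
    # in internvl_clip_vision.py.
    your_model_path = "/path/to/internvl_6b_visual_encoder.pth"
    try:
        print("Found weights:", check_weights(your_model_path))
    except (FileNotFoundError, ValueError) as err:
        print(err)
```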

Some modules (FusedMLP and DropoutLayerNorm) from FlashAttention2 used in our models rely on CUDA extensions. TBD
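Because these fused modules depend on compiled CUDA extensions, a plain `pip install` of FlashAttention can succeed while the modules themselves still fail to import. The sketch below actually imports each module and looks up the class, reporting the error type if anything goes wrong. The module paths are assumptions: flash_attn.modules.mlp.FusedMLP and flash_attn.ops.layer_norm.DropoutAddLayerNorm are where these classes live in some flash-attn 2.x releases, and DropoutAddLayerNorm is assumed to be the module referred to above as DropoutLayerNorm. Adjust CANDIDATES to match the imports in your copy of the model code.

```python
import importlib


def probe(module_name, attr):
    """Try to import `attr` from `module_name`; return (ok, error_message)."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attr)
    except Exception as err:  # ImportError, OSError from a bad CUDA build, AttributeError, ...
        return False, f"{type(err).__name__}: {err}"
    return True, ""


# Assumed locations of the fused modules; paths and class names differ
# between flash-attn versions.
CANDIDATES = [
    ("flash_attn.modules.mlp", "FusedMLP"),
    ("flash_attn.ops.layer_norm", "DropoutAddLayerNorm"),
]

if __name__ == "__main__":
    for module_name, attr in CANDIDATES:
        ok, msg = probe(module_name, attr)
        status = "OK" if ok else f"unavailable ({msg})"
        print(f"{module_name}.{attr}: {status}")
```

Importing the module, rather than just locating the package, is what exercises the CUDA extension load, so this catches mismatched CUDA or PyTorch builds that a simple package lookup would miss.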