
HawkEye: Training Video-Text LLMs for Grounding Text in Videos

This repo provides the checkpoint of HawkEye and our implementation of VideoChat2.

videochat2-stage3-our_impl.pth is the checkpoint of our reproduction of VideoChat2. You can use it as a substitute for hawkeye.pth.

  • The difference between it and HawkEye: it is not trained with data from InternVid-G.
  • The difference between it and the original implementation of VideoChat2: the visual encoder is frozen, and it is not trained with image data from VideoChat2-IT.
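Because both checkpoints share the same architecture, either one can be loaded the same way. The sketch below illustrates this drop-in substitution with a toy module in place of the real model (the actual architecture is defined in the project's GitHub repo, and the file names are the ones this card mentions); it is an assumption-laden illustration, not the project's official loading code.

```python
import torch
import torch.nn as nn

# Toy module standing in for the HawkEye / VideoChat2 model
# (hypothetical; the real model class lives in the GitHub repo).
model = nn.Linear(4, 2)

# Save a state dict, then reload it the same way either checkpoint
# would be loaded, e.g.:
#   state = torch.load("hawkeye.pth", map_location="cpu")
#   state = torch.load("videochat2-stage3-our_impl.pth", map_location="cpu")
# Since the two checkpoints target the same architecture, swapping
# the path is the only change needed to substitute one for the other.
torch.save(model.state_dict(), "ckpt.pth")
state = torch.load("ckpt.pth", map_location="cpu")
model.load_state_dict(state)
print(sorted(state.keys()))
```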

For more details, please refer to our paper and GitHub repository.

