README.md · Yongxin-Guo/VTG-LLM at bfe52335823cecdc0d77a6ca4fb564f6c013db36

metadata

license: apache-2.0
datasets:
  - Yongxin-Guo/VTG-IT
tags:
  - dense-video-caption
  - video-highlight-detection
  - video-summarization
  - moment-retrieval

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Overview

We introduce

VTG-IT-120K, a high-quality and comprehensive instruction tuning dataset that covers VTG tasks such as moment retrieval (63.2K), dense video captioning (37.2K), video summarization (15.2K), and video highlight detection (3.9K).
VTG-LLM, which (1) effectively integrates timestamp knowledge into visual tokens; (2) incorporates absolute-time tokens that specifically handle timestamp knowledge, thereby avoiding concept shifts; and (3) introduces a lightweight, high-performance slot-based token compression method to facilitate the sampling of more video frames.

How to Use

Please refer to GitHub repo for details.

Citation

If you find this repository helpful for your project, please consider citing:

@article{guo2024vtg,
  title={VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding},
  author={Guo, Yongxin and Liu, Jingyu and Li, Mingda and Tang, Xiaoying and Chen, Xi and Zhao, Bo},
  journal={arXiv preprint arXiv:2405.13382},
  year={2024}
}