---
license: apache-2.0
---
[VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding](https://arxiv.org/abs/2405.13382)
## Overview
We introduce:
- VTG-IT-120K, a high-quality and comprehensive instruction tuning dataset that covers VTG tasks such as moment retrieval (63.2K), dense video captioning (37.2K), video summarization (15.2K), and video highlight detection (3.9K).
- VTG-LLM, which (1) effectively integrates timestamp knowledge into visual tokens; (2) incorporates absolute-time tokens that specifically handle timestamp knowledge, thereby avoiding concept shifts; and (3) introduces a lightweight, high-performance slot-based token compression method to facilitate the sampling of more video frames.
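
The per-task counts listed for VTG-IT-120K sum to roughly 119.5K, which matches the "120K" in the dataset name. A minimal Python sketch of the task mix follows. It uses only the numbers quoted above; the variable and function names are illustrative and are not part of the VTG-LLM codebase.

```python
# Sample counts (in thousands) per task, as quoted in the overview above.
task_counts_k = {
    "moment retrieval": 63.2,
    "dense video captioning": 37.2,
    "video summarization": 15.2,
    "video highlight detection": 3.9,
}


def task_shares(counts):
    """Return each task's fraction of the total sample count."""
    total = sum(counts.values())
    return {task: count / total for task, count in counts.items()}


total_k = sum(task_counts_k.values())
shares = task_shares(task_counts_k)

# Print one line per task, then the overall total.
for task, share in shares.items():
    print(f"{task}: {task_counts_k[task]:.1f}K ({share:.1%})")
print(f"total: {total_k:.1f}K")
```

Moment retrieval accounts for a little over half of the samples, while video highlight detection is the smallest task by a wide margin.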
## How to Use
Please refer to the [GitHub repo](https://github.com/gyxxyg/VTG-LLM) for details.
## Citation
If you find this repository helpful for your project, please consider citing:
```bibtex
@article{guo2024vtg,
title={VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding},
author={Guo, Yongxin and Liu, Jingyu and Li, Mingda and Tang, Xiaoying and Chen, Xi and Zhao, Bo},
journal={arXiv preprint arXiv:2405.13382},
year={2024}
}
```