	---
	license: apache-2.0
	---

	[VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding](https://arxiv.org/abs/2405.13382)

	## Overview

	We introduce:
	- VTG-IT-120K, a high-quality and comprehensive instruction tuning dataset that covers VTG tasks such as moment retrieval (63.2K), dense video captioning (37.2K), video summarization (15.2K), and video highlight detection (3.9K).
	- VTG-LLM, which (1) effectively integrates timestamp knowledge into visual tokens; (2) incorporates absolute-time tokens that specifically handle timestamp knowledge, thereby avoiding concept shifts; and (3) introduces a lightweight, high-performance slot-based token compression method to facilitate the sampling of more video frames.
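
To make the idea of absolute-time tokens more concrete, here is a minimal sketch of how a timestamp could be written with dedicated time tokens instead of ordinary number tokens. The token names (`<TIME_ZERO>` ... `<TIME_NINE>`, `<TIME_DOT>`), the helper functions, and the fixed-width format (four integer digits, a dot, one decimal digit) are assumptions made for illustration; they are not taken from the VTG-LLM code, so check the GitHub repo for the actual vocabulary and format.

```python
from typing import List

# Hypothetical token names, chosen for illustration only.
DIGIT_TOKENS = [
    "<TIME_ZERO>", "<TIME_ONE>", "<TIME_TWO>", "<TIME_THREE>", "<TIME_FOUR>",
    "<TIME_FIVE>", "<TIME_SIX>", "<TIME_SEVEN>", "<TIME_EIGHT>", "<TIME_NINE>",
]
DOT_TOKEN = "<TIME_DOT>"


def encode_timestamp(seconds: float, int_digits: int = 4) -> List[str]:
    """Map a timestamp in seconds to a fixed-length list of time tokens."""
    tenths = round(seconds * 10)  # keep one decimal digit
    if tenths < 0 or tenths >= 10 ** (int_digits + 1):
        raise ValueError("timestamp does not fit the fixed-width format")
    whole, frac = divmod(tenths, 10)
    text = f"{whole:0{int_digits}d}.{frac}"  # e.g. 12.3 -> "0012.3"
    return [DOT_TOKEN if ch == "." else DIGIT_TOKENS[int(ch)] for ch in text]


def decode_timestamp(tokens: List[str]) -> float:
    """Invert encode_timestamp: turn time tokens back into seconds."""
    chars = ["." if tok == DOT_TOKEN else str(DIGIT_TOKENS.index(tok)) for tok in tokens]
    return float("".join(chars))
```

Under these assumptions every timestamp occupies the same number of tokens, and the digits used for time are kept separate from the number tokens that appear in ordinary text.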

	## How to Use

	Please refer to the [GitHub repo](https://github.com/gyxxyg/VTG-LLM) for details.
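
As a starting point, the files in this model repository can be fetched with the `huggingface_hub` library. The sketch below assumes that `huggingface_hub` is installed and that the checkpoint files are hosted under the repository ID `Yongxin-Guo/VTG-LLM`; the helper name `download_checkpoint` and the default target directory are hypothetical. Loading and running the model still follows the instructions in the GitHub repo.

```python
REPO_ID = "Yongxin-Guo/VTG-LLM"  # assumed Hugging Face repository ID


def download_checkpoint(local_dir: str = "./VTG-LLM") -> str:
    """Download all files of the model repository and return the local path."""
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example (requires network access):
# path = download_checkpoint()
```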

	## Citation

	If you find this repository helpful for your project, please consider citing:
	```bibtex
	@article{guo2024vtg,
	title={VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding},
	author={Guo, Yongxin and Liu, Jingyu and Li, Mingda and Tang, Xiaoying and Chen, Xi and Zhao, Bo},
	journal={arXiv preprint arXiv:2405.13382},
	year={2024}
	}
	```