Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models Paper • 2410.03290 • Published Oct 4, 2024 • 6
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation Paper • 2406.09305 • Published Jun 13, 2024 • 4
LAFITE: Towards Language-Free Training for Text-to-Image Generation Paper • 2111.13792 • Published Nov 27, 2021
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding Paper • 2306.17107 • Published Jun 29, 2023 • 11
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach Paper • 2305.13579 • Published May 23, 2023 • 3