EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published 11 days ago • 24
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published 12 days ago • 59
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12 • 14
Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents Paper • 2410.13185 • Published Oct 17 • 6
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper • 2410.12490 • Published Oct 16 • 8
DiGIT Collection Checkpoints for [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective • 2 items • Updated about 1 month ago • 1
Inf-CL Collection Demos, checkpoints, papers, and datasets for Inf-CL • 2 items • Updated about 1 month ago • 2
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16 • 30
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 55
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 11 days ago • 21
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11 • 32