LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 6 days ago • 87
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published 7 days ago • 65
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published 10 days ago • 21
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published 22 days ago • 46
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 8 items • Updated 15 days ago • 95
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published 26 days ago • 21
CogVLM2 Collection This collection hosts the repos of THUDM's CogVLM2 releases • 8 items • Updated Aug 18 • 18
LoLCATS Collection Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family -- 8b, 70b, 405b -- for the first time! • 4 items • Updated Oct 14 • 14
based Collection These language model checkpoints are trained at the 360M and 1.3Bn parameter scales for up to 50Bn tokens on the Pile corpus, for research purposes. • 15 items • Updated Oct 18 • 9
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14 • 52
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14 • 24
Loradex Highlights Collection This collection features awesome open-source LoRAs trained by members of the Glif Community during Loradex Early Access! • 14 items • Updated Oct 18 • 18
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published Oct 14 • 26
African History Collection A collection of data on the history of mankind • 5 items • Updated 12 days ago • 1