Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 4 days ago • 16
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 3 days ago • 25
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 4 days ago • 50
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 4 days ago • 73
MonoFormer: One Transformer for Both Diffusion and Autoregression Paper • 2409.16280 • Published 5 days ago • 16
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published 6 days ago • 22
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending Paper • 2409.13926 • Published 9 days ago • 4
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs Paper • 2409.14988 • Published 6 days ago • 20
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published 6 days ago • 21
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 7 days ago • 26
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published 6 days ago • 32
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation Paper • 2409.12941 • Published 10 days ago • 14
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published 9 days ago • 15
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published 9 days ago • 44
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published 10 days ago • 19
B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests Paper • 2409.08692 • Published 17 days ago • 25
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 10 days ago • 120
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 3 days ago • 25
Scalable and Domain-General Abstractive Proposition Segmentation Paper • 2406.19803 • Published Jun 28 • 1
Llama 3.2 Evals Collection This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.2 models, including the configurations • 4 items • Updated 4 days ago • 13
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 4 days ago • 287
Llama 3.2 3B & 1B GGUF Quants Collection Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated 4 days ago • 33
Loradex Highlights Collection This collection features awesome open-source LoRAs trained by members of the Glif Community during Loradex Early Access! • 12 items • Updated 6 days ago • 15
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published 10 days ago • 17
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 10 days ago • 22
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 11 days ago • 45
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 12 days ago • 17
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published 11 days ago • 41
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 11 days ago • 65
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Paper • 2409.11055 • Published 13 days ago • 16
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published 16 days ago • 10
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published 13 days ago • 31
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 16 days ago • 44
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published 17 days ago • 10
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 17 days ago • 14
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 16 days ago • 30
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 5 items • Updated 11 days ago • 9
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 11 days ago • 192
jina-embeddings-v3 Collection Multilingual multi-task general text embedding model • 6 items • Updated 11 days ago • 12
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published 17 days ago • 15
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 18 days ago • 61
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 17 days ago • 41
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 24 days ago • 37
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published 18 days ago • 50
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 18 days ago • 75