CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24 • 26
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 45
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23 • 16
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published May 23 • 5
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published May 28 • 11
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published Jun 1 • 12
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 36
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5 • 7
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published Jun 5 • 10
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6 • 19
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7 • 26
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published Jun 6 • 6
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach Paper • 2406.04594 • Published Jun 7 • 4
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6 • 52
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12 • 22
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published Jun 13 • 18
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding Paper • 2406.09297 • Published Jun 13 • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published Jun 17 • 23
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20 • 32
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20 • 30
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published Jun 18 • 36
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 75
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 35
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Paper • 2407.01791 • Published Jul 1 • 5
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2 • 22
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5 • 26
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8 • 20
An accurate detection is not all you need to combat label noise in web-noisy datasets Paper • 2407.05528 • Published Jul 8 • 3
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging Paper • 2407.07315 • Published Jul 10 • 6
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning Paper • 2407.07523 • Published Jul 10 • 4
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published Jul 15 • 7
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Paper • 2407.11062 • Published Jul 10 • 8
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 33
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19 • 16
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Paper • 2408.01050 • Published Aug 2 • 8
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9 • 22
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12 • 13
Heavy Labels Out! Dataset Distillation with Label Space Lightening Paper • 2408.08201 • Published Aug 15 • 17
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 33
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 41
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10 • 14
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16 • 33
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 46
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19 • 23
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20 • 32
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24 • 9
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published Sep 25 • 21