LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65 • 4
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152 • 17
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models Paper • 2408.02085 • Published Aug 4 • 17 • 4
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1 • 12 • 9
Case2Code: Learning Inductive Reasoning with Synthetic Data Paper • 2407.12504 • Published Jul 17 • 7 • 7
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published Jul 15 • 20 • 3
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31 • 3
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published Jul 11 • 13 • 4
Inference Performance Optimization for Large Language Models on CPUs Paper • 2407.07304 • Published Jul 10 • 52 • 7
ProgressGym: Alignment with a Millennium of Moral Progress Paper • 2406.20087 • Published Jun 28 • 3 • 2
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28 • 28 • 5
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs Paper • 2407.00653 • Published Jun 30 • 11 • 2
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published Jun 26 • 18 • 3
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 33 • 7
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 35 • 2
Simulating Classroom Education with LLM-Empowered Agents Paper • 2406.19226 • Published Jun 27 • 28 • 9
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21 • 13 • 4
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17 • 21 • 3
TroL: Traversal of Layers for Large Language and Vision Models Paper • 2406.12246 • Published Jun 18 • 34 • 2
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 56 • 3
Designing a Dashboard for Transparency and Control of Conversational AI Paper • 2406.07882 • Published Jun 12 • 9 • 4
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118 • 9
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124 • 14
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38 • 9
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 21 • 1
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4 • 23 • 3
PointInfinity: Resolution-Invariant Point Diffusion Models Paper • 2404.03566 • Published Apr 4 • 13 • 1
Learning to Decode Collaboratively with Multiple Language Models Paper • 2403.03870 • Published Mar 6 • 17 • 6
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16 • 24 • 3
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34 • 4
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16 • 40 • 8
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14 • 24 • 3
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 54 • 9
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7 • 27 • 3
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6 • 48 • 3
ReGAL: Refactoring Programs to Discover Generalizable Abstractions Paper • 2401.16467 • Published Jan 29 • 8 • 2
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30 • 16 • 4