The FinBen: An Holistic Financial Benchmark for Large Language Models Paper • 2402.12659 • Published Feb 20 • 16
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Paper • 2402.13249 • Published Feb 20 • 10
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks Paper • 2311.07463 • Published Nov 13, 2023 • 13
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 9
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
PEFTDebias: Capturing debiasing information using PEFTs Paper • 2312.00434 • Published Dec 1, 2023 • 1
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers Paper • 2402.01911 • Published Feb 2 • 2
Empirical Study of PEFT techniques for Winter Wheat Segmentation Paper • 2310.01825 • Published Oct 3, 2023 • 2
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 29
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7 • 4
Self-Instruct: Aligning Language Models with Self-Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022 • 7
Efficient Training of Language Models to Fill in the Middle Paper • 2207.14255 • Published Jul 28, 2022 • 1
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 592
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26 • 23
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models Paper • 2402.10524 • Published Feb 16 • 21
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16 • 76
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 49
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27 • 19
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Paper • 2403.02775 • Published Mar 5 • 11
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34
OneBit: Towards Extremely Low-bit Large Language Models Paper • 2402.11295 • Published Feb 17 • 22
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 53
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 25
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11 • 28
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 59
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 75
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 72
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 60
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model Paper • 2206.14371 • Published Jun 29, 2022 • 3
Model Stock: All we need is just a few fine-tuned models Paper • 2403.19522 • Published Mar 28 • 10
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 103
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2 • 34
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding Paper • 2403.04797 • Published Mar 5 • 1
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 39
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Latxa: An Open Language Model and Evaluation Suite for Basque Paper • 2403.20266 • Published Mar 29 • 3
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 63
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) Paper • 2404.00579 • Published Mar 31 • 2
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 9
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 59
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Paper • 2404.13506 • Published Apr 21 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 114
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7 • 21
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published Apr 29 • 14
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15 • 24
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 32
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20 • 33
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published Jun 12 • 15
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 47
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 11 • 21
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6 • 52
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 17 • 29
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published Jun 14 • 22
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17 • 13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 67
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published Jun 20 • 14
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11 • 12
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Paper • 2406.14546 • Published Jun 20 • 1
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31 • 2
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30 • 5
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 123
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Paper • 2407.10960 • Published Jul 15 • 11
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON Paper • 2407.15734 • Published Jul 22 • 1
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows Paper • 2406.16218 • Published Jun 23 • 1
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Probabilistic Programming with Programmable Variational Inference Paper • 2406.15742 • Published Jun 22 • 2
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 62
Improving Retrieval Augmented Language Model with Self-Reasoning Paper • 2407.19813 • Published Jul 29 • 6
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 57
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability Paper • 2408.07852 • Published Aug 14 • 14
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 137
Advancing LLM Reasoning Generalists with Preference Trees Paper • 2404.02078 • Published Apr 2 • 43
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 32
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 9 • 14
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published Sep 8 • 29
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 93
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 126
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance Paper • 2405.06682 • Published May 5 • 2
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 23
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19 • 17
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20 • 32
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23 • 33
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 85