Large Language Model Unlearning via Embedding-Corrupted Prompts Paper • 2406.07933 • Published Jun 12 • 7
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 37
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning Paper • 2406.12050 • Published Jun 17 • 19
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 17 • 30
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 86
Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published Jun 27 • 24
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21 • 14
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models Paper • 2407.01920 • Published Jul 2 • 13
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published Jul 15 • 20
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training Paper • 2407.09121 • Published Jul 12 • 5
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle Paper • 2407.13833 • Published Jul 18 • 11
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 35
CoverBench: A Challenging Benchmark for Complex Claim Verification Paper • 2408.03325 • Published Aug 6 • 14
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8 • 14
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 41
ReMamba: Equip Mamba with Effective Long-Sequence Modeling Paper • 2408.15496 • Published Aug 28 • 10
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6 • 43
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published Oct 30 • 53
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' Paper • 2410.21647 • Published Oct 29 • 16
LongReward: Improving Long-context Large Language Models with AI Feedback Paper • 2410.21252 • Published Oct 28 • 16
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published 21 days ago • 38