- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 37
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 56
- Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
  Paper • 2406.14544 • Published • 34
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 85
Collections including paper arxiv:2407.01370
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 29
- From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
  Paper • 2406.12824 • Published • 20
- Tokenization Falling Short: The Curse of Tokenization
  Paper • 2406.11687 • Published • 15
- Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
  Paper • 2406.11817 • Published • 13
- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 19
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 111
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 15
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 21
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 18
- The Prompt Report: A Systematic Survey of Prompting Techniques
  Paper • 2406.06608 • Published • 52
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 40
- Transformers meet Neural Algorithmic Reasoners
  Paper • 2406.09308 • Published • 43
- Attention Is All You Need
  Paper • 1706.03762 • Published • 41
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 11
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 77
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 89
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60
- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 54
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
  Paper • 2403.12968 • Published • 24
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 66
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 72