- MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
  Paper • 2407.02490 • Published • 20
- Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
  Paper • 2406.13632 • Published • 5
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
  Paper • 2406.15319 • Published • 56
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 52
Collections including paper arxiv:2404.02060
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62
- Ring Attention with Blockwise Transformers for Near-Infinite Context
  Paper • 2310.01889 • Published • 9
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 35
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 18
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 15
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
  Paper • 2211.12588 • Published • 3
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 26
- Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
  Paper • 2404.04167 • Published • 8