stereoplegic's Collections
Long context
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 1
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 9
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 65
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Paper • 2308.16137 • Published • 39
Scaling Transformer to 1M tokens and beyond with RMT
Paper • 2304.11062 • Published • 2
Investigating Answerability of LLMs for Long-Form Question Answering
Paper • 2309.08210 • Published • 12
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 17
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 87
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 25
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 9
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 22
CAT-LM: Training Language Models on Aligned Code And Tests
Paper • 2310.01602 • Published • 1
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Paper • 2308.14508 • Published • 2
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper • 2309.11568 • Published • 10
Paper • 2309.03450 • Published • 8
Effective Long-Context Scaling of Foundation Models
Paper • 2309.16039 • Published • 30
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Paper • 2310.06839 • Published • 3
Context Compression for Auto-regressive Transformers with Sentinel Tokens
Paper • 2310.08152 • Published • 1
Learning to Compress Prompts with Gist Tokens
Paper • 2304.08467 • Published • 3
Long-range Language Modeling with Self-retrieval
Paper • 2306.13421 • Published • 16
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
Paper • 2212.09146 • Published • 3
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks
Paper • 2305.18395 • Published • 1
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Paper • 2304.11477 • Published • 3
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
Paper • 2308.12682 • Published • 2
Combiner: Full Attention Transformer with Sparse Computation Cost
Paper • 2107.05768 • Published • 1
Paper • 2203.08913 • Published • 2
Adapting Language Models to Compress Contexts
Paper • 2305.14788 • Published • 1
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 36
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Paper • 2307.11088 • Published • 4
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Paper • 2302.06218 • Published • 1
Blockwise Parallel Transformer for Long Context Large Models
Paper • 2305.19370 • Published • 3
Blockwise Self-Attention for Long Document Understanding
Paper • 1911.02972 • Published • 1
LSG Attention: Extrapolation of pretrained Transformers to long sequences
Paper • 2210.15497 • Published • 1
Efficient Long-Text Understanding with Short-Text Models
Paper • 2208.00748 • Published • 1
Cure the headache of Transformers via Collinear Constrained Attention
Paper • 2309.08646 • Published • 12
Bird-Eye Transformers for Text Generation Models
Paper • 2210.03985 • Published • 1
Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing
Paper • 2310.03052 • Published • 1
Efficient Streaming Language Models with Attention Sinks
Paper • 2309.17453 • Published • 13
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Paper • 2310.03294 • Published • 2
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 2
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 28
Retrieval meets Long Context Large Language Models
Paper • 2310.03025 • Published • 4
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content
Paper • 2305.14806 • Published • 1
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
Paper • 2305.11129 • Published • 2
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper • 2112.07916 • Published • 2
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System
Paper • 2304.13343 • Published • 1
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Paper • 2305.04241 • Published • 1
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse
Paper • 2311.07468 • Published • 1
Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering
Paper • 2311.09198 • Published • 3
SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
Paper • 2208.02169 • Published • 1
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 39
Attention Sorting Combats Recency Bias In Long Context Language Models
Paper • 2310.01427 • Published • 1
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper • 2303.09752 • Published • 2
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 12
Axiomatic Preference Modeling for Longform Question Answering
Paper • 2312.02206 • Published • 7
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Paper • 2312.01279 • Published • 3
Extending Context Window of Large Language Models via Semantic Compression
Paper • 2312.09571 • Published • 12
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • 2312.08618 • Published • 11
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 21
Extending LLMs' Context Window with 100 Samples
Paper • 2401.07004 • Published • 14
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Paper • 2401.07872 • Published • 2
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 24
Exploring Transformer Extrapolation
Paper • 2307.10156 • Published • 1
Gated Linear Attention Transformers with Hardware-Efficient Training
Paper • 2312.06635 • Published • 6
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 26
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 111
Training-Free Long-Context Scaling of Large Language Models
Paper • 2402.17463 • Published • 19
LOCOST: State-Space Models for Long Document Abstractive Summarization
Paper • 2401.17919 • Published
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 63
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Paper • 2402.18508 • Published
HMT: Hierarchical Memory Transformer for Long Context Language Processing
Paper • 2405.06067 • Published • 1
LLoCO: Learning Long Contexts Offline
Paper • 2404.07979 • Published • 20
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
Paper • 2402.10685 • Published • 1
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference
Paper • 2405.17755 • Published
Base of RoPE Bounds Context Length
Paper • 2405.14591 • Published
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
Paper • 2406.05678 • Published
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Paper • 2406.00605 • Published • 2
Equipping Transformer with Random-Access Reading for Long-Context Understanding
Paper • 2405.13216 • Published
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Paper • 2406.10996 • Published • 32
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
Paper • 2402.04617 • Published • 4
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope
Paper • 2407.15176 • Published
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Paper • 2407.15891 • Published
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 138
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Paper • 2407.16833 • Published
ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Paper • 2408.15496 • Published • 10
General-purpose, long-context autoregressive modeling with Perceiver AR
Paper • 2202.07765 • Published
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Paper • 2407.15892 • Published
ContextCite: Attributing Model Generation to Context
Paper • 2409.00729 • Published • 13
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
Paper • 2406.18173 • Published
MemLong: Memory-Augmented Retrieval for Long Text Modeling
Paper • 2408.16967 • Published • 2
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Paper • 2407.17678 • Published
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
Paper • 2409.06679 • Published
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Paper • 2406.19707 • Published