Simple linear attention language models balance the recall-throughput tradeoff • arXiv:2402.18668 • Published Feb 28, 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • Published Mar 14, 2024