Literature review on the transformer architecture and what followed.
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
  Paper • 1808.06226