paisleypark's Collections
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 2

Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 6

Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37

Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 7

Learning Universal Predictors
Paper • 2401.14953 • Published • 18

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 18

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 68

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11

Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 16

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 58

Time is Encoded in the Weights of Finetuned Language Models
Paper • 2312.13401 • Published • 19

Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 19

Reasons to Reject? Aligning Language Models with Judgments
Paper • 2312.14591 • Published • 17

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Paper • 2312.13314 • Published • 7

Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 12

In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41

Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14

CapsFusion: Rethinking Image-Text Data at Scale
Paper • 2310.20550 • Published • 25

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Paper • 2311.02262 • Published • 10

Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 16

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper • 2310.15308 • Published • 22

An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Paper • 2310.12274 • Published • 11

Language Modeling Is Compression
Paper • 2309.10668 • Published • 82

Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 21

Vision Transformers Need Registers
Paper • 2309.16588 • Published • 77

Paper • 2309.03179 • Published • 29

Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 7

One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 31

Semantic-SAM: Segment and Recognize Anything at Any Granularity
Paper • 2307.04767 • Published • 21

Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 14

MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Paper • 2307.02321 • Published • 7

CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 41