

Minsoo Kim

minsoo2333

https://marsjacobs.github.io

AI & ML interests

LLM compression

Recent Activity

authored a paper about 2 months ago

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

authored a paper about 2 months ago

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

authored a paper about 2 months ago

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs


Organizations

None yet

minsoo2333's activity

upvoted a paper about 2 months ago

A Controlled Study on Long Context Extension and Generalization in LLMs

Paper • 2409.12181 • Published Sep 18 • 43

upvoted 3 papers 4 months ago

Finch: Prompt-guided Key-Value Cache Compression

Paper • 2408.00167 • Published Jul 31 • 13

Characterizing Prompt Compression Methods for Long Context Inference

Paper • 2407.08892 • Published Jul 11 • 9

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Paper • 2407.14057 • Published Jul 19 • 44

upvoted a collection 5 months ago

Gradient's Long Context Models

6 items • Updated Jun 13 • 2

upvoted a paper 5 months ago

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2 • 23

upvoted a paper 6 months ago

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37

upvoted a paper 7 months ago

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 43

upvoted a collection 7 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Sep 25 • 683

upvoted a paper 7 months ago

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 112

upvoted a paper 9 months ago

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Paper • 2402.11131 • Published Feb 16 • 42

upvoted a paper about 1 year ago

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 44