- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (arXiv:2402.15220, published Feb 23)
- GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319, published Feb 23)
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation (arXiv:2402.11929, published Feb 19)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (arXiv:2402.16840, published Feb 26)
- Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models (arXiv:2405.16759, published May 27)
- Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning (arXiv:2405.17258, published May 27)
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models (arXiv:2405.20541, published May 30)
- Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arXiv:2406.05955, published Jun 10)