CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs • arXiv 2409.12490 • Published Sep 2024
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management • arXiv 2406.19707 • Published Jun 28, 2024
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding • arXiv 2409.08561 • Published Sep 2024
Diver: Large Language Model Decoding with Span-Level Mutual Information Verification • arXiv 2406.02120 • Published Jun 4, 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models • arXiv 2405.07542 • Published May 13, 2024
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation • arXiv 2407.11798 • Published Jul 16, 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling • arXiv 2408.08696 • Published Aug 16, 2024
Learning Harmonized Representations for Speculative Sampling • arXiv 2408.15766 • Published Aug 28, 2024
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning • arXiv 2408.08146 • Published Aug 15, 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration • arXiv 2404.12022 • Published Apr 18, 2024
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy • arXiv 2404.06954 • Published Apr 10, 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression • arXiv 2408.09632 • Published Aug 19, 2024
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference • arXiv 2409.04992 • Published Sep 2024
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads • arXiv 2407.17678 • Published Jul 25, 2024
MemLong: Memory-Augmented Retrieval for Long Text Modeling • arXiv 2408.16967 • Published Aug 30, 2024
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning • arXiv 2409.06679 • Published Sep 2024
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs • arXiv 2406.18173 • Published Jun 26, 2024
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression • arXiv 2408.05646 • Published Aug 10, 2024
Theory, Analysis, and Best Practices for Sigmoid Self-Attention • arXiv 2409.04431 • Published Sep 2024
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach • arXiv 2407.16833 • Published Jul 23, 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding • arXiv 2408.11049 • Published Aug 20, 2024
Parallelizing Autoregressive Generation with Variational State Space Models • arXiv 2407.08415 • Published Jul 11, 2024
Transformer Language Models without Positional Encodings Still Learn Positional Information • arXiv 2203.16634 • Published Mar 30, 2022
Position Prediction as an Effective Pretraining Strategy • arXiv 2207.07611 • Published Jul 15, 2022
HyperAttention: Long-context Attention in Near-Linear Time • arXiv 2310.05869 • Published Oct 9, 2023
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs • arXiv 2405.02842 • Published May 5, 2024
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters • arXiv 2408.04093 • Published Aug 7, 2024
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads • arXiv 2407.15891 • Published Jul 22, 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention • arXiv 2406.15486 • Published Jun 17, 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor • arXiv 2402.10685 • Published Feb 16, 2024
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope • arXiv 2407.15176 • Published Jul 21, 2024
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory • arXiv 2402.04617 • Published Feb 7, 2024
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model • arXiv 2405.14174 • Published May 23, 2024
Longhorn: State Space Models are Amortized Online Learners • arXiv 2407.14207 • Published Jul 19, 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens • arXiv 2407.19985 • Published Jul 29, 2024
RecycleGPT: An Autoregressive Language Model with Recyclable Module • arXiv 2308.03421 • Published Aug 7, 2023
SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding • arXiv 2406.18200 • Published Jun 26, 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory • arXiv 2407.01178 • Published Jul 1, 2024
Crafting the Path: Robust Query Rewriting for Information Retrieval • arXiv 2407.12529 • Published Jul 17, 2024
Conversational Query Reformulation with the Guidance of Retrieved Documents • arXiv 2407.12363 • Published Jul 17, 2024
CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search • arXiv 2406.05013 • Published Jun 7, 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers • arXiv 2406.10991 • Published Jun 16, 2024
Factual Dialogue Summarization via Learning from Large Language Models • arXiv 2406.14709 • Published Jun 20, 2024
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment • arXiv 2407.01965 • Published Jul 2, 2024
Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity • arXiv 2405.16579 • Published May 26, 2024
Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model • arXiv 2407.03040 • Published Jul 3, 2024
Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation • arXiv 2406.03703 • Published Jun 6, 2024
Stateful Memory-Augmented Transformers for Dialogue Modeling • arXiv 2209.07634 • Published Sep 15, 2022
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness • arXiv 2406.04156 • Published Jun 6, 2024