Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.16710

Ternary LLMs & Knowledge distillation & SOTA

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 592
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14 • 27
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Paper • 2308.06744 • Published Aug 13, 2023 • 1

about 2 hours ago

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

Speculative Decoding

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Paper • 2404.18911 • Published Apr 29 • 29
Accelerating LLM Inference with Staged Speculative Decoding

Paper • 2308.04623 • Published Aug 8, 2023 • 23
An Emulator for Fine-Tuning Large Language Models using Small Language Models

Paper • 2310.12962 • Published Oct 19, 2023 • 14
The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3

Personal reading list

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57
Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

Papers - Inference - Draft Model - Early Exit - Dropout

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57

Papers - Inference - Early Exit

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 592
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104
TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 43

Papers - Custom Layers

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Paper • 2310.20587 • Published Oct 31, 2023 • 16
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Paper • 2310.00535 • Published Oct 1, 2023 • 2
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Paper • 2307.09458 • Published Jul 18, 2023 • 10
The Impact of Depth and Width on Transformer Language Model Generalization

Paper • 2310.19956 • Published Oct 30, 2023 • 9

Papers - Speculative Decoding

Accelerating LLM Inference with Staged Speculative Decoding

Paper • 2308.04623 • Published Aug 8, 2023 • 23
An Emulator for Fine-Tuning Large Language Models using Small Language Models

Paper • 2310.12962 • Published Oct 19, 2023 • 14
The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3
On Speculative Decoding for Multimodal Large Language Models

Paper • 2404.08856 • Published Apr 13 • 13

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Paper • 2403.09636 • Published Mar 14 • 2
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Paper • 2404.11912 • Published Apr 18 • 16
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Paper • 2401.02669 • Published Jan 5 • 14
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs