Collections
Discover the best community collections!
Collections including paper arxiv:2411.17116
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 19
- Top-nσ: Not All Logits Are You Need
  Paper • 2411.07641 • Published • 17
- Adaptive Decoding via Latent Preference Optimization
  Paper • 2411.09661 • Published • 10
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 13

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 23
- Differential Transformer
  Paper • 2410.05258 • Published • 166
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 7
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 25

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 55
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 51
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 41
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 51

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 43
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21