Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2410.10814

about 8 hours ago

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 142
Orion-14B: Open-source Multilingual Large Language Models

Paper • 2401.12246 • Published Jan 20 • 10
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 49
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 44

about 11 hours ago

Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published 3 days ago • 36

about 12 hours ago

Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published 3 days ago • 36

about 13 hours ago

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published about 1 month ago • 34
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Paper • 2409.11242 • Published about 1 month ago • 5
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Paper • 2409.11136 • Published about 1 month ago • 21
On the Diagram of Thought

Paper • 2409.10038 • Published Sep 16 • 11

about 15 hours ago

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24 • 26
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24 • 12
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 45
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21 • 28

about 9 hours ago

ai21labs/Jamba-v0.1

Text Generation • Updated Sep 11 • 31.1k • 1.17k
databricks/dbrx-instruct

Text Generation • Updated Apr 19 • 3.64k • 1.1k
xai-org/grok-1

Text Generation • Updated Mar 28 • 1.49k • 2.16k
mistralai/Mistral-7B-Instruct-v0.2

Text Generation • Updated 20 days ago • 821k • • 2.54k

about 10 hours ago

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17
LLM Augmented LLMs: Expanding Capabilities through Composition

Paper • 2401.02412 • Published Jan 4 • 36
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11 • 42
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 20

about 8 hours ago

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 87
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 34
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published 3 days ago • 36

interesting stuff

about 3 hours ago

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 38
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 75
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 82
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs