AIMv2 Collection A collection of AIMv2 vision encoders covering multiple resolutions, a native-resolution variant, and a distilled checkpoint. • 19 items • Updated 1 day ago • 45
LLM2CLIP Collection LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA. • 7 items • Updated 5 days ago • 37
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 11 days ago • 94
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated about 18 hours ago • 71
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3 • 50
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 8 items • Updated 17 days ago • 95
C4AI Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 3 items • Updated about 1 month ago • 26
C4AI Aya 23 Collection Aya 23 is an open-weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 4 items • Updated Aug 6 • 50
The Big Benchmarks Collection Gathering benchmark spaces on the Hub (beyond the Open LLM Leaderboard) • 13 items • Updated 6 days ago • 158
Granite 3.0 Language Models Collection A series of language models trained by IBM, licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 20 days ago • 89
LayerSkip Collection Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated 3 days ago • 43
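Since LayerSkip checkpoints are trained for early exit, the model's own early layers can serve as the draft model for self-speculative decoding. A minimal sketch, assuming a checkpoint id from this collection and the `assistant_early_exit` generation flag of recent transformers releases (both are assumptions, not stated in the entry above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id from the LayerSkip collection.
ckpt = "facebook/layerskip-llama2-7B"

model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

inputs = tokenizer("Once upon a time,", return_tensors="pt").to(model.device)

# Self-speculative decoding: the first few layers draft tokens cheaply and the
# full model verifies them, so no separate draft model is needed.
outputs = model.generate(**inputs, assistant_early_exit=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```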
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2 • 21
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated Oct 15 • 141
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated Oct 15 • 19
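Every collection above can also be read programmatically. A minimal sketch using huggingface_hub's get_collection; the slug below is a placeholder, since real slugs (of the form owner/title-hash) come from each collection's page URL:

```python
from huggingface_hub import get_collection

# Placeholder slug: substitute the real one from the collection page URL.
collection = get_collection("namespace/some-collection-slug")

print(collection.title)
for item in collection.items:
    # item_type is e.g. "model", "dataset", "paper", or "space".
    print(item.item_type, item.item_id)
```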