Palmyra (Writer license) Collection Palmyra LLMs under Writer license https://writer.com/legal/open-model-license/ • 8 items • Updated Aug 17 • 6
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 174
Understanding the performance gap between online and offline alignment algorithms Paper • 2405.08448 • Published May 14 • 14
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 12 items • Updated May 28 • 135
LLM Leaderboard best models ❤️‍🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 264 items • Updated Jun 22 • 397
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 168
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published Jun 14 • 48
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 12 days ago • 339
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3 • 63
Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 71
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 12 days ago • 206
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated Jun 6 • 218
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11 • 28
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 40