MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Paper • 2410.01036 • Published Oct 1 • 14
HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors Paper • 2408.06019 • Published Aug 12 • 13
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published Sep 26 • 31
Llama 3.2 Collection Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 20 items • Updated 6 days ago • 40
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 388
ReMamba: Equip Mamba with Effective Long-Sequence Modeling Paper • 2408.15496 • Published Aug 28 • 10
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 37
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design Paper • 2408.12503 • Published Aug 22 • 23
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 63
Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22 • 82
Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19 • 73
Transformer Language Models without Positional Encodings Still Learn Positional Information Paper • 2203.16634 • Published Mar 30, 2022 • 5
Qwen2-Audio Collection Audio-language model series based on Qwen2 • 4 items • Updated Sep 18 • 45