LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP modal more SOTA ever. • 7 items • Updated 2 days ago • 35
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 8 days ago • 94
GLiClass Collection Generalist and Light-weighted Models for Zero-shot Text Classification • 13 items • Updated Sep 17 • 11
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 28 days ago • 481
Aya Datasets Collection The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated Jun 28 • 13
The first neural machine translation system for the Erzya language Paper • 2209.09368 • Published Sep 19, 2022 • 1
Seamless: Multilingual Expressive and Streaming Speech Translation Paper • 2312.05187 • Published Dec 8, 2023 • 13
WebInstruct 🌐 Embeddings 🧱 Models Collection A collection of SoTA embeddings model fine-tuned on WebInstruct dataset to learn to pair instructions with its responses • 3 items • Updated Sep 4 • 11
Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer Paper • 2404.04042 • Published Apr 5 • 1
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 97
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 65
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
GoLLIE Collection We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE. • 4 items • Updated Mar 11 • 18