Llama3.2 Collection Meta goes small with Llama3.2, both in 1B and 3B. • 9 items • Updated 4 days ago • 3
Llama 3.2 Evals Collection This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.2 models, including the configurations • 4 items • Updated 6 days ago • 14
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 6 days ago • 305
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 6 days ago • 85
Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models Paper • 2406.04806 • Published Jun 7 • 1
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published 8 days ago • 33
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries Paper • 2409.12640 • Published 12 days ago • 2
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published 12 days ago • 17
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 23
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance Paper • 2405.06682 • Published May 5 • 2
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 13 items • Updated 6 days ago • 43
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 12 days ago • 126
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 93
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 13 days ago • 193
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 17 days ago • 8
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 19 days ago • 76
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 21 days ago • 14
Article Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28 • 148
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 32
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 137
Cerebras DocChat Collection GPT-4 Level Conversational QA Trained In a Few Hours • 5 items • Updated Aug 21 • 3
Llama-3.1 Quantization Collection Neural Magic quantized Llama-3.1 models • 21 items • Updated 5 days ago • 35
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 8 items • Updated about 12 hours ago • 54
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 57
🦅 🐍 FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 13 days ago • 25
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 24
Improving Retrieval Augmented Language Model with Self-Reasoning Paper • 2407.19813 • Published Jul 29 • 6
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 62
Probabilistic Programming with Programmable Variational Inference Paper • 2406.15742 • Published Jun 22 • 2
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Article 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30 • 33
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows Paper • 2406.16218 • Published Jun 23 • 1
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes tiny, tiny2, small, base, large and large2 variants. • 8 items • Updated about 12 hours ago • 12
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 56
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON Paper • 2407.15734 • Published Jul 22 • 1
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 6 days ago • 585
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Paper • 2407.10960 • Published Jul 15 • 11