⛈️ Llama-3.1 Storm Models Collection Fine-tuned Llama 3.1 8B model with superior reasoning, conversation abilities, and function calling! • 3 items • Updated Aug 25 • 15
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20 • 10
Code Evaluation Collection Collection of Papers on Code Evaluation (from code generation language models) • 44 items • Updated 26 days ago • 12
Llama-3.1 Quantization Collection Neural Magic quantized Llama-3.1 models • 21 items • Updated 5 days ago • 35
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 6 days ago • 585
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 43 items • Updated 4 days ago • 53
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1 • 42
Tower: An Open Multilingual Large Language Model for Translation-Related Tasks Paper • 2402.17733 • Published Feb 27 • 3
LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27 • 147
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published Jun 14 • 32
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 11 items • Updated Aug 31 • 17
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 8 items • Updated about 12 hours ago • 17
SteerLM Collection A collection of models and datasets relating to SteerLM and HelpSteer. • 7 items • Updated about 12 hours ago • 12
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated about 12 hours ago • 156
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 13 days ago • 339
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12 • 144
Zephyr ORPO Collection Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated Apr 12 • 16
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated May 16 • 14
🕹️ AI Games Collection An ongoing collection of games you can play on HF Spaces • 14 items • Updated Jun 20 • 25
Granite Code Models Collection A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated Aug 30 • 161
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated about 12 hours ago • 39
Arctic Collection A collection of pre-trained dense-MoE Hybrid transformer models • 2 items • Updated Apr 24 • 22
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 161
Aurora-M models Collection Aurora-M models (base, biden-harris redteams and instruct) • 5 items • Updated May 6 • 17
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1 • 14
The SPRIGHT T2I collection Collection This collection contains the datasets, model, paper, and demo associated with the SPRIGHT (SPatially RIGHT) release. • 5 items • Updated Apr 2 • 5
The Case for Co-Designing Model Architectures with Hardware Paper • 2401.14489 • Published Jan 25 • 2
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 13 days ago • 206
DBRX Collection DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. • 3 items • Updated Mar 27 • 90
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 24
Wav2Vec 2.0 Collection A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data. • 8 items • Updated Jan 16 • 16
Load 4bit models 4x faster Collection Native bitsandbytes 4bit pre quantized models • 25 items • Updated 4 days ago • 45
WhisperKit Collection Models, datasets and evaluation results for WhisperKit: https://github.com/argmaxinc/WhisperKit • 4 items • Updated 26 days ago • 6
Long-Form Test Sets Collection A collection of long-form (samples > 30s) datasets used to evaluate the Distil-Whisper models. • 5 items • Updated Mar 21 • 5
Training Datasets Collection A collection of pseudo-labelled datasets used to train the Distil-Whisper model. • 9 items • Updated Mar 21 • 14
distil-large-v3 Collection This collection contains the model repositories for distil-large-v3, which provides support for the most popular Whisper libraries. • 4 items • Updated Mar 21 • 6
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 34
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 68