2:4 sparse versions of Llama-3.1, including transfer learning
Neural Magic
company
Verified
AI & ML interests
LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
Organization Card
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance using automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: Enterprise-ready inference system built on the open-source vLLM library for deploying performant open-source LLMs at scale
- LLM Compressor: HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM
- DeepSparse: Inference runtime offering accelerated performance on CPUs, with APIs to integrate ML into your application
In this profile we provide accurate model checkpoints that are ready to run in vLLM, compressed with SOTA methods and schemes such as W4A16, W8A16, W8A8 (int8 and fp8), and many more. If you would like help quantizing a model, or want to request a new checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
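The scheme labels above follow a common shorthand in which `W<n>` gives the weight bit width and `A<n>` the activation bit width, so W4A16 reads as 4-bit weights with 16-bit activations. The sketch below is only an illustration of how such labels can be decoded; `parse_scheme` and `weight_compression_ratio` are hypothetical helper names, not part of vLLM or LLM Compressor, and the ratio is a nominal figure that ignores the storage cost of scales and zero points.

```python
import re
from typing import NamedTuple


class Scheme(NamedTuple):
    weight_bits: int
    activation_bits: int


def parse_scheme(name: str) -> Scheme:
    """Parse a label such as 'W4A16' into weight and activation bit widths."""
    match = re.fullmatch(r"[Ww](\d+)[Aa](\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a W<bits>A<bits> label: {name!r}")
    return Scheme(int(match.group(1)), int(match.group(2)))


def weight_compression_ratio(scheme: Scheme, baseline_bits: int = 16) -> float:
    """Nominal weight-size reduction relative to a 16-bit baseline (assumed)."""
    return baseline_bits / scheme.weight_bits


# Decode the schemes named in the organization card.
for label in ("W4A16", "W8A16", "W8A8"):
    scheme = parse_scheme(label)
    print(label, scheme.weight_bits, scheme.activation_bits,
          weight_compression_ratio(scheme))
```

The label alone does not say whether an 8-bit format is int8 or fp8, which is why the card spells that out separately for W8A8.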
Collections (12)
Spaces (8)
- Quant Llms Text Generation (Quantized vs. Unquantized LLM: Text Generation Comparison)
- Llama 3 8B Chat Deepsparse
- Llama 2 Sparse Transfer Chat Deepsparse
- DeepSparse Sentiment Analysis
- DeepSparse Named Entity Recognition
- Sparse Llama Gsm8k
Models (259)
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-quantized.w4a16 (Text Generation)
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 (Text Generation)
- neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-quantized.w4a16 (Text Generation)
- neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4 (Text Generation)
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 (Text Generation)
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4-quantized.w4a16 (Text Generation)
- neuralmagic/Llama-3.1-8B-gsm8k
- neuralmagic/Llama-3.1-8B-evolcodealpaca
- neuralmagic/Llama-3.1-8B-ultrachat_200k
- neuralmagic/Sparse-Llama-3.1-8B-2of4 (Text Generation)
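The `2of4` suffix in the model names above refers to 2:4 semi-structured sparsity: within every group of four consecutive weights, at most two are nonzero. The sketch below shows that pattern on a flat list, assuming a simple magnitude-based choice of which two weights to keep. It is an illustration of the sparsity layout only and is not the pruning procedure used to produce these checkpoints; `prune_2of4` is a hypothetical helper name.

```python
from typing import List


def prune_2of4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.2, 1.5]
print(prune_2of4(row))  # [0.0, -0.9, 0.4, 0.0, 2.0, 0.0, 0.0, 1.5]
```

Because exactly half of each group is zero, the result is 50% sparse with a regular layout, which is what lets supporting hardware and kernels skip the zeroed positions.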
Datasets (13)
- neuralmagic/Inference_performance_Llama_3.1_vllm0.6.1.post2
- neuralmagic/mmlu_it (14k rows)
- neuralmagic/mmlu_fr (14k rows)
- neuralmagic/mmlu_th (14k rows)
- neuralmagic/mmlu_de (14k rows)
- neuralmagic/mmlu_es (14k rows)
- neuralmagic/mmlu_hi (14k rows)
- neuralmagic/mmlu_pt (14k rows)
- neuralmagic/quantized-llama-3.1-leaderboard-v2-evals (247k rows)
- neuralmagic/quantized-llama-3.1-humaneval-evals (73.8k rows)