Neural Magic

company

Verified

https://neuralmagic.com/

neuralmagic

AI & ML interests

LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV

Organization Card

The Future of AI is Open

Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.

nm-vllm: Enterprise-ready inference system, based on the open-source library vLLM, for operationalizing performant open-source LLMs at scale
LLM Compressor: HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM
DeepSparse: Inference runtime that offers accelerated performance on CPUs, plus APIs for integrating ML into your application

This profile hosts accurate model checkpoints, compressed with state-of-the-art methods and ready to run in vLLM, in schemes such as W4A16, W8A16, and W8A8 (int8 and fp8), among many others. If you would like help quantizing a model, or want to request a new checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
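The scheme labels above follow a WxAy pattern, where x is the bit width of the stored weights and y is the bit width of the activations. As a rough illustration, the sketch below splits such a label into its two parts and estimates the idealized weight-size reduction relative to a 16-bit baseline. The names `QuantScheme`, `parse_scheme`, and `weight_compression_ratio` are hypothetical helpers written for this example; they are not part of vLLM or LLM Compressor, and the ratio ignores overhead such as scales and zero points.

```python
import re
from typing import NamedTuple


class QuantScheme(NamedTuple):
    """Hypothetical container for a WxAy label (not a library type)."""
    weight_bits: int
    activation_bits: int


# Matches labels such as "W4A16" or "w8a8" (either letter case).
_SCHEME_RE = re.compile(r"^[Ww](\d+)[Aa](\d+)$")


def parse_scheme(label: str) -> QuantScheme:
    """Split a WxAy label into weight and activation bit widths."""
    match = _SCHEME_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a WxAy scheme label: {label!r}")
    return QuantScheme(int(match.group(1)), int(match.group(2)))


def weight_compression_ratio(scheme: QuantScheme, baseline_bits: int = 16) -> float:
    """Idealized weight-size reduction versus a baseline bit width.

    Assumption: ignores quantization metadata (scales, zero points).
    """
    return baseline_bits / scheme.weight_bits


if __name__ == "__main__":
    for label in ("W4A16", "W8A16", "W8A8"):
        scheme = parse_scheme(label)
        print(label, scheme, weight_compression_ratio(scheme))
```

Note that the label alone does not distinguish int8 from fp8 for W8A8; both use 8 bits, so that detail has to come from the checkpoint name or its model card.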

Collections 12

Spaces 8

Quant Llms Text Generation

Quantized vs. Unquantized LLM: Text Generation Comparison

Llama 3 8B Chat Deepsparse

Llama 2 Sparse Transfer Chat Deepsparse

DeepSparse Sentiment Analysis

DeepSparse Named Entity Recognition

Sparse Llama Gsm8k

Models 259

neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-quantized.w4a16

Text Generation • Updated 3 minutes ago • 6

neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4

Text Generation • Updated 5 minutes ago

neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-quantized.w4a16

Text Generation • Updated 6 minutes ago

neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4

Text Generation • Updated 16 minutes ago

neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4

Text Generation • Updated 16 minutes ago • 72

neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4-quantized.w4a16

Text Generation • Updated 17 minutes ago

neuralmagic/Llama-3.1-8B-gsm8k

Updated about 3 hours ago

neuralmagic/Llama-3.1-8B-evolcodealpaca

Updated about 4 hours ago

neuralmagic/Llama-3.1-8B-ultrachat_200k

Updated about 4 hours ago

neuralmagic/Sparse-Llama-3.1-8B-2of4

Text Generation • Updated 1 day ago • 1

Datasets 13

neuralmagic/Inference_performance_Llama_3.1_vllm0.6.1.post2

Updated 17 days ago • 5

neuralmagic/mmlu_it

Viewer • Updated 30 days ago • 14k • 434

neuralmagic/mmlu_fr

Viewer • Updated 30 days ago • 14k • 412

neuralmagic/mmlu_th

Viewer • Updated 30 days ago • 14k • 447

neuralmagic/mmlu_de

Viewer • Updated 30 days ago • 14k • 434

neuralmagic/mmlu_es

Viewer • Updated 30 days ago • 14k • 433

neuralmagic/mmlu_hi

Viewer • Updated 30 days ago • 14k • 449

neuralmagic/mmlu_pt

Viewer • Updated 30 days ago • 14k • 451

neuralmagic/quantized-llama-3.1-leaderboard-v2-evals

Viewer • Updated Oct 10 • 247k • 869

neuralmagic/quantized-llama-3.1-humaneval-evals

Viewer • Updated Oct 10 • 73.8k • 133