
Minitron-4B-Base-FP8

FP8-quantized checkpoint of nvidia/Minitron-4B-Base for inference with vLLM.
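The card does not say how the checkpoint was produced. As a rough illustration of what FP8 (E4M3) weights store, here is a minimal sketch that assumes simple per-tensor scaling: the tensor is divided by a scale chosen so its largest magnitude maps to 448 (the largest finite E4M3 value), and each element is rounded to the nearest E4M3 number. The helper names are hypothetical and not part of vLLM or any quantization library. Real tooling works on tensors and may use per-channel or dynamic activation scales instead.

```python
import math

E4M3_MAX = 448.0            # largest finite float8_e4m3fn value
E4M3_MIN_NORMAL = 2.0 ** -6  # smallest normal E4M3 value (exponent bias 7)


def round_to_e4m3(x: float) -> float:
    """Hypothetical helper: round to the nearest E4M3 value (ties to even, saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a < E4M3_MIN_NORMAL:
        step = 2.0 ** -9  # subnormal spacing: 3 mantissa bits below 2^-6
    else:
        _, exp = math.frexp(a)           # a = m * 2**exp with 0.5 <= m < 1
        step = math.ldexp(1.0, exp - 4)  # 3 mantissa bits -> 8 values per binade
    q = round(a / step) * step           # Python's round() ties to even
    return sign * min(q, E4M3_MAX)       # saturate instead of overflowing


def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Hypothetical per-tensor FP8 quantization: returns E4M3 values and one scale."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    """Recover approximate original weights from E4M3 values and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.03, 2.0]
q, scale = quantize_per_tensor(weights)
print(q, scale)
print(dequantize(q, scale))
```

With 3 mantissa bits, each normal value is off by at most 1/16 (6.25%) of its magnitude, which is the accuracy budget an FP8 checkpoint trades for halving the memory of BF16 weights.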

GSM8K (5-shot) evaluation with lm-evaluation-harness:

```bash
lm_eval --model vllm \
  --model_args pretrained=mgoin/Minitron-4B-Base-FP8 \
  --tasks gsm8k --num_fewshot 5 --batch_size auto
```

vllm (pretrained=mgoin/Minitron-4B-Base-FP8), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.2305|±  |0.0116|
|     |       |strict-match    |     5|exact_match|↑  |0.2282|±  |0.0116|
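As a sanity check on the table, the sketch below recomputes each score and its standard error from a count of correct answers. It assumes the GSM8K test split has 1,319 problems and that the reported Stderr is the standard error of the mean of 0/1 scores with an (n − 1) sample variance. The counts 304 and 301 are back-calculated from the reported values, not taken from the evaluation logs, and `exact_match_stderr` is a hypothetical helper, not part of lm-evaluation-harness.

```python
import math


def exact_match_stderr(correct: int, n: int) -> tuple[float, float]:
    """Hypothetical helper: mean accuracy and its standard error for 0/1 scores."""
    p = correct / n
    # For 0/1 scores the sample variance (n - 1 denominator) is p * (1 - p) * n / (n - 1)
    var = p * (1 - p) * n / (n - 1)
    return p, math.sqrt(var / n)


N = 1319  # assumed: number of problems in the GSM8K test split
for name, correct in [("flexible-extract", 304), ("strict-match", 301)]:
    p, se = exact_match_stderr(correct, N)
    print(f"{name}: {p:.4f} ± {se:.4f}")
```

Both rows reproduce the table (0.2305 ± 0.0116 and 0.2282 ± 0.0116), so the two filters differ by about three problems, well inside one standard error.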

Model size: 4.19B params (Safetensors). Tensor types: BF16, F8_E4M3.
