[WIP] Optimized q4f16 ONNX export (Olive)

#6 by Xenova - opened
Files changed (2)
  1. config.json +3 -0
  2. onnx/model_q4f16.onnx +2 -2
config.json CHANGED
@@ -25,6 +25,9 @@
   "tie_word_embeddings": true,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.42.3",
+  "transformers.js_config": {
+    "kv_cache_dtype": "float16"
+  },
   "use_cache": true,
   "vocab_size": 49152
 }
onnx/model_q4f16.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e1a788453e1393e8642f43ca729b7f2301ba61cc1f8ac1f1904c809869fc1ffb
- size 272513495
+ oid sha256:8eb23549361696ffe4350e2d68d34fe92575e14182282a4bb33f9ee59836bdd6
+ size 299014965
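
For reference, a minimal usage sketch with Transformers.js v3 (the model id below is a placeholder for this repository, not taken from the PR): passing `dtype: 'q4f16'` is what selects `onnx/model_q4f16.onnx`, and the `"transformers.js_config".kv_cache_dtype` entry added to `config.json` should be picked up automatically so the KV cache runs in float16, assuming the chosen backend supports it (e.g. WebGPU).

```js
// Minimal sketch, assuming Transformers.js v3 and a placeholder model id.
import { pipeline } from '@huggingface/transformers';

// dtype 'q4f16' loads onnx/model_q4f16.onnx; kv_cache_dtype is read
// from the "transformers.js_config" block in config.json.
const generator = await pipeline(
  'text-generation',
  'user/model-repo', // placeholder: replace with this repository's id
  { dtype: 'q4f16', device: 'webgpu' },
);

const output = await generator('Once upon a time,', { max_new_tokens: 64 });
console.log(output[0].generated_text);
```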