HiroseKoichi committed 163365d (parent ae418f7): Update README.md

Files changed: README.md (+7 -0)
Task-Arithmetic is a linear merge that first subtracts the base model from the fine-tuned model...
Dare, Della, and Breadcrumbs are all enhancements to Ties and Task-Arithmetic that aim to improve the resulting merge by zeroing out certain weights. While they all remove weights before merging takes place, each does it a bit differently: Dare assigns a flat dropout rate, meaning every weight has an equal chance of being dropped; Della scales the dropout rate by the magnitude of change from the base model, so the largest changes get the smallest dropout rate; and Breadcrumbs first removes outliers and then zeroes out weights, starting with the smallest changes, until it reaches the target density. I've done direct comparisons between Dare and Della with identical parameters, and Della has consistently outperformed Dare. I haven't tested Breadcrumbs much, but the idea behind it seems solid.
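To make the distinction concrete, here's a minimal sketch of the two dropout styles applied to a task vector (the fine-tuned weights minus the base weights, as in Task-Arithmetic). This is an illustration, not mergekit's actual implementation: the function names are hypothetical, and the rank-based keep probability is a simplification of Della's magnitude-based pruning. Assumes PyTorch.

```python
import torch

def dare_prune(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Flat dropout: every weight delta has the same chance of surviving.
    # Survivors are rescaled by 1/density so the expected sum is preserved.
    mask = torch.rand_like(delta) < density
    return delta * mask / density

def della_prune(delta: torch.Tensor, density: float, eps: float = 0.1) -> torch.Tensor:
    # Magnitude-scaled dropout (simplified): keep probability rises with the
    # rank of |delta|, so the largest changes from the base model are the
    # least likely to be dropped. Survivors are rescaled by 1/keep_p.
    ranks = delta.abs().flatten().argsort().argsort().float() / (delta.numel() - 1)
    keep_p = (density + eps * (2 * ranks - 1)).clamp(0.01, 1.0).reshape(delta.shape)
    mask = torch.rand_like(delta) < keep_p
    return delta * mask / keep_p

# Toy example: build a task vector, prune it, and add it back to the base.
base = torch.randn(4, 4)
finetuned = base + 0.1 * torch.randn(4, 4)
delta = finetuned - base  # the "task vector" that Task-Arithmetic operates on

merged_dare = base + dare_prune(delta, density=0.5)
merged_della = base + della_prune(delta, density=0.5)
```

Both variants rescale the surviving deltas so the expected contribution to the merge is unchanged, which is why fairly aggressive pruning can still preserve performance.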
# Quantization Formats
**GGUF**
- Static:
  - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-4.0-GGUF
- Imatrix:
  - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-4.0-i1-GGUF
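Static quants are produced without calibration data, while the i1 (imatrix) quants use an importance matrix computed on a calibration set, which generally preserves more quality at low bit widths. As a minimal sketch of loading one of these files, assuming the llama-cpp-python package (the quant filename pattern is an assumption; check the repo's file list for the actual names):

```python
from llama_cpp import Llama

# Downloads a GGUF from the Hugging Face repo and loads it.
# The filename glob below is hypothetical; match it to a real file in the repo.
llm = Llama.from_pretrained(
    repo_id="mradermacher/Llama-3-8B-Stroganoff-4.0-i1-GGUF",
    filename="*i1-Q4_K_M.gguf",
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```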

# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)
- **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/) or ChatML