HiroseKoichi committed 163365d (parent ae418f7): Update README.md

Files changed: README.md (+7 -0)
Task-Arithmetic is a linear merge that first subtracts the base model from the fine-tuned model...
Dare, Della, and Breadcrumbs are all enhancements to Ties and Task-Arithmetic that aim to improve the resulting merge by zeroing out certain weights. While they all remove weights before merging takes place, each does it a bit differently: Dare assigns a flat dropout rate, meaning every weight has an equal chance of being dropped; Della scales the dropout rate by the magnitude of change from the base model, so the largest changes get the smallest dropout rate; and Breadcrumbs first removes outliers and then zeroes out weights, starting with the smallest changes, until it reaches the target density. I've done direct comparisons between Dare and Della with identical parameters, and Della has consistently outperformed Dare. I haven't tested Breadcrumbs much, but the idea behind it seems solid.
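To make the distinction concrete, here's a minimal sketch of the two dropout styles applied to a task vector (the fine-tuned weights minus the base weights, as in Task-Arithmetic). This is an illustration, not mergekit's actual implementation: the function names are hypothetical, and the rank-based keep probability is a simplification of Della's magnitude-based pruning. Assumes PyTorch.

```python
import torch

def dare_prune(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Flat dropout: every weight delta has the same chance of surviving.
    # Survivors are rescaled by 1/density so the expected sum is preserved.
    mask = torch.rand_like(delta) < density
    return delta * mask / density

def della_prune(delta: torch.Tensor, density: float, eps: float = 0.1) -> torch.Tensor:
    # Magnitude-scaled dropout (simplified): keep probability rises with the
    # rank of |delta|, so the largest changes from the base model are the
    # least likely to be dropped. Survivors are rescaled by 1/keep_p.
    ranks = delta.abs().flatten().argsort().argsort().float() / (delta.numel() - 1)
    keep_p = (density + eps * (2 * ranks - 1)).clamp(0.01, 1.0).reshape(delta.shape)
    mask = torch.rand_like(delta) < keep_p
    return delta * mask / keep_p

# Toy example: build a task vector, prune it, and add it back to the base.
base = torch.randn(4, 4)
finetuned = base + 0.1 * torch.randn(4, 4)
delta = finetuned - base  # the "task vector" that Task-Arithmetic operates on

merged_dare = base + dare_prune(delta, density=0.5)
merged_della = base + della_prune(delta, density=0.5)
```

Both variants rescale the surviving deltas so the expected contribution to the merge is unchanged, which is why fairly aggressive pruning can still preserve performance.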
# Quantization Formats
**GGUF**
- Static:
  - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-4.0-GGUF
- Imatrix:
  - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-4.0-i1-GGUF
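Static quants are produced without calibration data, while the i1 (imatrix) quants use an importance matrix computed on a calibration set, which generally preserves more quality at low bit widths. As a minimal sketch of loading one of these files, assuming the llama-cpp-python package (the quant filename pattern is an assumption; check the repo's file list for the actual names):

```python
from llama_cpp import Llama

# Downloads a GGUF from the Hugging Face repo and loads it.
# The filename glob below is hypothetical; match it to a real file in the repo.
llm = Llama.from_pretrained(
    repo_id="mradermacher/Llama-3-8B-Stroganoff-4.0-i1-GGUF",
    filename="*i1-Q4_K_M.gguf",
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```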

# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)
- **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/) or ChatML