FantasiaFoundry committed
Commit 853f72f
1 Parent(s): 46682f7

llama.cpp/#6920 warning

Files changed (1)
  1. README.md +9 -4
README.md CHANGED
@@ -9,13 +9,18 @@ tags:
 ---
 
 > [!TIP]
-> **Credits:** <br>
+> **Credits:**
+>
 > Made with love by [**@Lewdiculous**](https://huggingface.co/Lewdiculous). <br>
-> *If this proves useful for you, feel free to credit and share the repository and authors.*
+> If this proves useful for you, feel free to credit and share the repository and authors.
 
 > [!WARNING]
-> **Warning:** <br>
-> For **Llama-3** models that don't follow the ChatML, Alpaca, Vicuna and other conventional formats, at the moment, you have to use `gguf-imat-llama-3.py` and replace the config files with the ones in the [**ChaoticNeutrals/Llama3-Corrections**](https://huggingface.co/ChaoticNeutrals/Llama3-Corrections/tree/main) repository to properly quant and generate the imatrix data.
+> **[Important] Llama-3:**
+>
+> For those converting Llama-3 BPE models, you'll have to read [**llama.cpp/#6920**](https://github.com/ggerganov/llama.cpp/pull/6920#issue-2265280504) for more context. <br>
+> Make sure you're on the latest llama.cpp commit, then run the new `convert-hf-to-gguf-update.py` script inside the repo; afterwards, manually copy the config files from `llama.cpp\models\tokenizers\llama-bpe` into your downloaded **model** folder, replacing the existing ones. <br>
+> Try again and the conversion process should work as expected.
+
 
 Pull Requests with your own features and improvements to this script are always welcome.
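
For reference, here is a minimal Python sketch of the workaround the new warning describes, assuming a local llama.cpp checkout next to your downloaded model folder. The `apply_llama3_bpe_fix` helper and both folder paths are hypothetical, and `convert-hf-to-gguf-update.py` may need extra arguments (such as a Hugging Face token) depending on the llama.cpp version; this is an illustration of the steps, not part of the script itself.

```python
# Sketch of the llama.cpp/#6920 workaround described in the warning above.
# The helper name and paths are hypothetical; adjust them to your layout.
import shutil
import subprocess
import sys
from pathlib import Path


def apply_llama3_bpe_fix(llama_cpp_dir: Path, model_dir: Path) -> None:
    # Step 1: run the update script inside the llama.cpp repo to regenerate
    # the tokenizer configs (extra arguments, e.g. a Hugging Face token,
    # may be required depending on the llama.cpp version).
    subprocess.run(
        [sys.executable, "convert-hf-to-gguf-update.py"],
        cwd=llama_cpp_dir,
        check=True,
    )
    # Step 2: copy the generated llama-bpe tokenizer configs into the
    # downloaded model folder, replacing the existing files.
    tokenizer_dir = llama_cpp_dir / "models" / "tokenizers" / "llama-bpe"
    for config_file in tokenizer_dir.iterdir():
        if config_file.is_file():
            shutil.copy2(config_file, model_dir / config_file.name)


if __name__ == "__main__":
    # Example usage with hypothetical folder names.
    apply_llama3_bpe_fix(Path("llama.cpp"), Path("downloaded-model"))
```

After the configs are replaced, re-running the conversion should pick up the corrected Llama-3 BPE tokenizer, as the warning notes.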
26