|
--- |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
Update: |
|
1.5-bit quantization is now merged into the main branch of llama.cpp, so there is no need to clone the dev branch anymore.
|
The default dataset used for imatrix computation, "wiki.train.raw", can be downloaded from here: https://huggingface.co/datasets/ggml-org/ci
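
With the PR merged, a plain main-branch build should now suffice. A minimal sketch (untested on this model; the archive name is taken from llama.cpp's own helper scripts, and the extraction path is an assumption):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# wikitext-2 raw archive containing wiki.train.raw:
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip -d datasets
# extracts to datasets/wikitext-2-raw/wiki.train.raw
```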
|
|
|
TheProfessor - 155B - 1bit GGUF |
|
--- |
|
A test run of the upcoming "1-bit" quantisation in llama.cpp on Eric's and AbacusAI's 155B model "TheProfessor":
|
https://huggingface.co/abacusai/TheProfessor-155b |
|
|
|
The first run (v01) was a failure (not very coherent), so I'm uploading only the second attempt (v02). The second one seemed quite coherent, but it wasn't extensively tested.
|
|
|
As of this writing, "IQ1_S" quantization is an open PR against the main branch, so to test it, the dev branch needs to be cloned and compiled instead of main:
|
https://github.com/ggerganov/llama.cpp/pull/5453 |
|
|
|
For reference, some size and perplexity comparison: |
|
``` |
|
Model                            Size   PPL
TheProfessor-155b-Q1_S-v02.gguf   31G   9.24
TheProfessor-155b-Q4_K_M.gguf     87G   5.29
|
``` |
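
As a sanity check on the "1-bit" label, the effective bits per weight can be estimated from the file sizes (rough arithmetic; IQ1_S is nominally 1.5625 bpw, with some tensors kept at higher precision, which matches):

```
# bits per weight ≈ (file size in bytes * 8) / parameter count
# IQ1_S : 31 * 2^30 * 8 / 155e9 ≈ 1.7 bpw
# Q4_K_M: 87 * 2^30 * 8 / 155e9 ≈ 4.8 bpw
```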
|
|
|
1-bit quantization requires an importance matrix ("imatrix") to be computed first. My imatrix is not really "state of the art" by any means.
|
There might be plenty of room for improvement for anyone with better hardware, by trying one or all of the following (see the sketch after this list):

1. Calculate the imatrix from the f16, or at least a Q8, version of the model instead of Q4_K_M (with "only" 96GB of RAM, Q4 was the largest I could load),

2. Calculate the imatrix with 1000 or even more chunks instead of only 100 (100 chunks took 8 hours on my machine),

3. Use a better dataset for the imatrix than wikitext, which might improve the end results as well.
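
Points 1 and 2 combined would look something like this (an untested sketch mirroring the step-2 command below; the f16 path and output name are assumptions):

```
./imatrix -m models/TheProfessor-155b/ggml-model-f16.gguf \
          -f datasets/wiki.train.raw \
          -o models/TheProfessor-155b-f16.imatrix \
          --chunks 1000 -b 512
```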
|
|
|
|
|
Replication steps: |
|
--- |
|
1. Clone and compile the "ik/iq1_s" dev branch of llama.cpp:
|
``` |
|
git clone -b ik/iq1_s https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
|
``` |
|
|
|
2. Generate imatrix file: |
|
``` |
|
./imatrix -m models/TheProfessor-155b-Q4_K_M.gguf -f datasets/wiki.train.raw -o models/TheProfessor-155b-v02.imatrix --chunks 100 -b 512 |
|
``` |
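
Note: the quantisation step below assumes the original model has already been converted to an f16 GGUF, e.g. with llama.cpp's convert script (a sketch; the output path is an assumption):

```
python3 convert.py models/TheProfessor-155b/ --outtype f16 \
    --outfile models/TheProfessor-155b/ggml-model-f16.gguf
```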
|
|
|
|
|
3. Quantise f16 to IQ1_S: |
|
``` |
|
./quantize --imatrix models/TheProfessor-155b-v02.imatrix models/TheProfessor-155b/ggml-model-f16.gguf models/TheProfessor-155b-Q1_S-v02.gguf IQ1_S |
|
``` |
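
Perplexity figures like the ones in the comparison above can be obtained with llama.cpp's perplexity tool (a sketch; whether the table was measured on wiki.test.raw exactly is an assumption):

```
./perplexity -m models/TheProfessor-155b-Q1_S-v02.gguf -f datasets/wiki.test.raw
```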
|
|
|
|