Reproducibility
I am trying to convert the gemma-2-2b base model (not the IT version) to GGUF. I am using the following command with the b3496 release of llama.cpp:
python ./llama.cpp/convert_hf_to_gguf.py gemma-2-2b --outtype f32 --outfile gemma-2-2b.FP32.gguf
Did you convert the instruction-tuned version with the same command?
Yes, that's the same conversion command I used.
Thank you. I do not know whether you have experienced something similar, but I am seeing a very strange perplexity for the FP32 model.
I am computing the perplexity of the FP32 model and of the different quantized versions with the following commands:
git clone https://huggingface.co/google/gemma-2-2b
cd llama.cpp
python ./convert_hf_to_gguf.py ../gemma-2-2b --outtype f32 --outfile ../gemma-2-2b.FP32.gguf
python ./convert_hf_to_gguf.py ../gemma-2-2b --outtype q8_0 --outfile ../gemma-2-2b-Q8_0.gguf
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q6_K.gguf Q6_K
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q5_K_M.gguf Q5_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q5_K_S.gguf Q5_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q4_K_M.gguf Q4_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q4_K_S.gguf Q4_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_L.gguf Q3_K_L
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_M.gguf Q3_K_M
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q3_K_S.gguf Q3_K_S
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q2_K.gguf Q2_K
./llama-perplexity -m ../gemma-2-2b.FP32.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q8_0.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q6_K.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q5_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q5_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q4_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q4_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_L.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_M.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q3_K_S.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q2_K.gguf -f ../wikitext-2-raw/wiki.test.raw
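For completeness, the whole sweep can also be scripted so the runs stay comparable. This is only a minimal sketch, assuming the same directory layout and b3496 binaries as above, and that llama-perplexity ends its log with a line like "Final estimate: PPL = ..."; adjust the grep if your build prints something different:
#!/usr/bin/env bash
cd llama.cpp
types="Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_L Q3_K_M Q3_K_S Q2_K"

# Quantize every target type from the FP32 GGUF
for q in $types; do
    ./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-$q.gguf $q
done

# Run perplexity on the FP32 model, the directly converted Q8_0, and each quantized model,
# keeping only the summary line from each run
for tag in FP32 Q8_0 $types; do
    m=../gemma-2-2b-$tag.gguf
    [ "$tag" = FP32 ] && m=../gemma-2-2b.FP32.gguf   # the FP32 file uses a dot instead of a dash
    echo "== $tag =="
    ./llama-perplexity -m "$m" -f ../wikitext-2-raw/wiki.test.raw 2>&1 \
        | grep -i "final estimate" || echo "(no summary line found; check the full log)"
done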
And very strangely, the perplexity of the FP32 model is the worst of them all... it is more or less at the same level as Q2_K. I hope there are not still bugs in the GGUF conversion of Gemma 2 (this could also affect your conversions).
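One cross-check that could separate a conversion problem from an evaluation problem (again only a sketch, using the same binaries and paths as above; the ".from-f32" suffix is just a label I made up): quantize the FP32 GGUF to Q8_0 and compare it with the Q8_0 that convert_hf_to_gguf.py produced directly. If the two Q8_0 models score essentially the same and only the FP32 model is off, the f32 evaluation path is more suspect than the conversion itself.
./llama-quantize ../gemma-2-2b.FP32.gguf ../gemma-2-2b-Q8_0.from-f32.gguf Q8_0
./llama-perplexity -m ../gemma-2-2b-Q8_0.from-f32.gguf -f ../wikitext-2-raw/wiki.test.raw
./llama-perplexity -m ../gemma-2-2b-Q8_0.gguf -f ../wikitext-2-raw/wiki.test.raw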
I have opened a discussion on the llama.cpp repo: https://github.com/ggerganov/llama.cpp/discussions/9020