What is this?

This is a GGUF conversion of Qwen2-57B-A14B-Instruct.
I think this is the world's first successful example of making a GGUF of Qwen2-57B-A14B-Instruct with an imatrix.

imatrix dataset

TFMC/imatrix-dataset-for-japanese-llm.
This dataset contains English sentences and many Japanese sentences.

How I made it

First I made a Q6_K quant, then tried to convert it to i-quants with the --allow-requantize option.
Surprisingly, the (re)quantizing process completed without problems.
The steps are described in more detail below.

Step1: Making GGUF of f16

First, I converted the safetensors model to an f16 GGUF.
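
The exact command used for this step is not recorded here. As a minimal sketch, assuming the convert-hf-to-gguf.py script from the llama.cpp source tree of that period (it is not part of the Windows binary release) and hypothetical directory and file names, the command line could be assembled like this:

```python
import shlex

def convert_command(model_dir, out_file):
    """Build (but do not run) a safetensors -> f16 GGUF conversion command."""
    # Assumption: convert-hf-to-gguf.py is the script name used by
    # llama.cpp around b3065; later versions renamed it.
    return [
        "python", "convert-hf-to-gguf.py",
        model_dir,                # hypothetical local model directory
        "--outfile", out_file,    # hypothetical output file name
        "--outtype", "f16",
    ]

cmd = convert_command("Qwen2-57B-A14B-Instruct", "model-f16.gguf")
print(shlex.join(cmd))  # shown for copy and paste; nothing is executed
```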

Step2: Converting f16 to Q8_0

Second, I converted f16 to Q8_0.
The aim is to speed up the next step, because I don't have enough memory to handle the high-precision tensors.
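
To give a rough sense of the memory saving, here is a back-of-the-envelope estimate. It assumes a nominal 57 billion weights and the usual per-weight costs of the llama.cpp formats (f16 at 16 bits, Q8_0 at 8.5 bits, Q6_K at 6.5625 bits). Real files differ somewhat because not every tensor uses the same type, so treat the numbers as approximate.

```python
# Approximate bits per weight, including per-block scales.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "Q8_0": 8.5,      # 32 weights + one f16 scale = 34 bytes per block
    "Q6_K": 6.5625,   # 210 bytes per 256-weight super-block
}

def size_gb(n_weights, bits_per_weight):
    """Estimated size in gigabytes (10^9 bytes) for a uniform quant type."""
    return n_weights * bits_per_weight / 8 / 1e9

N_WEIGHTS = 57e9  # nominal parameter count; an assumption for illustration

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: about {size_gb(N_WEIGHTS, bpw):.1f} GB")
```

On this estimate Q8_0 is a little over half the size of f16, which is why it is easier to fit in memory for the imatrix calculation.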

Step3: Calculating imatrix

I calculated the imatrix with the Q8_0 model.
It seems some people have already succeeded in calculating an imatrix for this model, so I think anyone can do this step.
This time I used the '-fa' option because I wanted to finish the calculation as quickly as possible.
Later, however, I learned that some people say Qwen2 needs the -fa option to work correctly.
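
The exact invocation is not recorded here. As a sketch, assuming the imatrix tool shipped with llama.cpp builds of that period (newer builds call it llama-imatrix), a dataset already saved as a plain text file, and hypothetical file names, the command could be assembled as follows. The -ngl value is also only an assumption for offloading layers to the GPU.

```python
import shlex

def imatrix_command(model, dataset_txt, out_file, flash_attn=True, gpu_layers=None):
    """Build (but do not run) an importance-matrix calculation command."""
    cmd = ["imatrix", "-m", model, "-f", dataset_txt, "-o", out_file]
    if flash_attn:
        cmd.append("-fa")                      # flash attention, as used above
    if gpu_layers is not None:
        cmd += ["-ngl", str(gpu_layers)]       # optional GPU offload (assumed)
    return cmd

cmd = imatrix_command("model-Q8_0.gguf", "imatrix-dataset.txt", "imatrix.dat")
print(shlex.join(cmd))  # shown for copy and paste; nothing is executed
```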

Step4: Making Q6_K temporarily

This is the most important step. I first converted f16 to Q6_K.
Never try to make i-quants directly from f16. As far as I know, no one has succeeded in making them that way.

Step5: Converting Q6_K to i-quants with the imatrix

I converted the Q6_K to i-quants with the imatrix.
Strangely, the process finished and the resulting i-quants appear to work.
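
Steps 4 and 5 together form a two-stage quantization. The sketch below assembles the two kinds of command, assuming the quantize tool from llama.cpp builds of that period (newer builds call it llama-quantize). The file names and the list of i-quant types are hypothetical; this page does not say which types were actually produced.

```python
import shlex

def quantize_command(src, dst, qtype, imatrix=None, requantize=False):
    """Build (but do not run) a quantize command line."""
    cmd = ["quantize"]
    if requantize:
        cmd.append("--allow-requantize")   # needed when src is already quantized
    if imatrix is not None:
        cmd += ["--imatrix", imatrix]
    cmd += [src, dst, qtype]
    return cmd

# Stage 1 (Step4): f16 -> Q6_K, without the imatrix.
stage1 = quantize_command("model-f16.gguf", "model-Q6_K.gguf", "Q6_K")

# Stage 2 (Step5): Q6_K -> i-quants, with the imatrix and requantization allowed.
iquant_types = ["IQ4_XS", "IQ3_XXS"]  # hypothetical selection for illustration
stage2 = [
    quantize_command("model-Q6_K.gguf", f"model-{t}.gguf", t,
                     imatrix="imatrix.dat", requantize=True)
    for t in iquant_types
]

for cmd in [stage1] + stage2:
    print(shlex.join(cmd))  # shown for copy and paste; nothing is executed
```

Requantizing from Q6_K generally costs some quality compared with quantizing from f16, so this route is a workaround for the failing direct conversion and not a general recommendation.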

Environment

GeForce RTX 3090 and the llama.cpp Windows binary b3065

License

Apache 2.0

Developer

Alibaba Cloud