#5 Poor performance
opened by HAvietisov
All quantized variations, as well as fp16, perform extremely poorly on extractive question answering when inference is run via ctransformers.
Responses differ drastically between the avx and avx2 engines given the same prompt, and they are generally very poor and often don't contain an answer to the question at all, in contrast to the un-quantized MPT-7B-Instruct or the quantized MPT-30B-Instruct. A rough sketch of how I'm comparing the two backends is below.
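This is only an illustrative repro sketch, not my exact script: the repo id, quantized file name, prompt, and generation parameters are placeholders, and only the `lib="avx"` / `lib="avx2"` switch is the point being tested.

```python
from ctransformers import AutoModelForCausalLM

# Placeholder extractive-QA prompt (assumed format, not the original one)
prompt = (
    "Answer the question using only the context.\n\n"
    "Context: ...\n\nQuestion: ...\n\nAnswer:"
)

for lib in ("avx", "avx2"):
    # Repo id and model_file below are assumptions for illustration
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/MPT-7B-Instruct-GGML",
        model_file="mpt-7b-instruct.ggmlv3.q4_0.bin",
        model_type="mpt",
        lib=lib,  # force the AVX or AVX2 backend
    )
    print(f"--- {lib} ---")
    print(llm(prompt, max_new_tokens=128, temperature=0.1))
```

With the same prompt and sampling settings, the avx and avx2 runs give very different outputs, which is what makes me suspect something beyond normal quantization loss.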