#5 Poor performance
opened by HAvietisov
All quantized variations, as well as fp16, perform extremely poorly on extractive question answering when inference is run via ctransformers.
Responses differ drastically between the avx and avx2 engines given the same prompt, and they are generally very poor and often don't contain an answer to the question at all, in contrast to the un-quantized MPT-7B-Instruct or the quantized MPT-30B-Instruct. A rough sketch of how I'm comparing the two backends is below.
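This is only an illustrative repro sketch, not my exact script: the repo id, quantized file name, prompt, and generation parameters are placeholders, and only the `lib="avx"` / `lib="avx2"` switch is the point being tested.

```python
from ctransformers import AutoModelForCausalLM

# Placeholder extractive-QA prompt (assumed format, not the original one)
prompt = (
    "Answer the question using only the context.\n\n"
    "Context: ...\n\nQuestion: ...\n\nAnswer:"
)

for lib in ("avx", "avx2"):
    # Repo id and model_file below are assumptions for illustration
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/MPT-7B-Instruct-GGML",
        model_file="mpt-7b-instruct.ggmlv3.q4_0.bin",
        model_type="mpt",
        lib=lib,  # force the AVX or AVX2 backend
    )
    print(f"--- {lib} ---")
    print(llm(prompt, max_new_tokens=128, temperature=0.1))
```

With the same prompt and sampling settings, the avx and avx2 runs give very different outputs, which is what makes me suspect something beyond normal quantization loss.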