question about quants

#12
by prudant - opened

Can this kind of "LLM" for embeddings be quantized, for example to AWQ or GPTQ format?
Regards!

Alibaba-NLP org

Indeed, gte embedding models can be quantized to reduce their computational requirements and memory footprint.
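
To make the memory saving concrete, here is a minimal sketch of plain symmetric int8 quantization on a toy weight vector. This is a simplified illustration and not the AWQ or GPTQ algorithm: those are more elaborate methods that typically target lower bit widths and rely on calibration data. The function names and the toy weights are assumptions made up for this example, not part of any gte or Hugging Face API.

```python
from array import array
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    # Hypothetical helper for illustration only.
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    # Round each weight to the nearest int8 step and clamp to [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return array("f", (v * scale for v in q))


# Toy float32 "weights" standing in for one layer of a model (assumption).
random.seed(0)
weights = array("f", (random.uniform(-1.0, 1.0) for _ in range(4096)))

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = len(weights) * weights.itemsize
int8_bytes = len(q) * q.itemsize
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 storage: {fp32_bytes} bytes")
print(f"int8 storage:    {int8_bytes} bytes (plus one float scale)")
print(f"max absolute round-trip error: {max_error:.6f}")
```

The int8 copy needs a quarter of the float32 storage, at the cost of a rounding error of at most about half a quantization step per weight. Whether that error is acceptable for embedding quality has to be checked on your own retrieval or similarity benchmark.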

Can you give me a little info on how to get started with that? Which format, library, or other useful starting point would you recommend, please?
