question about quants
#12, opened by prudant
Can this kind of "LLM" for embeddings be quantized, for example to AWQ or GPTQ format?
Regards!
Indeed, gte embedding models can be quantized to reduce their computational requirements and memory footprint.
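To make the memory claim concrete, here is a minimal sketch of the basic idea behind weight quantization, using only the Python standard library. It is not AWQ or GPTQ (both are calibration-based weight-only methods, usually at 4 bits); it shows plain symmetric per-tensor int8 quantization on a made-up weight vector. The names `quantize_int8` and `dequantize` and the random weights are assumptions for illustration, not part of any gte release or library API.

```python
from array import array
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


# Hypothetical weights standing in for one layer of an embedding model.
random.seed(0)
weights = array("f", (random.gauss(0.0, 0.02) for _ in range(1024)))

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

fp32_bytes = len(weights) * weights.itemsize      # 4 bytes per float32 weight
int8_bytes = len(q_weights) * q_weights.itemsize  # 1 byte per int8 weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.6f} (scale = {scale:.6f})")
```

The int8 copy takes a quarter of the float32 storage, and each weight is off by at most half a quantization step. Whether that error is acceptable for embeddings has to be checked on a retrieval or similarity benchmark; methods like AWQ and GPTQ exist precisely to keep accuracy at lower bit widths.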
Can you give me a little info on how to get started with that? Which format, library, or useful starting point would you recommend, please?