Quantize only LLM / Leave Vision Tower Untouched
#2 · opened by Jotschi
Have you tried to quantize only the LLM part and leave the 400M vision tower untouched? I'm curious whether this would improve the quality of the output.
I think it is possible to iterate over the tensors using model.named_parameters() and selectively quantize the layers by skipping all "vision_tower" params.
It seems https://huggingface.co/nm-testing/pixtral-fp8-test has this setup.
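A minimal sketch of that idea: walk `model.named_parameters()` and split the tensors into those to hand to your quantizer and those to leave untouched based on the `"vision_tower"` name prefix. The `AutoModelForVision2Seq` loader and the checkpoint id are assumptions for illustration, and the actual quantization step (FP8, GPTQ, ...) depends on the tool you use, so it is only indicated by a comment.

```python
import torch
from transformers import AutoModelForVision2Seq

# Assumed checkpoint id for illustration; swap in the model you are quantizing.
model = AutoModelForVision2Seq.from_pretrained(
    "mistral-community/pixtral-12b",
    torch_dtype=torch.bfloat16,
)

to_quantize, to_skip = [], []
for name, param in model.named_parameters():
    if "vision_tower" in name:
        to_skip.append(name)        # keep the ~400M vision tower in bf16
    else:
        to_quantize.append(name)    # pass these names to your quantization tool

print(f"{len(to_quantize)} tensors selected for quantization, "
      f"{len(to_skip)} vision-tower tensors left untouched")
```

Many quantization libraries accept an ignore/skip list of module names, so the `to_skip` names collected here could be fed into that option instead of quantizing tensors by hand.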
Jotschi changed discussion status to closed