Quantize only LLM / Leave Vision Tower Untouched
#2 · opened by Jotschi
Have you tried to quantize only the LLM part and leave the 400M vision tower untouched? I'm curious whether this would improve the quality of the output.
I think it is possible to iterate over the tensors using model.named_parameters() and selectively quantize the layers by skipping all "vision_tower" params.
It seems https://huggingface.co/nm-testing/pixtral-fp8-test has this setup.
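A minimal sketch of that idea: walk `model.named_parameters()` and split the tensors into those to hand to your quantizer and those to leave untouched based on the `"vision_tower"` name prefix. The `AutoModelForVision2Seq` loader and the checkpoint id are assumptions for illustration, and the actual quantization step (FP8, GPTQ, ...) depends on the tool you use, so it is only indicated by a comment.

```python
import torch
from transformers import AutoModelForVision2Seq

# Assumed checkpoint id for illustration; swap in the model you are quantizing.
model = AutoModelForVision2Seq.from_pretrained(
    "mistral-community/pixtral-12b",
    torch_dtype=torch.bfloat16,
)

to_quantize, to_skip = [], []
for name, param in model.named_parameters():
    if "vision_tower" in name:
        to_skip.append(name)        # keep the ~400M vision tower in bf16
    else:
        to_quantize.append(name)    # pass these names to your quantization tool

print(f"{len(to_quantize)} tensors selected for quantization, "
      f"{len(to_skip)} vision-tower tensors left untouched")
```

Many quantization libraries accept an ignore/skip list of module names, so the `to_skip` names collected here could be fed into that option instead of quantizing tensors by hand.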
Jotschi changed discussion status to closed