Add fp16/int8 weights #2
by mkshing - opened
This PR makes it possible to use this model on the Colab Free plan via int8 quantization.
Here's the link to the demo in Colab.
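For reference, a minimal sketch of how the fp16 weights added here could be loaded (assuming they follow the same variant naming scheme as the int8 ones; model_id stands for this repo's id, which is elided here):

import torch
from transformers import AutoModelForCausalLM

model_id = "..."  # id of this model repo (elided here)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    variant="fp16",             # assumed variant name for the fp16 weight files
    torch_dtype=torch.float16,  # keep the weights in half precision after loading
    low_cpu_mem_usage=True,
)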
mkshing changed pull request status to open
Generally LGTM! By the way, if we don't include variant="int8" in the from_pretrained call, it will just load the original fp32 version, is that correct?
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    variant="int8",        # load the int8 weight files added in this PR
    low_cpu_mem_usage=True,
    load_in_8bit=True,     # run the model with bitsandbytes 8-bit quantization
)
Exactly!
So, if I'm correct, in that case it loads the fp32 weights first and then converts them to int8.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
-   variant="int8",
    low_cpu_mem_usage=True,
    load_in_8bit=True,
)
Nice, let's merge this! By the way, do you want to also expose the variant as a Colab dropdown (defaulting to int8), like model_id, so people are aware of that option?
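A minimal sketch of what such a dropdown could look like in the Colab notebook (hypothetical; the #@param annotations are Colab's standard form syntax, and the choice list and model_id value are placeholders):

#@title Load the model
model_id = "..."  #@param {type:"string"}
variant = "int8"  #@param ["int8", "fp16"]

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    variant=variant,
    low_cpu_mem_usage=True,
    load_in_8bit=(variant == "int8"),                          # bitsandbytes 8-bit path only for the int8 weights
    torch_dtype=torch.float16 if variant == "fp16" else None,  # keep the fp16 weights in half precision
)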
leemeng changed pull request status to merged