The code provided in the model card does not work
At the very least, TextStreamer is not imported, but that's an easy one :D. Also, the snapshot download gives me a 401, and config.tokenizer_name does not seem to exist. Anyway, I modified the script so it works with a local copy of the repo; as far as I understand snapshot_download, it does nothing more than fetch that. But I'm getting a strange error.
The script:
`import torch
from awq.quantize.quantizer import real_quantize_model_weight
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer, TextStreamer
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import snapshot_download

model_name = "./meta-llama-Llama-2-13b-chat-hf-w4-g128-awq"

# Config
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
print(config)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Model
w_bit = 4
q_config = {
    "zero_point": True,
    "q_group_size": 128,
}
#load_quant = snapshot_download(model_name)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config=config, torch_dtype=torch.float16, trust_remote_code=True)
real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
model = load_checkpoint_and_dispatch(model, model_name, device_map="balanced")

# Inference
prompt = f'''What is the difference between nuclear fusion and fission?
###Response:'''
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
output = model.generate(
    inputs=input_ids,
    temperature=0.7,
    max_new_tokens=512,
    top_p=0.15,
    top_k=0,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer)`
The stack trace:
File "/home/robert/llm/conda/script.py", line 42, in <module>
output = model.generate(
^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/generation/utils.py", line 1538, in generate
return self.greedy_search(
^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
outputs = self(
^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
outputs = self.model(
^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 291, in forward
query_slices = self.q_proj.weight.split((self.num_heads * self.head_dim) // self.pretraining_tp, dim=0)
^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'WQLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
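For anyone hitting the same AttributeError: the message suggests the attention code is reading `q_proj.weight`, while the quantized layer only exposes `qweight`. Below is a minimal sketch for listing such layers. `find_quantized_layers` is a hypothetical helper (not part of awq or transformers), and the attribute names are taken from the error message only. It accepts any iterable of `(name, module)` pairs, so the stand-in classes let it run without torch installed.

```python
def find_quantized_layers(named_modules):
    """Return names of modules that expose `qweight` but no `weight`.

    Hypothetical helper for illustration; the attribute names come from
    the error message above, not from the awq source code.
    """
    names = []
    for name, module in named_modules:
        if hasattr(module, "qweight") and not hasattr(module, "weight"):
            names.append(name)
    return names


# Stand-in classes so the sketch runs without torch or awq installed.
class FakeQuantLinear:
    def __init__(self):
        self.qweight = [0, 1, 2]


class FakeLinear:
    def __init__(self):
        self.weight = [0.0, 1.0]


demo = [("layers.0.q_proj", FakeQuantLinear()), ("lm_head", FakeLinear())]
print(find_quantized_layers(demo))  # prints ['layers.0.q_proj']
```

With the real model from the script, the call would presumably be `find_quantized_layers(model.named_modules())`.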
Hey @AnalogAiBert,
Thanks for reporting this.
> at least TextStreamer is not imported but thats an easy one :D

I have updated the repo with the `TextStreamer` import.

> also for the snapshot download i get a 401

I'll look into this.

> config.tokenizer_name does not seem to be there

I've fixed this; please see the updated instructions in the model card.

> But i'm getting a strange error

Please check `config.json` for the appropriate version of the transformers library: `"transformers_version": "4.30.2"`.

I'm closing this for now. The `snapshot_download` method does work most of the time; if more people report problems with it, I'll look at it again.
Otherwise, I have addressed most of the issues with this post.
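In case it helps others who run into the 401, here is a rough sketch of the local-directory workaround described in the first post. `resolve_model_path` is a hypothetical helper name; the `token` argument assumes a recent huggingface_hub (older releases used `use_auth_token`), and whether a token actually resolves the 401 is not confirmed in this thread.

```python
import os


def resolve_model_path(name_or_path, token=None):
    """Return a local directory that holds the model files.

    Hypothetical helper: an existing local directory is used as-is;
    anything else is treated as a Hub repo id and downloaded.
    """
    if os.path.isdir(name_or_path):
        return name_or_path
    # Imported lazily so the local-directory case works without the Hub.
    from huggingface_hub import snapshot_download
    # Assumption: a token may be needed for private or gated repos.
    return snapshot_download(repo_id=name_or_path, token=token)
```

The returned path can then be passed to `load_checkpoint_and_dispatch` in place of `load_quant`.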
P.S. I just checked one uploaded AWQ model, and it seems to work with the latest transformers release (`transformers==4.31.0`).
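To compare the installed transformers version with the one recorded in `config.json`, something like the sketch below could be used. The helper names are made up for illustration, and the version parsing is deliberately simple (it keeps only the leading numeric parts), so treat it as a rough check rather than a full version comparison.

```python
import json
import re
from pathlib import Path


def parse_version(version):
    """Turn '4.30.2' into (4, 30, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pinned_transformers_version(model_dir):
    """Read "transformers_version" from config.json, or None if it is absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("transformers_version")


def same_version(installed, pinned):
    """True when both version strings have the same numeric parts."""
    return parse_version(installed) == parse_version(pinned)
```

Usage would look like `same_version(transformers.__version__, pinned_transformers_version(model_name))` after `import transformers`.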
Thanks a lot, it works just fine now. I would have replied and thanked you earlier, but I got rate limited :D (1 message every 48 hours), which is why I provided the code in Markdown ... anyway, thanks :D