Question about example / configuration

#1
by chelouche9 - opened

Hi Yam,

First of all, I want to thank you for this amazing contribution! I am looking forward to getting the most out of it.

I am doing a test run, but I am getting some unstable responses. I think I might need to configure or use the model differently.

This is my code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "yam-peleg/Hebrew-Gemma-11B-Instruct"

# Load the model in 4-bit, with bfloat16 as the compute dtype
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    quantization_config=quantization_config,
)

chat = [
    {"role": "user", "content": "ื”ื™ื™ ืžื” ืฉืœื•ืžืš?"},  # "Hey, how are you?"
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Move the input ids to the same device as the model
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
```

This is the output I get:

```
user
ื”ื™ื™ ืžื” ืฉืœื•ืžืš?
model
ืื ื™ ื™ื›ื•ืœ ืœื”ื’ื™ื‘ ืœื˜ืงืกื˜ "ื”ื™ื™, ืžื” ืฉืœื•ืžืš? " ื•ื ืชืŸ ืชืฉื•ื‘ื” ืžืขืžื™ืงื”.

ื”ื™ื” ืœื™ ืฉืคืข ืฉืœ ื”ื™ืชืจื•ื ื•ืช ืžืื– ื”ื™ื™ืชื™ ืขื ื”ืฉื™ืจื•ืชื™ื ืฉืœืš, ื•ืื ื™ ืจื•ืฆื” ืœื”ืฉื™ื’ ืืช ื”ืžื™ืจื‘ ืžื”ื™ืžื™ื ื”ื‘ืื™ื ื›ื“ื™ ืœืกืคืง ืฉื™ืจื•ืชื™ื ืžืขื•ืœื™ื ืœืื ืฉื™ื ื›ืžื•ืš. ืื ื™ ืฉืžื— ืฉื”ื™ื• ืœืš ื›ืžื” ื—ื•
```

Roughly translated, the model answers: "I can respond to the text 'Hey, how are you?' and give an in-depth answer. I have had an abundance of advantages since I have been with your services, and I want to get the most out of the coming days in order to provide excellent services to people like you. I am glad you had a few [cut off]" — so it does not really answer the greeting.
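Since the reply drifts off topic, I suspect the sampling settings. Here is a variant I am planning to try next, reusing `tokenizer`, `model`, and `prompt` from the snippet above — greedy decoding, and a cap on the new tokens only. To be clear, these parameter values are just my guesses, not something from the Model Card:

```python
# Variant I plan to try: greedy decoding, capping only the newly generated tokens.
# These parameter choices are my own guesses, not from the model card.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,  # cap generated tokens, not prompt + generation
    do_sample=False,     # greedy decoding, to rule out sampling noise
)

# Decode only the newly generated part, without special tokens
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

If the greedy output is stable, I can then re-enable `do_sample=True` with a lower `temperature` to see how much of the drift comes from sampling.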

What do you think?

If a change is needed, I can add it as an example to the Model Card and help with the documentation :)

