The model stops after generating one new token
The LLM generates just one new token (a "?") and then stops. It behaves like this every time.
Have I missed something on my side?
# Load model directly
import flash_attn
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
phi_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
%%time
device = 'cuda'
phi_model.to(device)

def get_ids_sigIn_password_email(get_ids_sigIn_pass_email_prompt):
    inputs = tokenizer.encode(get_ids_sigIn_pass_email_prompt, return_tensors="pt").to(device)

    # Find the index where the generated tokens start
    input_length = len(tokenizer.encode(get_ids_sigIn_pass_email_prompt))
    print(f"input_length -> {input_length}")

    # Generate a response
    phi_model.eval()
    outputs = phi_model.generate(inputs, max_new_tokens=500)
    print(f"shape of output is {outputs[0].shape}")

    full_decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    decoded_output = full_decoded_output
    return decoded_output

out_f2 = get_ids_sigIn_password_email("write a paragraph about What is the meaning of life")
print(out_f2)
Output:
input_length -> 11
shape of output is torch.Size([13])
write a paragraph about What is the meaning of life?

So of the 13 tokens in the output, 11 are the prompt: the model produced only two new tokens (the "?" and, presumably, a stop token hidden by skip_special_tokens=True) before halting.
I solved it using the prompt style mentioned in the documentation.
All I had to do was use the <|user|> and <|assistant|> tokens as follows:
prompt_our_side = f"""<|user|>
{prompt}
<|assistant|>
"""
Please close this issue as it has been resolved.