Llama responses are broken during conversation (#64)
opened by gusakovskyi
Hello, I have run Llama locally with FastChat and also through the Replicate API, and at some point during a conversation the output always breaks, for example:
- Responding with an endless run of quotes ("""""""""""""...)
- Repeating the same tokens over and over (youyouyouyouyouyou...)
- Responding with only the first few tokens ("I AM") and nothing more
- Within a single response, it stops generating readable text and returns something senseless
Here is an example: the question was about the history of the USA, and at some point the model starts returning strange text.
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

# Load the model in bfloat16 and let accelerate place it on available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("hi")
Why does it crash like this and not give a proper response?
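
One more data point: meta-llama/Meta-Llama-3-8B is the base checkpoint, which is not tuned for chat and has no chat template or end-of-turn training, so that alone might explain the degenerate output. Below is a sketch of a chat-style setup, assuming the meta-llama/Meta-Llama-3-8B-Instruct checkpoint and the pipeline's chat-message input format (untested on my side, not a confirmed fix):

import transformers
import torch

# Assumption: the Instruct checkpoint, which is trained for chat turns
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input so the pipeline applies the Llama 3 chat template
messages = [{"role": "user", "content": "Tell me about the history of the USA."}]

outputs = pipe(
    messages,
    max_new_tokens=256,
    eos_token_id=[
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # Llama 3 end-of-turn marker
    ],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"])

The <|eot_id|> terminator matters because Llama 3 Instruct marks the end of a turn with that token rather than only the generic EOS token, so omitting it can make generation run on past where the reply should stop.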