Model repeating information and "spitting out" random characters

#14
by brazilianslib - opened

First of all, congratulations on the launch. Gemma 2 9B is, at least in my tests, the best model for PT-BR, much better than far larger models.
However, problems keep happening, such as:

- repeating information;
- "spitting out" text infinitely;
- placing tags like "</start_of" at the end of its answers.

I am eagerly awaiting a solution.

Once again, I thank the entire Google Gemma team.

Google org

Hello! Can you make sure you're on the latest transformers version, v4.42.3?
We added soft-capping in this version, which may improve the results in your tests.
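For example, you can upgrade and then load the model as usual (a minimal sketch; the checkpoint ID below is just an illustration, use whichever Gemma 2 checkpoint you tested):

```python
# Upgrade to the version that includes attention soft-capping:
#   pip install -U "transformers>=4.42.3"

import transformers
print(transformers.__version__)  # should print 4.42.3 or later

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint for illustration; substitute your own.
model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```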

Just perfect! Amazing multilingual model!

I installed this version; the problem is that when I use flash_attention_2 (attn_implementation="flash_attention_2"), I get 100% random output in 4-bit.
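For reference, this is roughly how I load it (a minimal sketch; the checkpoint ID and quantization settings are placeholders for my actual setup):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-2-9b-it"  # assumed checkpoint

# 4-bit quantization via bitsandbytes
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# This combination (4-bit + flash_attention_2) is what produces the random output.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```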

Google org

Hi @zokica , @GPT007 , we recommend using eager attention for Gemma 2 models. Please refer to this doc for more details. Thank you.
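For example (a minimal sketch; the checkpoint ID is just an illustration):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",  # assumed checkpoint
    attn_implementation="eager",  # recommended attention backend for Gemma 2
    device_map="auto",
)
```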

But they did make a fix for flash attention 2, and it does not work. It was supposed to fix things, but it did not.

I get the same results for eager and sdpa attention.

Google org

Hi, I hope the issue has been resolved. Please let us know if any further assistance is needed. Thanks!
