Suggestion: NaN logits when padding used
Hi, this is a really interesting model. Had fun playing around with it.
I've come across the following issue, which I thought would be good to raise here. The NaN issues described in this transformers thread, https://github.com/huggingface/transformers/issues/32390, also affect this model when the inputs are padded.
I found that updating the code example from the model card, changing `torch.bfloat16` to `torch.float16`, fixed this issue for me.
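For reference, here's a minimal sketch of what I mean. The prompts and batching below are just illustrative and not the exact model card snippet; the only relevant change is the dtype.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/shieldgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # changed from torch.bfloat16 in the model card example
)

# Two prompts of different lengths so the batch actually contains padding tokens.
prompts = [
    "Is this user message harmful?",
    "A deliberately longer prompt so that the shorter one gets padded in the batch.",
]
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits

# With bfloat16 this printed True (NaNs present) for me; with float16 it prints False.
print(torch.isnan(logits).any())
```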
Hi @WillBankes,
I executed the code reported to cause NaN issues when padding is used, as described in the GitHub thread https://github.com/huggingface/transformers/issues/32390. However, I didn't encounter the NaN logits issue with the model google/shieldgemma-2b. You can refer to the detailed execution in this Colab notebook: https://colab.research.google.com/gist/Gopi-Uppari/ffe907c215f0ebfdfb16e1f173c54942/nan-logits-when-padding-used.ipynb.
Thank you.