Why removed rope_scaling

#11
by theodotus - opened

For some reason I can't initialize training, and I also see that the `"rope_scaling": null` line was removed.
Are the two related?

google/gemma-2b-it: `"rope_scaling": null` vs. google/gemma-1.1-2b-it: `"rope_scaling"` => ????

And "rope_scaling" went away:
https://www.youtube.com/watch?v=MxMoq4YRvJo

Google org

Hi @theodotus , yes, the inability to initialize training and the removal of the `"rope_scaling": null` line could be correlated.

The `rope_scaling` parameter is used to adjust or enable scaling of Rotary Positional Embeddings (RoPE). If this parameter is required by your model's architecture and it has been removed or set to null, the model may not handle positional encodings correctly, which can lead to initialization failures. Thank you.
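For what it's worth, a `"rope_scaling": null` entry and a missing `rope_scaling` key usually behave the same way when the config is loaded: both read back as `None`, meaning plain (unscaled) RoPE. A minimal sketch with plain JSON (the config excerpts below are assumptions reconstructed from this discussion, not fetched from the Hub):

```python
import json

# Hypothetical excerpts of the two config.json files:
# gemma-2b-it kept the key set to null, gemma-1.1-2b-it dropped it entirely.
old_config = json.loads('{"model_type": "gemma", "rope_scaling": null}')
new_config = json.loads('{"model_type": "gemma"}')

# JSON null and an absent key both come back as None here,
# i.e. both select default, unscaled RoPE.
print(old_config.get("rope_scaling"))  # None
print(new_config.get("rope_scaling"))  # None
```

So removing the line is typically just config cleanup; if training still fails, the cause is more likely elsewhere (e.g. a library version mismatch) than the missing key itself.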
