Change rope scaling to match max embedding size

#16

rope_theta appears to be configured for a 32k context length, while max_position_embeddings is 131072. Using the formula from https://arxiv.org/pdf/2310.05209:

β = 1,000,000 ^ log_(T_train / 2π)(T_new / 2π)

With T_train = 32768 (the old context length) and T_new = 131072 (the new context length), solving the equation gives 9,370,821 as the new rope_theta value.

```python
import math

def calculate_beta():
    # log_[t_train / 2pi](t_new / 2pi)
    # t_train = 8192 * 4 = 32768, t_new = 8192 * 4 * 4 = 131072
    inner = math.log((8192 * 4 * 4) / (2 * math.pi), (8192 * 4) / (2 * math.pi))

    # base ^ inner, with base = 1,000,000 (the original rope_theta)
    beta = 1000000 ** inner

    return beta
```
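
As a quick sanity check (a sketch, not part of this PR), the snippet below recomputes β inline and, purely for illustration, writes the rounded value into a local config.json. The file path and the top-level `rope_theta` key (as in Llama-style configs) are assumptions.

```python
import json
import math

# Recompute beta inline (same formula as calculate_beta above).
t_train, t_new, base = 32768, 131072, 1_000_000
new_theta = base ** math.log(t_new / (2 * math.pi), t_train / (2 * math.pi))
print(round(new_theta))  # ≈ 9,370,821, matching the value quoted above

# Illustrative patch of a local config.json; the path and the presence of a
# top-level "rope_theta" key (as in Llama-style configs) are assumptions.
with open("config.json") as f:
    cfg = json.load(f)
cfg["rope_theta"] = round(new_theta)
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```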
Blackroot changed pull request title from Update config.json to Change rope scaling to match max embedding size