Change rope scaling to match max embedding size
#16
by Blackroot - opened
RoPE theta appears to be configured for a 32k context length, while the max position embedding is 131072. Using the formula from https://arxiv.org/pdf/2310.05209:

β_new = 1,000,000^(log_(T_train / 2π)(T_new / 2π))

where T_train is 32768 (the old context length) and T_new is 131072 (the new context length), solving the equation gives roughly 9,370,821 for the new value.
```python
import math

def calculate_beta():
    # log base (t_train / 2pi) of (t_new / 2pi),
    # with t_train = 8192*4 = 32768 and t_new = 8192*4*4 = 131072
    inner = math.log((8192 * 4 * 4) / (2 * math.pi), (8192 * 4) / (2 * math.pi))
    # new theta = old theta ** inner
    beta = 1000000 ** inner
    return beta
```
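For reference, here is a parameterized sketch of the same calculation (`scaled_rope_theta` is just an illustrative helper name; the 1,000,000 current theta and the 32768 / 131072 lengths are the values from the discussion above):

```python
import math

def scaled_rope_theta(old_theta: float, t_train: int, t_new: int) -> float:
    # new theta = old_theta ** log_{t_train / 2pi}(t_new / 2pi)
    return old_theta ** math.log(t_new / (2 * math.pi), t_train / (2 * math.pi))

# Current theta of 1,000,000, trained at 32k, targeting 131072:
print(scaled_rope_theta(1_000_000, 32768, 131072))  # ~9.37e6, i.e. the ~9,370,821 above
```

If that holds, the config.json change would amount to raising rope_theta from 1000000.0 to roughly 9,370,821 while keeping max_position_embeddings at 131072 (assuming a llama-style config with those keys).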
Blackroot changed pull request title from "Update config.json" to "Change rope scaling to match max embedding size"