How is this different from v1?
#2 opened by amgadhasan
It seems they changed RoPE theta to 1e6 for all their models.
32k context.
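
For anyone who wants to check this themselves, here is a minimal sketch that compares the two configs with transformers. The repo ids below are placeholders, substitute whichever v1 and v2 checkpoints are being discussed:

```python
from transformers import AutoConfig

# Placeholder repo ids -- substitute the actual v1 and v2 checkpoints.
v1 = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
v2 = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.2")

# rope_theta is the RoPE base frequency; reportedly 1e6 in the newer release.
print(v1.rope_theta, v2.rope_theta)
# max_position_embeddings reflects the advertised context window (32k here).
print(v1.max_position_embeddings, v2.max_position_embeddings)
```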
@Yuuru What is the source of this information?
@mrfakename vLLM reports it when loading the model, by the way:
```
[…] max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0
```
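
For reference, a minimal way to reproduce that startup log with vLLM (the repo id is again a placeholder for the model being loaded):

```python
from vllm import LLM

# Placeholder repo id -- point this at the checkpoint you are loading.
# vLLM logs its engine arguments at startup, including max_seq_len=32768.
llm = LLM(model="mistralai/Mistral-7B-v0.2")
```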
Yeah, it would be interesting to understand how it's actually different from the first one.