How much of the data is at 32k length?
#5 by wnma3mz
https://arxiv.org/abs/2407.10671
The paper reports 7T pre-training tokens. However, the model is first trained with a 4,096-token context length, and I am curious how many of those tokens were trained at the 32k length?
To enhance the long-context capability of Qwen2, we augmented the context length from 4,096 tokens to 32,768 tokens during the concluding phase of pre-training. This expansion was complemented by the introduction of a significantly increased volume of high-quality, lengthy data.
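For a back-of-envelope sense of what such a split could look like, here is a minimal Python sketch of a two-phase schedule. The 7T total and the 4,096 → 32,768 switch come from the paper; the per-phase token budget below is a hypothetical placeholder, since the paper does not disclose how many tokens were trained at 32k.

```python
# Hedged sketch of a staged pre-training schedule in the spirit of the
# quoted passage. The paper states the 7T total and the 4,096 -> 32,768
# context-length switch, but NOT the per-phase token split: the 32k budget
# below is a made-up illustration, not a reported number.

from dataclasses import dataclass

@dataclass
class PretrainPhase:
    name: str
    max_seq_len: int   # context length used when packing documents
    token_budget: int  # tokens consumed in this phase (hypothetical split)

TOTAL_TOKENS = 7_000_000_000_000  # 7T, as reported in the paper

LONG_CONTEXT_TOKENS = 300_000_000_000  # hypothetical placeholder

phases = [
    PretrainPhase("main", 4_096, TOTAL_TOKENS - LONG_CONTEXT_TOKENS),
    PretrainPhase("long-context (concluding phase)", 32_768, LONG_CONTEXT_TOKENS),
]

for p in phases:
    n_seqs = p.token_budget // p.max_seq_len
    print(f"{p.name}: {p.token_budget:,} tokens "
          f"~ {n_seqs:,} packed sequences of length {p.max_seq_len:,}")
```

The sketch only illustrates the mechanics (each phase packs its data to a fixed sequence length, so the token budget divided by the length gives the sequence count); the actual 32k token count would need to come from the authors.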