Batching token length

#144 by mishavee - opened

When training BLOOM, how many tokens can the input be? 2048?

BigScience Workshop org

Actually, it was trained with a sequence length of 2048, but the model supports any length; you can try generating more and more tokens, in principle to infinity (with some performance degradation as you increase the length). This is linked to our use of ALiBi.
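
As a minimal sketch (not an official example; the small bigscience/bloom-560m checkpoint and the repeated prompt are chosen only to keep it runnable), you can feed the model an input longer than the 2048-token training length with the transformers library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small BLOOM variant, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build a prompt that is clearly longer than the 2048 tokens seen in training.
prompt = "Long-context generation test. " * 400
inputs = tokenizer(prompt, return_tensors="pt")
print("input length:", inputs.input_ids.shape[1])

# ALiBi position biases let attention extrapolate past the training length,
# though quality may degrade as the sequence grows.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```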

How much degradation? Are you saying I can put 100,000 words in one training example?
Thanks

BigScience Workshop org

It's specific to your setup.

For more explanation of what ALiBi is: https://arxiv.org/abs/2108.12409
For some plots showing how well it performs on long sequences, we have a preliminary result (on a 1B model) in https://arxiv.org/abs/2210.15424 (Figure 2).

That's from a modeling perspective. From a pure hardware perspective, longer sequences mean a larger memory footprint, so you might run into out-of-memory issues when using 100,000 words (depending on your setup).
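
To put a rough number on that, here is a back-of-the-envelope sketch of how the attention key/value cache alone grows with sequence length. It assumes the 176B BLOOM configuration (70 layers, hidden size 14336), fp16, batch size 1; treat the figures as ballpark estimates, not measurements, and note that weights and activations come on top of this.

```python
def kv_cache_bytes(seq_len, num_layers=70, hidden_size=14336, bytes_per_param=2):
    # One key and one value vector of size hidden_size per token, per layer.
    return 2 * num_layers * hidden_size * seq_len * bytes_per_param

for seq_len in (2_048, 10_000, 100_000):
    gib = kv_cache_bytes(seq_len) / 1024**3
    print(f"{seq_len:>7} tokens -> ~{gib:.1f} GiB of KV cache (fp16, batch size 1)")
```

Under those assumptions, 2048 tokens take on the order of 8 GiB of cache, while 100,000 tokens push it into the hundreds of GiB, which is why very long inputs tend to hit memory limits first.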
