Training
#3 by NePe
You can train models on Kaggle with a TPU. I'm currently experimenting with neox-125m, and it takes about 7 hours for 2 epochs on a 1.7 GB dataset (~715M tokens) with a batch size of 80*2048 tokens.
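For a rough sense of scale, the numbers above can be turned into steps and throughput. This is only a back-of-the-envelope sketch: it assumes "80*2048" means 80 sequences of 2048 tokens per optimizer step and that the whole 7 hours is training time (no setup, evaluation, or checkpointing). The variable names are just for illustration.

```python
# Figures quoted in the post; token count and runtime are approximate.
DATASET_TOKENS = 715_000_000   # ~715M tokens
EPOCHS = 2
SEQUENCES_PER_BATCH = 80       # assumption: 80 sequences per step
SEQUENCE_LENGTH = 2048         # assumption: 2048 tokens per sequence
WALL_CLOCK_HOURS = 7           # assumption: all 7 hours are spent training

# Tokens consumed by one optimizer step.
tokens_per_step = SEQUENCES_PER_BATCH * SEQUENCE_LENGTH

# Full steps per epoch (the partial last batch is ignored).
steps_per_epoch = DATASET_TOKENS // tokens_per_step

# Tokens seen over the whole run and the implied average throughput.
total_tokens = DATASET_TOKENS * EPOCHS
tokens_per_second = total_tokens / (WALL_CLOCK_HOURS * 3600)

print(f"tokens per step:   {tokens_per_step:,}")
print(f"steps per epoch:   {steps_per_epoch:,}")
print(f"total steps:       {steps_per_epoch * EPOCHS:,}")
print(f"tokens per second: {tokens_per_second:,.0f}")
```

Under those assumptions this works out to roughly 4.4k steps per epoch and somewhere around 57k tokens per second on average; the real figure depends on how much of the 7 hours is actually spent in the training loop.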