The model was trained for 10 hours (3.5 epochs, 14,000 steps) on 8 x 40 GB A100 GPUs with the following arguments:
!python train.py --config tv2o-medium --max-len 4096 --acc-grad 8
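As a rough sanity check, the reported figures imply about 4,000 steps per epoch and about 2.6 seconds per step. The sketch below computes these values, along with sequences and tokens per step. It makes several assumptions: "steps" counts optimizer updates, each update accumulates 8 micro-batches (`--acc-grad 8`) on each of the 8 GPUs, and `--max-len` is a per-sequence token limit. The per-GPU batch size comes from the `tv2o-medium` config, which is not shown here. For that reason, `per_gpu_batch_size` and the `derived_stats` helper are hypothetical.

```python
# Back-of-the-envelope quantities derived from the reported training run.
# Assumption: "steps" counts optimizer updates, and each update accumulates
# gradients over ACC_GRAD micro-batches on every GPU.

TOTAL_STEPS = 14_000
EPOCHS = 3.5
HOURS = 10
NUM_GPUS = 8
ACC_GRAD = 8       # value passed as --acc-grad
MAX_LEN = 4096     # value passed as --max-len (assumed: tokens per sequence)


def derived_stats(per_gpu_batch_size: int) -> dict:
    """Return derived training statistics for the run.

    per_gpu_batch_size is a hypothetical input; the real value is set by
    the tv2o-medium config, which is not shown here.
    """
    steps_per_epoch = TOTAL_STEPS / EPOCHS
    seconds_per_step = HOURS * 3600 / TOTAL_STEPS
    # Sequences contributing to one optimizer update across all GPUs.
    sequences_per_step = per_gpu_batch_size * NUM_GPUS * ACC_GRAD
    # Upper bound: sequences shorter than MAX_LEN contribute fewer tokens.
    max_tokens_per_step = sequences_per_step * MAX_LEN
    return {
        "steps_per_epoch": steps_per_epoch,
        "seconds_per_step": seconds_per_step,
        "sequences_per_step": sequences_per_step,
        "max_tokens_per_step": max_tokens_per_step,
    }


if __name__ == "__main__":
    stats = derived_stats(per_gpu_batch_size=1)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

With a per-GPU batch size of 1, each optimizer step would see 64 sequences, or at most 262,144 tokens. Scale these numbers linearly for the actual batch size in the config.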