tFINE
Collection
Pretrained T5 models on high-quality data (e.g. FineWeb)
Pretrained T5 model with nanoT5:
- relative_attention_num_buckets increased to 48 from 32 for context length upscaling

Training consisted of two phases: