Difference between pile-t5-[size]-flan and pile-t5-[size]-1T-flan?
#1 by jvroig - opened
Hello again! Just noticed that there's both a "pile-t5-base-flan" and a "pile-t5-base-1T-flan" flavor (and same for other sizes).
If "1T" in the name means the model was pretrained on 1 trillion tokens of the Pile, what does the name without "1T" mean? (I'm just deciding which of the variants to test; ideally, it'd be whichever had more pretraining.)
Thanks!
Yes, 1T is the older version, which uses the 1 millionth step (or 1 trillion tokens seen) of Pile-T5-Base. The name pile-t5-base-flan now refers to Flan finetuning on the Pile-T5-Base checkpoint that has been trained on 2 trillion tokens.
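As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the batch size stayed constant for the whole run, which the thread does not state, and `steps_for` is a hypothetical helper used only for illustration.

```python
# Figures quoted in the reply: step 1,000,000 corresponds to ~1 trillion tokens seen.
STEPS_AT_1T = 1_000_000
TOKENS_AT_1T = 1_000_000_000_000

# Implied tokens per training step (ASSUMPTION: constant batch size throughout).
tokens_per_step = TOKENS_AT_1T // STEPS_AT_1T


def steps_for(tokens: int) -> int:
    """Hypothetical helper: steps needed to see `tokens` tokens at the implied rate."""
    return tokens // tokens_per_step


# Pretraining token counts per variant, as described in the reply.
variants = {
    "pile-t5-base-1T-flan": 1_000_000_000_000,
    "pile-t5-base-flan": 2_000_000_000_000,
}

for name, tokens in variants.items():
    print(f"{name}: {tokens:,} tokens, ~{steps_for(tokens):,} steps")
```

Under that assumption, the variant without "1T" in its name is the one with more pretraining, so it is the one to pick if more pretraining is the goal.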
lintang changed discussion status to closed