Difference between pile-t5-[size]-flan and pile-t5-[size]-1T-flan?

#1
by jvroig - opened

Hello again! Just noticed that there's both a "pile-t5-base-flan" and a "pile-t5-base-1T-flan" flavor (and same for other sizes).

If the model with the "1T" in the name means pretrained with 1 trillion tokens of the Pile, what does the model without 1T in the name mean? (I'm just deciding which one of the variants to test; ideally, it'd be whichever had more pretraining)

Thanks!

Owner

Yes, 1T is the older version, which uses the 1 millionth step (1 trillion tokens seen) of Pile-T5-Base. The name pile-t5-base-flan now refers to Flan finetuning on the Pile-T5-Base checkpoint that has been trained on 2 trillion tokens.
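For anyone comparing variants, here is a rough sketch of the arithmetic behind the naming. The step and token counts come from the reply above. The helper names, the `-1T-` substring check, and the constant tokens-per-step figure are assumptions made for illustration, not something taken from the training setup.

```python
# Figures stated in the reply above.
STEPS_AT_1T = 1_000_000               # "1 millionth step"
TOKENS_AT_1T = 1_000_000_000_000      # "1 trillion tokens seen"
TOKENS_FULL = 2_000_000_000_000       # checkpoint behind the names without "1T"

# Implied tokens per step, ASSUMING the batch size stayed constant.
TOKENS_PER_STEP = TOKENS_AT_1T // STEPS_AT_1T


def pretraining_tokens(model_name: str) -> int:
    """Hypothetical helper: pretraining tokens implied by a variant name."""
    return TOKENS_AT_1T if "-1T-" in model_name else TOKENS_FULL


def implied_steps(model_name: str) -> int:
    """Hypothetical helper: step count implied by the token budget."""
    return pretraining_tokens(model_name) // TOKENS_PER_STEP


if __name__ == "__main__":
    for name in ("pile-t5-base-1T-flan", "pile-t5-base-flan"):
        print(name, pretraining_tokens(name), implied_steps(name))
```

Under these assumptions, the variant without "1T" in its name is the one with more pretraining.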

lintang changed discussion status to closed
