MLP activation
I saw that the config file uses the 'silu' activation function, but the config.gin file mentions 'gelu'. I would like to know how you did this.
I pretrained with t5x, and when converting to the Hugging Face format and writing the config.json I made a mistake and put 'silu' as the activation function (I was peering at the google/ul2* config.json for inspiration). With that config.json I then finetuned on en-nl (using Hugging Face sequence-to-sequence finetuning, so with silu), and that is this model. So it is actually pre-trained with gelu and finetuned with silu.

I didn't notice this until after fine-tuning, and the results looked good (very good, actually, compared to what I've been able to do with the same translation dataset and t5, though I think that is probably attributable to the MoD objective rather than my activation function mixup), so I decided to make the model publicly available. I've fixed the config.json to use gelu for the pre-trained ul2-dutch models, and kept silu in the fine-tuned en-nl models, since those were actually finetuned with silu.
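For anyone running into the same thing, here is a minimal sketch (assuming the checkpoints use the T5/UL2 config class in transformers, and that the gated MLP variant applies; the exact values in a given checkpoint may differ) of where the activation lives in config.json and how it is derived at load time:

```python
from transformers import T5Config

# Sketch only: in T5/UL2-style Hugging Face configs the MLP activation is
# controlled by the "feed_forward_proj" field in config.json; the activation
# actually used by the model ("dense_act_fn") is derived from it.
gelu_cfg = T5Config(feed_forward_proj="gated-gelu")  # what pre-training used
silu_cfg = T5Config(feed_forward_proj="gated-silu")  # what the mistaken config.json said

for name, cfg in [("gelu", gelu_cfg), ("silu", silu_cfg)]:
    print(name, cfg.feed_forward_proj, cfg.dense_act_fn, cfg.is_gated_act)
```

Because the model builds its MLP from this field when it is loaded, changing that one value in config.json is enough to flip which activation a subsequent finetuning run actually uses.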