MLP activation
I saw that the config file uses the 'silu' activation function, but the config.gin file mentions 'gelu'. I would like to know how you did this.
I pretrained with t5x, and when converting to the Hugging Face format and writing the config.json I made a mistake and put 'silu' as the activation function (I was peering at the google/ul2* config.json for inspiration). With that config.json I then finetuned on en-nl (using Hugging Face sequence-to-sequence finetuning, so with silu), and that is this model. So it is actually pre-trained with gelu and finetuned with silu.

I didn't notice this until after fine-tuning, and the results looked good (very good, actually, compared to what I've been able to do with the same translation dataset and t5, though I think that is probably attributable to the MoD objective rather than my activation function mixup), so I decided to make the model publicly available. I've fixed the config.json to use gelu for the pre-trained ul2-dutch models, and kept silu in the fine-tuned en-nl models, since those were actually finetuned with silu.
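For anyone running into the same thing, here is a minimal sketch (assuming the checkpoints use the T5/UL2 config class in transformers, and that the gated MLP variant applies; the exact values in a given checkpoint may differ) of where the activation lives in config.json and how it is derived at load time:

```python
from transformers import T5Config

# Sketch only: in T5/UL2-style Hugging Face configs the MLP activation is
# controlled by the "feed_forward_proj" field in config.json; the activation
# actually used by the model ("dense_act_fn") is derived from it.
gelu_cfg = T5Config(feed_forward_proj="gated-gelu")  # what pre-training used
silu_cfg = T5Config(feed_forward_proj="gated-silu")  # what the mistaken config.json said

for name, cfg in [("gelu", gelu_cfg), ("silu", silu_cfg)]:
    print(name, cfg.feed_forward_proj, cfg.dense_act_fn, cfg.is_gated_act)
```

Because the model builds its MLP from this field when it is loaded, changing that one value in config.json is enough to flip which activation a subsequent finetuning run actually uses.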