nanoT5-base-65kBPE-v2
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.
- SiLU/gated-SiLU activation
- 25% mask rate during pretraining
- 65k vocabulary size, using an adapted Claude 3 tokenizer
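The two architecture/pretraining bullets can be illustrated with a small pure-Python sketch. It assumes a T5 v1.1-style gated feed-forward (`W_down(silu(W_gate x) * (W_up x))`) with the activation set to SiLU, and T5-style span corruption with a mean span length of 3 tokens. Both are assumptions for illustration, not confirmed details of this checkpoint or of the training code. The helper names (`gated_silu_ffn`, `span_corruption_budget`) are hypothetical.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), computed in a numerically stable way."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix w (given as a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_silu_ffn(x, w_gate, w_up, w_down):
    """Hypothetical gated-SiLU FFN: W_down(silu(W_gate x) * (W_up x)).

    The SiLU-activated gate branch scales the linear "up" branch elementwise
    before projecting back to the model dimension.
    """
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def span_corruption_budget(seq_len: int, noise_density: float = 0.25,
                           mean_span_len: float = 3.0) -> tuple[int, int]:
    """Return (num_noise_tokens, num_noise_spans) for T5-style span corruption.

    noise_density=0.25 mirrors the 25% mask rate above; mean_span_len=3.0 is
    an assumed default, not taken from this model card.
    """
    num_noise = int(round(seq_len * noise_density))
    # Keep at least one noise token and at least one clean token.
    num_noise = min(max(num_noise, 1), seq_len - 1)
    num_spans = max(int(round(num_noise / mean_span_len)), 1)
    return num_noise, num_spans


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(gated_silu_ffn([1.0, 0.0], identity, identity, identity))
    print(span_corruption_budget(512))  # → (128, 43)
```

Under these assumptions, a 512-token input has 128 tokens masked, grouped into about 43 sentinel-replaced spans.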
training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
Plots: loss, gradients, and weights (more details are under checkpoints/).