tFINE-850m-24x24-1024ctx

T5 model pretrained with nanoT5:

  • ~850M parameters; 24 encoder layers and 24 decoder layers
  • SentencePiece tokenizer with a 48k vocab & byte-pair fallback
    • handles whitespace etc. correctly (unlike the original T5 tokenizer)
  • 1024-token context length during pretraining
  • relative_attention_num_buckets increased from 32 to 48 to support the longer context length
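
A minimal loading sketch with Hugging Face transformers (an assumed usage path, not an official recipe; note this is a pretrained checkpoint, so it should be fine-tuned before downstream use):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "pszemraj/tFINE-850m-24x24-1024ctx"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# sanity-check the settings listed above
print(model.config.num_layers, model.config.num_decoder_layers)  # 24 24
print(model.config.relative_attention_num_buckets)               # 48
print(len(tokenizer))                                            # ~48k

# round-trip a string to check whitespace handling
text = "def hello():\n    print('hi')  # indentation preserved"
ids = tokenizer(text).input_ids
print(tokenizer.decode(ids, skip_special_tokens=True))
```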

Experiment logs

Training consisted of two phases:

  • TODO
  • TODO