---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---

# tFINE-850m-24x24-1024ctx

Pretrained T5 model with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test):

- ~850m parameters, 24 layers in the encoder, 24 layers in the decoder
- SentencePiece tokenizer with 48k vocab & byte-pair fallback
- handles whitespace etc. correctly (_unlike the original T5 tokenizer_)
- 1024 context length during pretraining
- `relative_attention_num_buckets` increased from 32 to 48 for context-length upscaling

## Experiment logs

Training consisted of two phases:

- TODO
- TODO
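
## Usage

A minimal loading sketch with `transformers`. The hub repo ID below is an assumption inferred from the model name on this card, and the sentinel-token prompt assumes the tokenizer follows the standard T5 `<extra_id_N>` convention; adjust both if they differ.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# assumed repo ID (substitute the actual hub ID if different)
model_id = "pszemraj/tFINE-850m-24x24-1024ctx"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# This is a pretrained (not instruction-tuned) checkpoint, so it is best
# probed with span-corruption style inputs; sentinel format is assumed here.
text = "The capital of France is <extra_id_0>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```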