tinyllama-proteinpretrain-quinoa

Full-model finetuning of TinyLlama-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.
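
Continued pretraining on raw sequences is usually plain causal language modeling. The card does not say how the sequences were batched. The sketch below assumes the common approach: tokenized sequences are joined with an end-of-sequence separator, and the resulting stream is cut into fixed-length blocks. The helper name `pack_sequences`, the `eos_id` value and the block size are illustrative assumptions, and the toy token IDs stand in for real tokenizer output.

```python
from typing import List


def pack_sequences(token_ids: List[List[int]], eos_id: int, block_size: int) -> List[List[int]]:
    """Concatenate tokenized sequences (each followed by eos_id) and split the
    stream into consecutive blocks of exactly block_size tokens.

    Hypothetical helper: a sketch of standard causal-LM packing, not the
    author's confirmed preprocessing. A trailing remainder shorter than
    block_size is dropped.
    """
    stream: List[int] = []
    for ids in token_ids:
        stream.extend(ids)
        stream.append(eos_id)  # mark the boundary between two proteins
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy token IDs standing in for tokenized protein sequences.
blocks = pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], eos_id=2, block_size=4)
print(blocks)  # [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]
```

In a typical `transformers` causal-LM setup, each block would serve as both `input_ids` and `labels`, and the model shifts the labels internally.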

Notes: pretraining only on protein sequences causes the model to generate nothing but protein sequences. These eventually degenerate into repeated residues such as VVVV or KKKK.
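
Runs such as `VVVV` or `KKKK` can be flagged automatically when sampling from the model. The sketch below is a hypothetical check and is not part of the original work. It finds the longest run of a single residue and treats a run of at least `max_run` as degenerate. The threshold of 6 is an arbitrary choice made here. At generation time, `generate` in `transformers` also accepts `repetition_penalty` and `no_repeat_ngram_size`, which may reduce such loops. Their effect on this particular model is unknown.

```python
from typing import Tuple


def longest_run(seq: str) -> Tuple[str, int]:
    """Return the residue with the longest consecutive run in seq, and that run's length."""
    best_char, best_len = "", 0
    cur_char, cur_len = "", 0
    for ch in seq:
        if ch == cur_char:
            cur_len += 1  # still inside the same run
        else:
            cur_char, cur_len = ch, 1  # a new run starts
        if cur_len > best_len:
            best_char, best_len = cur_char, cur_len
    return best_char, best_len


def is_degenerate(seq: str, max_run: int = 6) -> bool:
    """Flag a generated sequence whose longest single-residue run reaches max_run.

    max_run is an illustrative threshold, not a value from the model card.
    """
    return longest_run(seq)[1] >= max_run


print(longest_run("MAKL" + "V" * 8))        # ('V', 8)
print(is_degenerate("MSTNPKPQRKTKRNTNRR"))  # False
```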

This model may be replaced by one trained on a mix of bio/chem text and protein sequences.
The model might also need dedicated "biotokens" to represent the amino acids, instead of relying on the existing tokenizer.
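
One reading of the "biotokens" idea is to give each amino acid its own dedicated token. A sequence would then always be split one residue per token, rather than into whatever multi-character pieces the existing Llama tokenizer may produce. The sketch below builds such a vocabulary and maps a sequence onto it. Nothing more is attempted. The `<aa_X>` token format and the helper names are hypothetical. In `transformers`, registering the tokens with a real tokenizer would typically go through `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The new embedding rows would then need training.

```python
from typing import Dict, List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical "biotoken" naming scheme: one dedicated token per residue.
BIOTOKENS: Dict[str, str] = {aa: f"<aa_{aa}>" for aa in AMINO_ACIDS}


def to_biotokens(seq: str, unknown: str = "<aa_X>") -> List[str]:
    """Map a protein sequence to one biotoken per residue.

    Non-standard letters (e.g. X, B, Z, U) all fall back to the single `unknown` token.
    """
    return [BIOTOKENS.get(aa, unknown) for aa in seq.upper()]


tokens = to_biotokens("MKV")
print(tokens)  # ['<aa_M>', '<aa_K>', '<aa_V>']

# Full set of new tokens to register: 20 residues plus the fallback.
new_vocab = sorted(set(BIOTOKENS.values()) | {"<aa_X>"})
print(len(new_vocab))  # 21
```

With one token per residue, the token count of a sequence equals its length in residues, which makes context-length limits straightforward to reason about.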

More details TBD

Safetensors
Model size: 1.1B params
Tensor type: F32

Model tree for monsoon-nlp/tinyllama-proteinpretrain-quinoa

Base model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

Datasets used to train monsoon-nlp/tinyllama-proteinpretrain-quinoa