Part of the Bio Series collection: embeddings and NLG related to biology / amino acid sequences.
TinyLLaMA model with continued pretraining / full-model finetuning on amino acids and simulated science textbooks.
The goal is to create models that understand amino acid sequences and natural language descriptions or Q&A.
Training data was shuffled with:
CoLab notebook: https://colab.research.google.com/drive/1dah43byt-T0HQC9eCigNbxSZ8aHu6s-W?usp=sharing
To fit training on an L4 GPU, it was necessary to use max_length=400 and train_batch_size=1.
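As a rough illustration of what those two settings imply for the data, here is a minimal pure-Python sketch. It is not the notebook's code: the helper names `truncate` and `batches` and the toy sequences are hypothetical, and a real run would presumably pass the same limits to a tokenizer and trainer instead.

```python
MAX_LENGTH = 400       # value stated in the text above
TRAIN_BATCH_SIZE = 1   # value stated in the text above


def truncate(token_ids, max_length=MAX_LENGTH):
    """Keep at most max_length token ids; longer sequences are cut off."""
    return token_ids[:max_length]


def batches(examples, batch_size=TRAIN_BATCH_SIZE):
    """Yield consecutive groups of batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


# Hypothetical stand-in data: three "tokenized" sequences of different lengths.
toy_examples = [list(range(n)) for n in (120, 400, 950)]

prepared = [truncate(ids) for ids in toy_examples]
toy_batches = list(batches(prepared))
```

With a batch size of 1, each optimizer step sees a single sequence of at most 400 tokens, which caps the activation memory per step. The trade-off is that any amino acid sequence or text passage longer than 400 tokens loses its tail.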
The following hyperparameters were used during training: