AmelieSchreiber committed
Commit 28bad80
Parent(s): 86ae9fd
Update README.md

README.md CHANGED
@@ -6,7 +6,7 @@ license: mit
 
 This model is the ESM-2 model [esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D) finetuned with QLoRA on
 [this dataset](https://huggingface.co/datasets/AmelieSchreiber/2600K_binding_sites) of 2.6M protein sequences with binding and active
-site annotations. The model and dataset size were scaled in a one-to-one way (following the Chinchilla paper) up from the smaller
+site annotations from UniProt. The model and dataset size were scaled in a one-to-one way (following the Chinchilla paper) up from the smaller
 QLoRA adaptations of the `esm2_t6_8M_UR50D` models which were trained on 600K proteins. Since this model is 4.375 times larger, a dataset
 approximately 4.375 times larger is needed if Chinchilla scaling laws hold for QLoRA finetuning of protein language models. Determining if
 such scaling laws also hold is part of this project, so checking for improvements in performance metrics over a period of 3 epochs, as well
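
The 4.375× figure in the diff follows directly from the parameter counts and the one-to-one scaling rule the README describes. A minimal sketch of that arithmetic, using only numbers quoted in the text (variable names are illustrative):

```python
# Chinchilla-style 1:1 dataset/model scaling, using the figures quoted in the
# README. All numbers come from the text; nothing here is measured.
base_params = 8_000_000       # esm2_t6_8M_UR50D, the smaller base model
scaled_params = 35_000_000    # esm2_t12_35M_UR50D, this model
base_dataset = 600_000        # proteins used for the smaller QLoRA adaptations

scale = scaled_params / base_params        # 35M / 8M = 4.375
required = int(base_dataset * scale)       # 600K * 4.375 = 2,625,000

print(f"model scale factor: {scale}")      # 4.375
print(f"proteins needed: {required:,}")    # 2,625,000, i.e. ~2.6M
```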
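
For readers unfamiliar with the QLoRA setup the README refers to, here is a hedged sketch of how such a finetune might be configured with `transformers`, `peft`, and `bitsandbytes`. This is not the author's actual training script; the label count, LoRA hyperparameters, and `target_modules` are assumptions.

```python
# Hedged sketch, not the committed training code: QLoRA-style setup for
# token classification (per-residue binding-site labels) on ESM-2.
import torch
from transformers import AutoModelForTokenClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "facebook/esm2_t12_35M_UR50D"

# 4-bit quantization of the base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=2,  # assumption: binary site / non-site residue labels
    quantization_config=bnb_config,
)

# LoRA adapters on the attention projections; module names and
# hyperparameters here are assumptions, not the author's settings.
lora_config = LoraConfig(
    task_type="TOKEN_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train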