---
language: protein
tags:
- protein language model
datasets:
- BFD
- Custom Rosetta
---
# ProtBert-BFD finetuned on Rosetta 20,40,60AA dataset
This model is finetuned to predict Rosetta fold energy using a dataset of 300k protein sequences:
100k sequences each of length 20AA, 40AA, and 60AA.
Current model in this repo: `prot_bert_bfd-finetuned-032822_1323`
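
Below is a minimal usage sketch with the Hugging Face `transformers` API, assuming the checkpoint exposes a single-output regression head loadable via `BertForSequenceClassification`; the local checkpoint path and head configuration are assumptions, not confirmed details of this repo:

```python
import re
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumption: the fine-tuned checkpoint uses a single-output regression head.
model_dir = "prot_bert_bfd-finetuned-032822_1323"  # checkpoint name from this repo
tokenizer = BertTokenizer.from_pretrained(model_dir, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=1)
model.eval()

# ProtBert expects space-separated residues, with rare amino acids mapped to X.
sequence = "MKTAYIAKQRQISFVKSHFSRQL"
sequence = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    predicted_energy = model(**inputs).logits.item()
print(f"Predicted Rosetta fold energy: {predicted_energy:.3f}")
```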
## Performance
| Eval set | MAE | R² | MSE | RMSE |
|---|---|---|---|---|
| 20AA sequences (1k) | 0.100418 | 0.989028 | 0.016266 | 0.127537 |
| 40AA sequences (10k) | 0.173888 | 0.963361 | 0.048218 | 0.219587 |
| 60AA sequences (10k) | 0.235238 | 0.930164 | 0.088131 | 0.2968 |
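
Metrics of this form can be reproduced with scikit-learn; the sketch below is illustrative only (the actual evaluation script is not included in this card), and the placeholder arrays stand in for model predictions on an eval split:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Compute the metric set reported above (MAE, R2, MSE, RMSE)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }

# Placeholder arrays; replace with true energies and model predictions.
print(regression_metrics(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))
```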
## `prot_bert_bfd` from ProtTrans
The starting pretrained model is `prot_bert_bfd` from ProtTrans, trained on 2.1 billion protein sequences from BFD
using a masked language modeling (MLM) objective. It was introduced in
[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
[this repository](https://github.com/agemagician/ProtTrans).
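
As a sketch, the pretrained backbone can be loaded from the Hugging Face Hub for masked-token prediction; the model ID `Rostlab/prot_bert_bfd` is assumed to be the public ProtTrans release (confirm against the ProtTrans repository):

```python
from transformers import pipeline

# Masked language modeling with the pretrained ProtBert-BFD backbone.
# Assumption: "Rostlab/prot_bert_bfd" is the public ProtTrans checkpoint on the Hub.
unmasker = pipeline("fill-mask", model="Rostlab/prot_bert_bfd")

# ProtBert tokenization: residues separated by spaces, [MASK] at the hidden position.
for pred in unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T"):
    print(pred["token_str"], round(pred["score"], 4))
```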
> Created by [Ladislav Rampasek](https://rampasek.github.io)