# ProtBert-BFD finetuned on the Rosetta 20/40/60AA dataset

This model is finetuned to predict Rosetta fold energy using a dataset of 300k protein sequences: 100k each of 20AA, 40AA, and 60AA sequences.

Current model in this repo: `prot_bert_bfd-finetuned-032822_1323`
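
A minimal usage sketch for getting a fold-energy prediction from the checkpoint. It assumes the model was finetuned as a standard `BertForSequenceClassification` regression head (`num_labels=1`), which this card does not state explicitly; the checkpoint id below is illustrative:

```python
import re
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Illustrative checkpoint id/path; point this at the actual repo contents.
checkpoint = "prot_bert_bfd-finetuned-032822_1323"

tokenizer = BertTokenizer.from_pretrained(checkpoint, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=1)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFS"  # a 20-residue example
# ProtBert expects space-separated residues, with rare amino acids mapped to X.
sequence = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    energy = model(**inputs).logits.squeeze().item()
print(f"Predicted Rosetta fold energy: {energy:.4f}")
```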

## Performance

On held-out eval sets, the performance is:

| Eval set             | MAE      | R²       | MSE      | RMSE     |
|----------------------|----------|----------|----------|----------|
| 20AA (1k sequences)  | 0.100418 | 0.989028 | 0.016266 | 0.127537 |
| 40AA (10k sequences) | 0.173888 | 0.963361 | 0.048218 | 0.219587 |
| 60AA (10k sequences) | 0.235238 | 0.930164 | 0.088131 | 0.2968   |
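
These are the standard regression metrics; a sketch of how they can be computed with scikit-learn (the actual evaluation script is not part of this card, so the exact procedure is an assumption):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """MAE, R², MSE, and RMSE for a held-out eval set."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }
```

Note that the reported numbers are internally consistent: in each row, RMSE equals the square root of MSE.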

## `prot_bert_bfd` from ProtTrans

The starting pretrained model is `prot_bert_bfd` from ProtTrans, trained on 2.1 billion protein sequences from BFD with a masked language modeling (MLM) objective. It was introduced in [this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in [this repository](https://github.com/agemagician/ProtTrans).
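
The pretrained base model can be exercised directly with a fill-mask pipeline; a sketch assuming the upstream `Rostlab/prot_bert_bfd` checkpoint on the Hugging Face Hub:

```python
from transformers import BertForMaskedLM, BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert_bfd", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert_bfd")

# Predict the masked residue in a space-separated protein sequence.
unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
predictions = unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T")
print(predictions[0])  # highest-scoring residue for the masked position
```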