license: mit
widget:
- text: >-
MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY
Label Semantics:
Label 0: Non-crystallizable (Negative)
Label 1: Crystallizable (Positive)
Dataset
Model
ESMCrystal_t6_8M_v1
ESMCrystal_t6_8M_v1 is a protein crystallization prediction model fine-tuned from esm2_t6_8M_UR50D (6 layers, 8M parameters, approx. 31.4 MB). It uses transfer learning to predict whether an input protein sequence will crystallize.
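A minimal inference sketch, assuming the checkpoint is published with a sequence-classification head that loads through the `transformers` auto classes. The `model_id` argument is a placeholder for wherever the weights are hosted, and the helper names (`softmax`, `decode_prediction`, `predict`) are illustrative rather than part of the model. The label mapping follows the Label Semantics section above.

```python
import math

# Label mapping taken from the "Label Semantics" section of this card.
ID2LABEL = {0: "Non-crystallizable", 1: "Crystallizable"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


def predict(sequence, model_id):
    """Classify one protein sequence.

    `model_id` is a placeholder (assumption): pass the hub ID or local
    path of the ESMCrystal_t6_8M_v1 checkpoint.
    """
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_prediction(logits)
```

Calling `predict(sequence, model_id)` with a sequence such as the widget example would return a label string together with the model's probability for that label.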
Accuracy:
Dataset | Accuracy |
---|---|
DeepCrystal Test | 0.7913593256059009 |
BCrystal test | 0.7811975377728035 |
SP test | 0.6962025316455697 |
TR test | 0.8191699604743083 |
Comparison Table:
Dataset | Count | Positives | Negatives | TP | FN | FP | TN | Recall | Precision | F1 | Accuracy | ROC AUC | Matthews correlation coefficient | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DeepCrystal Test | 1898 | 898 | 1000 | 532 | 362 | 34 | 966 | 0.5950783 | 0.93992933 | 0.72876712 | 0.79091869 | 0.9467 | 0.611906376 | 0.5950783 | 0.966 |
BCrystal Test | 1787 | 891 | 896 | 531 | 360 | 31 | 865 | 0.5959596 | 0.94483986 | 0.73090158 | 0.78119754 | 0.9396 | 0.604504011 | 0.5959596 | 0.96540179 |
SP Test | 237 | 148 | 89 | 80 | 68 | 4 | 85 | 0.54054054 | 0.95238095 | 0.68965517 | 0.69620253 | 0.9328 | 0.501728679 | 0.54054054 | 0.95505618 |
TR Test | 1012 | 374 | 638 | 207 | 167 | 16 | 622 | 0.55347594 | 0.92825112 | 0.69346734 | 0.81916996 | 0.9562 | 0.615341231 | 0.55347594 | 0.97492163 |
Graphs
ROC-AUC Curve
PR-AUC Curve
Final scores:
- on DeepCrystal test:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
non-crystallizable | 0.73 | 0.97 | 0.83 | 1000 |
crystallizable | 0.94 | 0.60 | 0.73 | 898 |
accuracy | | | 0.79 | 1898 |
macro avg | 0.83 | 0.78 | 0.78 | 1898 |
weighted avg | 0.83 | 0.79 | 0.78 | 1898 |
- on BCrystal test:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
non-crystallizable | 0.71 | 0.97 | 0.82 | 896 |
crystallizable | 0.94 | 0.60 | 0.73 | 891 |
accuracy | | | 0.78 | 1787 |
macro avg | 0.83 | 0.78 | 0.77 | 1787 |
weighted avg | 0.83 | 0.78 | 0.77 | 1787 |
- on SP test:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
non-crystallizable | 0.56 | 0.96 | 0.70 | 89 |
crystallizable | 0.95 | 0.54 | 0.69 | 148 |
accuracy | | | 0.70 | 237 |
macro avg | 0.75 | 0.75 | 0.70 | 237 |
weighted avg | 0.80 | 0.70 | 0.69 | 237 |
- on TR test:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
non-crystallizable | 0.79 | 0.97 | 0.87 | 638 |
crystallizable | 0.93 | 0.55 | 0.69 | 374 |
accuracy | | | 0.82 | 1012 |
macro avg | 0.86 | 0.76 | 0.78 | 1012 |
weighted avg | 0.84 | 0.82 | 0.81 | 1012 |
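The macro and weighted averages in these tables differ only in how the two classes are combined: the macro average is the plain mean of the per-class scores, while the weighted average scales each class by its support. A small sketch using the rounded TR test precision values; the function names are illustrative, and because the inputs are rounded to two decimals the result can differ from the reported figure in the last digit for other columns.

```python
def macro_average(values):
    """Unweighted mean of per-class scores."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean of per-class scores, weighted by each class's support."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


# Rounded per-class precision and support from the TR test table:
# non-crystallizable first, crystallizable second.
precision = [0.79, 0.93]
support = [638, 374]

macro = macro_average(precision)
weighted = weighted_average(precision, support)

print(f"macro avg precision:    {macro:.2f}")
print(f"weighted avg precision: {weighted:.2f}")
```

For the TR test precision column this gives 0.86 and 0.84, matching the reported macro and weighted averages.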
Confusion matrix (rows: actual class, columns: predicted class; crystallizable first):
- on DeepCrystal test:
| 532 | 362 |
| 34 | 966 |
- on BCrystal test:
| 531 | 360 |
| 31 | 865 |
- on SP test:
| 80 | 68 |
| 4 | 85 |
- on TR test:
| 207 | 167 |
| 16 | 622 |
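The summary metrics can be recomputed directly from these matrices. The sketch below assumes each matrix is laid out with rows as the actual class and columns as the predicted class, crystallizable first, so the four cells read TP, FN, FP, TN; this layout agrees with the class supports in the score tables. The function name `binary_metrics` is illustrative, and the sketch assumes no row or column sum is zero.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)       # positive predictive value (PPV)
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    npv = tn / (tn + fn)             # negative predictive value
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": npv,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# BCrystal test matrix: first row is (TP, FN), second row is (FP, TN).
m = binary_metrics(tp=531, fn=360, fp=31, tn=865)
for name, value in m.items():
    print(f"{name:12s} {value:.4f}")
```

With the BCrystal counts, the computed accuracy, F1, and Matthews coefficient agree with the values reported for that test set (about 0.7812, 0.7309, and 0.6045).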
Metrics
ROC AUC score:
on DeepCrystal test: 0.9467594654788418
on BCrystal test: 0.946546316337983
on SP test: 0.9328120255086547
on TR test: 0.9562804888270497
Matthews correlation coefficient:
on DeepCrystal test: 0.6130826598876417
on BCrystal test: 0.6045040114572474
on SP test: 0.5017286791304684
on TR test: 0.6153412305503776
Specificity (true negative rate):
on DeepCrystal test: 0.966
on BCrystal test: 0.9654017857142857
on SP test: 0.9550561797752809
on TR test: 0.9749216300940439
Sensitivity (true positive rate):
on DeepCrystal test: 0.5968819599109132
on BCrystal test: 0.5959595959595959
on SP test: 0.5405405405405406
on TR test: 0.553475935828877
Researchers:
Credits: