metadata
license: mit
widget:
  - text: >-
      MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY

Label Semantics:

Label 0: Non-crystallizable (Negative)

Label 1: Crystallizable (Positive)

Dataset

  1. DeepCrystal Train
  2. DeepCrystal Test
  3. BCrystal Test
  4. SP Test
  5. TR Test

Model

ESMCrystal_t6_8M_v1

ESMCrystal_t6_8M_v1 is a state-of-the-art protein crystallization prediction model fine-tuned from esm2_t6_8M_UR50D using transfer learning. It has 6 layers and 8M parameters, is approximately 31.4 MB in size, and predicts whether an input protein sequence will crystallize.
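A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub as a sequence-classification model and that its output indices follow the label semantics listed above. The repository id `jaykmr/ESMCrystal_t6_8M_v1` and the helper names (`softmax`, `decode`, `predict_crystallization`) are assumptions for illustration, not part of the released code.

```python
import math

# Label mapping taken from the "Label Semantics" section of this card.
ID2LABEL = {0: "non-crystallizable", 1: "crystallizable"}

# Assumed Hub repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "jaykmr/ESMCrystal_t6_8M_v1"


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map a pair of logits to (label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def predict_crystallization(sequence, model_id=MODEL_ID):
    """Classify one amino-acid sequence; downloads the checkpoint on first use."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

Calling `predict_crystallization(sequence)` with an amino-acid string, such as the widget example in the metadata, would return a `(label, probability)` pair. It requires `torch` and `transformers` to be installed.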

Accuracy:

| Dataset | Accuracy |
|---|---|
| DeepCrystal Test | 0.7913593256059009 |
| BCrystal Test | 0.7811975377728035 |
| SP Test | 0.6962025316455697 |
| TR Test | 0.8191699604743083 |

Comparison Table:

| Dataset | Count | Positives | Negatives | TP | FN | FP | TN | Recall | Precision | F1 | Accuracy | ROC AUC | Matthews Coefficient | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepCrystal Test | 1898 | 898 | 1000 | 532 | 362 | 34 | 966 | 0.5950783 | 0.93992933 | 0.72876712 | 0.79091869 | 0.9467 | 0.611906376 | 0.5950783 | 0.966 |
| BCrystal Test | 1787 | 891 | 896 | 531 | 360 | 31 | 865 | 0.5959596 | 0.94483986 | 0.73090158 | 0.78119754 | 0.9396 | 0.604504011 | 0.5959596 | 0.96540179 |
| SP Test | 237 | 148 | 89 | 80 | 68 | 4 | 85 | 0.54054054 | 0.95238095 | 0.68965517 | 0.69620253 | 0.9328 | 0.501728679 | 0.54054054 | 0.95505618 |
| TR Test | 1012 | 374 | 638 | 207 | 167 | 16 | 622 | 0.55347594 | 0.92825112 | 0.69346734 | 0.81916996 | 0.9562 | 0.615341231 | 0.55347594 | 0.97492163 |

Graphs

ROC-AUC Curve

  • DeepCrystal Test: ROC-AUC curve (figure)

  • BCrystal Test: ROC-AUC curve (figure)

  • SP Test: ROC-AUC curve (figure)

  • TR Test: ROC-AUC curve (figure)

PR-AUC Curve

  • DeepCrystal Test: PR-AUC curve (figure)

  • BCrystal Test: PR-AUC curve (figure)

  • SP Test: PR-AUC curve (figure)

  • TR Test: PR-AUC curve (figure)

Final scores:

  • on DeepCrystal test:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| non-crystallizable | 0.73 | 0.97 | 0.83 | 1000 |
| crystallizable | 0.94 | 0.60 | 0.73 | 898 |
| accuracy | | | 0.79 | 1898 |
| macro avg | 0.83 | 0.78 | 0.78 | 1898 |
| weighted avg | 0.83 | 0.79 | 0.78 | 1898 |

  • on BCrystal test:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| non-crystallizable | 0.71 | 0.97 | 0.82 | 896 |
| crystallizable | 0.94 | 0.60 | 0.73 | 891 |
| accuracy | | | 0.78 | 1787 |
| macro avg | 0.83 | 0.78 | 0.77 | 1787 |
| weighted avg | 0.83 | 0.78 | 0.77 | 1787 |

  • on SP test:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| non-crystallizable | 0.56 | 0.96 | 0.70 | 89 |
| crystallizable | 0.95 | 0.54 | 0.69 | 148 |
| accuracy | | | 0.70 | 237 |
| macro avg | 0.75 | 0.75 | 0.70 | 237 |
| weighted avg | 0.80 | 0.70 | 0.69 | 237 |

  • on TR test:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| non-crystallizable | 0.79 | 0.97 | 0.87 | 638 |
| crystallizable | 0.93 | 0.55 | 0.69 | 374 |
| accuracy | | | 0.82 | 1012 |
| macro avg | 0.86 | 0.76 | 0.78 | 1012 |
| weighted avg | 0.84 | 0.82 | 0.81 | 1012 |
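The per-class rows and the macro and weighted averages in these reports can be re-derived from four confusion counts. The sketch below uses the SP test counts, on the assumption that 80 crystallizable sequences were predicted correctly, 68 were missed, 4 non-crystallizable sequences were flagged as crystallizable, and 85 were rejected correctly. This reading matches the support values of 148 and 89. The function name `per_class_report` is hypothetical.

```python
def per_class_report(tp, fn, fp, tn):
    """Per-class precision/recall/F1/support plus macro and weighted averages."""
    classes = ("non-crystallizable", "crystallizable")
    report = {}
    # For the negative class the roles of the four counts are mirrored.
    for name, hit, missed, false_alarm in (
        ("non-crystallizable", tn, fp, fn),
        ("crystallizable", tp, fn, fp),
    ):
        precision = hit / (hit + false_alarm)
        recall = hit / (hit + missed)
        f1 = 2 * precision * recall / (precision + recall)
        report[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": hit + missed,
        }

    total = sum(report[c]["support"] for c in classes)
    keys = ("precision", "recall", "f1")
    # Macro average: every class counts equally.
    report["macro avg"] = {
        k: sum(report[c][k] for c in classes) / len(classes) for k in keys
    }
    # Weighted average: every class counts in proportion to its support.
    report["weighted avg"] = {
        k: sum(report[c][k] * report[c]["support"] for c in classes) / total
        for k in keys
    }
    return report


# SP test counts, read as described in the paragraph above (an assumption).
sp_report = per_class_report(tp=80, fn=68, fp=4, tn=85)
```

The weighted average leans toward the larger class, which is why it differs from the macro average on the imbalanced SP and TR sets. The sketch does not guard against empty classes.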

Confusion matrix (rows: actual class, columns: predicted class, crystallizable first):

  • on DeepCrystal test:
    | 532 | 362 |
    |  34 | 966 |
  • on BCrystal test:
    | 531 | 360 |
    |  31 | 865 |
  • on SP test:
    | 80 | 68 |
    |  4 | 85 |
  • on TR test:
    | 207 | 167 |
    |  16 | 622 |
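As a cross-check, the summary metrics can be recomputed from a single matrix. The sketch below assumes each matrix is laid out with actual classes as rows and predicted classes as columns, crystallizable first, so the BCrystal matrix reads TP = 531, FN = 360, FP = 31, TN = 865. This is consistent with the support counts in the classification reports. The function name `summary_metrics` is hypothetical.

```python
import math


def summary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and MCC from four confusion counts."""
    total = tp + fn + fp + tn
    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of crystallizable proteins found
        "specificity": tn / (tn + fp),  # share of non-crystallizable proteins rejected
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# BCrystal test counts under the layout assumed in the paragraph above.
bcrystal = summary_metrics(tp=531, fn=360, fp=31, tn=865)
```

Under this reading, `bcrystal["accuracy"]` comes out at about 0.7812 and `bcrystal["mcc"]` at about 0.6045, matching the BCrystal figures reported in this card.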

Metrics

ROC AUC score:

  • on DeepCrystal test: 0.9467594654788418

  • on BCrystal test: 0.946546316337983

  • on SP test: 0.9328120255086547

  • on TR test: 0.9562804888270497
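The ROC scores above are areas under the ROC curve. Assuming they were computed from the model's predicted probability for the crystallizable class, the quantity equals the chance that a randomly chosen crystallizable sequence receives a higher score than a randomly chosen non-crystallizable one. The sketch below shows that definition on hypothetical toy data. The scores are made up and are not model outputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = crystallizable, 0 = non-crystallizable.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Because this measure does not depend on a decision threshold, it can be high even when recall at the default threshold is modest. The pairwise loop is quadratic in the number of samples, so a rank-based implementation would be preferable for full test sets.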

Matthews correlation coefficient (MCC):

  • on DeepCrystal test: 0.6130826598876417

  • on BCrystal test: 0.6045040114572474

  • on SP test: 0.5017286791304684

  • on TR test: 0.6153412305503776

Specificity (true negative rate):

  • on DeepCrystal test: 0.966

  • on BCrystal test: 0.9654017857142857

  • on SP test: 0.9550561797752809

  • on TR test: 0.9749216300940439

Sensitivity (true positive rate, recall):

  • on DeepCrystal test: 0.5968819599109132

  • on BCrystal test: 0.5959595959595959

  • on SP test: 0.5405405405405406

  • on TR test: 0.553475935828877

Researchers:

Credits: