Commit 475d75f by MassimoGregorioTotaro
Parent: fba8f5e
checkbox fix, instructions update
Files changed:
- LICENSE +1 -1
- app.py +1 -1
- instructions.md +8 -5
LICENSE (changed)
@@ -1,4 +1,4 @@
-Copyright (c)
+Copyright (c) 2023, Massimo G. Totaro All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 
app.py (changed)
@@ -46,7 +46,7 @@ with open("instructions.md", "r", encoding="utf-8") as md,\
 value=""
 )
 model_name = Dropdown(MODELS, label="Model", value="facebook/esm2_t30_150M_UR50D")
-scoring_strategy = Checkbox(value=True, label="Use
+scoring_strategy = Checkbox(value=True, label="Use higher accuracy scoring", interactive=True)
 btn = Button(value="Run")
 out = HTML()
 bto = File(
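The app.py change only gives the checkbox a label and makes it interactive; the code that consumes its boolean value lies outside this hunk. A minimal sketch of what that mapping could look like, assuming (from the updated instructions) that a ticked box selects `masked-marginals` and an unticked box selects a faster single-pass strategy. The fallback name `wt-marginals` is borrowed from the ESM repository's variant-prediction example and is an assumption here, as are the hypothetical helper names `pick_scoring_strategy` and `forward_passes` and the one-pass-per-masked-position cost model.

```python
"""Hypothetical wiring of the "Use higher accuracy scoring" checkbox.

Not taken from app.py: names and the cost model are illustrative assumptions.
"""

from collections.abc import Iterable


def pick_scoring_strategy(use_higher_accuracy: bool) -> str:
    # Ticked (the Checkbox default, value=True) -> slower, context-aware strategy.
    # Unticked -> a faster single-pass strategy; "wt-marginals" is an assumed name.
    return "masked-marginals" if use_higher_accuracy else "wt-marginals"


def forward_passes(strategy: str, positions: Iterable[int]) -> int:
    """Rough count of model forward passes for a set of mutated positions.

    Assumes masked-marginals masks each distinct mutated position in its own
    pass, while wt-marginals scores everything from one unmasked pass.
    """
    if strategy == "masked-marginals":
        return len(set(positions))
    if strategy == "wt-marginals":
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    seq_len = 300  # the 300-residue example mentioned in the instructions
    dms_positions = range(1, seq_len + 1)  # a full scan touches every position
    for ticked in (True, False):
        strategy = pick_scoring_strategy(ticked)
        print(strategy, forward_passes(strategy, dms_positions))
```

Under these assumptions a full scan of a 300-residue sequence needs 300 masked passes instead of one, which would fit the instructions' advice to untick the box when the wait is too long.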
instructions.md (changed)
@@ -1,8 +1,12 @@
 # **ESM-Scan**
 Calculate the <u>fitness of single amino acid substitutions</u> on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) [language model predictor](https://github.com/facebookresearch/esm)
 
-
-
+If you use this tool in your research, please cite:
+- Totaro, M.G. (2023). “ESM-Scan - a tool to guide amino acid substitutions.” bioRxiv. [doi.org/10.1101/2023.12.12.571273](https://doi.org/10.1101/2023.12.12.571273)
+- Meier, J. (2021). “Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv (Cold Spring Harbor Laboratory), July. [doi.org/10.1101/2021.07.09.450648](https://doi.org/10.1101/2021.07.09.450648)
+
+<details>
+<summary> <b> USAGE INSTRUCTIONS </b> </summary>
 
 ### **Setup**
 No setup is required, just fill the input boxes with the required data and click on the `Run` button.
@@ -21,11 +25,10 @@ Running a calculation resumes the tool from standby, the first run might take lo
 + any other *different input*: a deep mutational scan of the full sequence will be performed
 - the ESM model to use for the calculations can be chosen among those that are available on Hugging Face Model Hub;
 `esm2_t33_650M_UR50D` offers the best expense-accuracy tradeoff[*](https://doi.org/10.1126/science.ade2574)
-- the `masked-marginals` scoring strategy considers sequence context
-in case of long runtimes, you can tick the box off to speed the calculations up significantly, sacrificing accuracy
+- the more accurate `masked-marginals` scoring strategy considers sequence context during inferences, increasing the runtime significantly; if the wait is too long, you can tick the box off to speed the calculations, sacrificing accuracy
 - when running a deep mutational scan, it is recommended to use smaller models (8M, 35M, 150M parameters), since the runtime is significant, especially for longer sequences and the server might be overloaded;
 over 30 min might be necessary for calculating a 300-residue-long sequence with larger models
-in general, accuracy is influenced
+in general, accuracy is influenced more by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime;
 the scoring strategy computational cost scales with the number of substitutions tested, while the model’s with the wild-type sequence length
 - it is possible to calculate the effect of multiple concurrent substitutions, but this has to be done manually, by changing the input sequence and running the calculation again
 
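The updated instructions mention three mechanics: a deep mutational scan of the full sequence, `masked-marginals` scoring, and combining substitutions manually by editing the input sequence. The sketch below illustrates them under stated assumptions and is not ESM-Scan's code: substitutions use the common wild-type/1-based-position/mutant notation (e.g. `K2R`), which may differ from the input format the tool accepts; the masked-marginal score follows Meier et al. (2021), log p(mt) - log p(wt) at the masked position; and the `log_probs` table is a made-up stand-in for model output. All function names are hypothetical.

```python
"""Illustrative sketch, not ESM-Scan's implementation: deep mutational scan
enumeration, masked-marginal scoring, and manual multi-substitution inputs."""

import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumed notation: wild-type residue, 1-based position, mutant residue (e.g. "K2R").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def deep_mutational_scan(seq: str) -> list[str]:
    """Every single substitution of `seq`: 19 alternatives per position."""
    return [
        f"{wt}{pos}{mt}"
        for pos, wt in enumerate(seq, start=1)
        for mt in AMINO_ACIDS
        if mt != wt
    ]


def masked_marginal_score(log_probs: dict[str, float], wt: str, mt: str) -> float:
    """log p(mt) - log p(wt) at the masked position (Meier et al., 2021).

    `log_probs` stands in for the model's log-probabilities at that position.
    """
    return log_probs[mt] - log_probs[wt]


def apply_substitutions(seq: str, mutations: list[str]) -> str:
    """Return `seq` with all `mutations` applied, ready for a new manual run."""
    residues = list(seq)
    for m in mutations:
        match = MUTATION_RE.match(m)
        if match is None:
            raise ValueError(f"malformed substitution: {m}")
        wt, pos, mt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position out of range")
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = mt
    return "".join(residues)


if __name__ == "__main__":
    seq = "MKTAYIAK"  # toy sequence for illustration
    scan = deep_mutational_scan(seq)
    print(len(scan))  # 152 (19 x 8)
    print(scan[:3])  # ['M1A', 'M1C', 'M1D']
    toy = {"K": math.log(0.6), "R": math.log(0.2)}  # made-up probabilities
    print(round(masked_marginal_score(toy, "K", "R"), 3))  # -1.099
    print(apply_substitutions(seq, ["K2R", "A4G"]))  # MRTGYIAK
```

Combining substitutions this way only prepares the input; as the instructions note, the mutated sequence still has to be submitted as a separate run.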