ESM-Scan
Calculate the fitness of single amino acid substitutions on proteins, using a zero-shot language model predictor
USAGE INSTRUCTIONS
Setup
No setup is required: just fill the input boxes with the required data and click on the Run button.
A list of examples can be found at the bottom of the page; click on one to autofill the fields.
If the server is not used for some time, it will go into standby.
Running a calculation resumes the tool from standby; the first run might take longer due to startup and model loading.
Input
- write the full amino acid sequence of the protein to be analysed in the Sequence text box
  wildcard characters (e.g. -X.B) can be inserted but, at the moment, the visualisation cannot handle them
- write the substitutions to test in the Substitutions box
  the running mode is chosen according to the input:
  - single substitution or list thereof (in the form of R218K R218W): each listed substitution is scored
  - residue position or list thereof: all possible substitutions at those positions will be evaluated
  - same-length sequence: the differing amino acid substitutions will be evaluated, one by one
  - any other input: a deep mutational scan of the full sequence will be performed
- the ESM model to use for the calculations can be chosen among those that are available on Hugging Face Model Hub; esm2_t33_650M_UR50D offers the best cost-accuracy trade-off*
- the masked-marginals scoring strategy considers sequence context at inference time, being slower but more accurate; in case of long runtimes, you can untick the box to speed the calculations up significantly, sacrificing accuracy
- when running a deep mutational scan, it is recommended to use smaller models (8M, 35M, 150M parameters), since the runtime is significant, especially for longer sequences, and the server might be overloaded; over 30 min might be necessary for calculating a 300-residue-long sequence with larger models
  in general, accuracy is influenced significantly by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime; the computational cost of the scoring strategy scales with the number of substitutions tested, while that of the model scales with the wild-type sequence length
- it is possible to calculate the effect of multiple concurrent substitutions, but this has to be done manually, by changing the input sequence and running the calculation again
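The input rules above can be illustrated with a short Python sketch. It is not the tool's own parser: the function names `classify_input` and `apply_substitutions`, the whitespace-separated list format, the 1-based positions and the restriction to the 20 standard amino acids are all assumptions made for illustration. The second helper builds the modified sequence needed when testing multiple concurrent substitutions by hand.

```python
import re

# Assumption: only the 20 standard amino acids are accepted in substitutions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches entries such as R218K: wild-type residue, position, mutant residue.
SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def classify_input(sequence: str, substitutions: str) -> str:
    """Guess which running mode a Substitutions entry would select (hypothetical)."""
    tokens = substitutions.split()
    if tokens and all(SUBSTITUTION.match(t) for t in tokens):
        return "substitutions"
    if tokens and all(t.isdigit() for t in tokens):
        return "positions"
    if len(tokens) == 1 and tokens[0].isalpha() and len(tokens[0]) == len(sequence):
        return "sequence"
    # Anything else falls back to the full-sequence scan.
    return "deep mutational scan"


def apply_substitutions(sequence: str, substitutions: str) -> str:
    """Return the sequence carrying every listed substitution (1-based positions)."""
    residues = list(sequence)
    seen = set()
    for token in substitutions.split():
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(sequence):
            raise ValueError(f"{token}: position outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"{token}: sequence has {sequence[position - 1]} there")
        if position in seen:
            raise ValueError(f"{token}: position {position} substituted twice")
        seen.add(position)
        residues[position - 1] = mutant
    return "".join(residues)
```

For example, `apply_substitutions("MKTAYIAKQR", "K2R")` returns a sequence that already carries K2R; pasting it into the Sequence box and then testing A4G would score A4G in the K2R background.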
Output
Your results will be shown in a colour-coded table, except for the deep mutational scan, which will yield a heatmap.
The output data can be downloaded from the box at the bottom.
File extensions are not supported by the server and need to be appended to the filenames after downloading:
- CSV for tables
- SVG for full-sequence deep mutational scan
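Appending the extension can be scripted after downloading. The following is a minimal sketch, assuming the downloaded file is on local disk; the helper name `add_extension` and the `"table"` / `"scan"` labels are hypothetical and not part of the tool.

```python
from pathlib import Path

# Assumed mapping from result type to extension, following the list above.
EXTENSIONS = {"table": ".csv", "scan": ".svg"}


def add_extension(path: str, kind: str) -> Path:
    """Rename a downloaded result file so that its name ends with the right extension."""
    source = Path(path)
    suffix = EXTENSIONS[kind]
    if source.suffix.lower() == suffix:
        return source  # already named correctly, nothing to do
    target = source.with_name(source.name + suffix)
    source.rename(target)
    return target
```

Calling `add_extension("results", "table")` would rename a file called `results` in the current directory to `results.csv`.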