Model Overview

OphPred is a machine learning tool that predicts the optimal pH of enzyme activity directly from protein sequences. It combines the ESM-2 protein language model with k-nearest neighbors (KNN) and XGBoost algorithms to provide predictions across various enzyme classes. The model has been validated using several train-validation splitting strategies: random, homology-based, PFAM-based, and EC-based splits. OphPred is designed to be fast, making it suitable for high-throughput screening of large protein libraries.
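
To make the nearest-neighbor idea concrete, here is a minimal sketch of KNN regression over fixed-length sequence vectors. It is not the OphPred implementation. The `embed` function is a hypothetical stand-in that uses amino-acid composition instead of ESM-2, the XGBoost component is omitted, and the sequences and pH labels are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Hypothetical stand-in embedding: amino-acid composition (NOT ESM-2)."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(query, train_vectors, train_ph, k=3):
    """Average the pH labels of the k training vectors closest to the query."""
    by_distance = sorted(
        (math.dist(query, vec), ph) for vec, ph in zip(train_vectors, train_ph)
    )
    nearest = by_distance[:k]
    return sum(ph for _, ph in nearest) / len(nearest)


# Made-up sequences and optimal pH labels, for illustration only.
train = [
    ("DDEEDDEEAG", 3.0),
    ("DEEDDAEEGG", 3.5),
    ("KKRRKKRRAG", 9.5),
    ("KRRKKARRGG", 9.0),
]
vectors = [embed(seq) for seq, _ in train]
labels = [ph for _, ph in train]

prediction = knn_predict(embed("KKRKKRRRAG"), vectors, labels, k=2)
print(round(prediction, 2))
```

In a real setting the composition vector would be replaced by an embedding from a protein language model, and `k` would be tuned on a validation split.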

Key Features:

  • Input: Protein sequences.
  • Output: Predicted optimal pH range for enzyme activity.
  • Performance: Mean absolute error (MAE) as low as 0.6 pH units and Spearman correlation up to 0.77 when enriched with additional data.
  • Use Cases: Useful for protein engineering, enzyme optimization in biotechnology, and exploring protein space for desired enzymatic properties.
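
The two metrics quoted under Performance can be computed as in the sketch below. It uses plain Python, and the measured and predicted pH values are made up for illustration. They are not results from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences, in the same units as the inputs (pH)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / math.sqrt(var_t * var_p)


# Made-up example values, for illustration only.
measured = [4.5, 7.0, 8.5, 6.0]
predicted = [5.0, 6.5, 8.0, 7.0]

mae = mean_absolute_error(measured, predicted)
rho = spearman(measured, predicted)
print(mae, rho)
```

MAE reports the typical size of the error in pH units, while Spearman correlation only checks whether enzymes are ranked in the right order, so the two can disagree.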

Citation

If you use this model, please cite the authors as follows:

Zaretckii, M.; Buslaev, P.; Kozlovskii, I.; Morozov, A.; Popov, P. Approaching Optimal pH Enzyme Prediction with Large Language Models. ACS Synth. Biol. 2024, 10, DOI: 10.1021/acssynbio.4c00465.

Further Reading

The full paper describing the development and validation of OphPred is available via the DOI given in the citation above.
