---
license: mit
tags:
- METL
- biology
- protein
---

# METL

Mutational Effect Transfer Learning (METL) is a framework for pretraining and finetuning biophysics-informed protein language models.

## Model Details

This repository contains a wrapper that makes METL models easy to use; example usage is provided below.
Models are hosted on [Zenodo](https://zenodo.org/doi/10.5281/zenodo.11051644) and are downloaded by this wrapper when used.

### Model Description

METL is described in further detail in the [paper](https://doi.org/10.1101/2024.03.15.585128).
The GitHub [repo](https://github.com/gitter-lab/metl) contains more documentation and includes scripts for training and predicting with METL.
Google Colab notebooks for finetuning and predicting with publicly available METL models are also available [here](https://github.com/gitter-lab/metl/tree/main/notebooks).

### Model Sources

- **Repository:** [METL repo](https://github.com/gitter-lab/metl)
- **Paper:** [METL preprint](https://doi.org/10.1101/2024.03.15.585128)
- **Demo:** [Hugging Face Spaces demo](https://huggingface.co/spaces/gitter-lab/METL_demo)

## How to Get Started with the Model

Use the code below to get started with the model.

Running METL requires the following packages:

```
transformers==4.42.4
numpy>=1.23.2
networkx>=2.6.3
scipy>=1.9.1
biopandas>=0.2.7
```
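
These can be installed with pip, for example: `pip install "transformers==4.42.4" "numpy>=1.23.2" "networkx>=2.6.3" "scipy>=1.9.1" "biopandas>=0.2.7"`.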

To run the example below, a PDB file for the GB1 protein structure must first be downloaded.
It is provided [here](https://github.com/gitter-lab/metl-pretrained/blob/main/pdbs/2qmt_p.pdb) and in raw format [here](https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb).
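
If you prefer, the file can also be fetched with a few lines of Python using only the standard library (a minimal sketch; the URL is the raw link above):

```python
import urllib.request

# Download the GB1 structure file (2qmt_p.pdb) from the metl-pretrained repository
url = "https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb"
urllib.request.urlretrieve(url, "2qmt_p.pdb")
```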

After installing those packages and downloading the above file, you can run METL with the following code example (assuming the downloaded file is in the same directory as the script):

```python
from transformers import AutoModel
import torch

# Load the METL wrapper from the Hugging Face Hub
metl = AutoModel.from_pretrained('gitter-lab/METL', trust_remote_code=True)

# Model identifier, wild-type GB1 sequence, variants to score
# (a JSON-encoded list of comma-separated mutations), and structure file path
model_id = "metl-l-2m-3d-gb1"
wt = "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"
variants = '["T17P,T54F", "V28L,F51A"]'
pdb_path = './2qmt_p.pdb'

# Download the checkpoint from Zenodo and load it into the wrapper
metl.load_from_ident(model_id)
metl.eval()

# Encode the variants relative to the wild-type sequence
encoded_variants = metl.encoder.encode_variants(wt, variants)

# Predict scores for the encoded variants
with torch.no_grad():
    predictions = metl(torch.tensor(encoded_variants), pdb_fn=pdb_path)
```
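
`predictions` should contain one row per input variant; the number of output columns depends on the loaded model. A quick way to inspect the result:

```python
# Each row of `predictions` corresponds to one input variant
print(predictions.shape)
print(predictions)
```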

## Citation

Biophysics-based protein language models for protein engineering
Sam Gelman, Bryce Johnson, Chase Freschlin, Sameer D’Costa, Anthony Gitter, Philip A. Romero
bioRxiv 2024.03.15.585128; doi: https://doi.org/10.1101/2024.03.15.585128

## Model Card Contact

For questions and comments about METL, the best way to reach out is to open a GitHub issue in the [METL repository](https://github.com/gitter-lab/metl/issues).