---
license: mit
tags:
- METL
- biology
- protein
---
# METL
Mutational Effect Transfer Learning (METL) is a framework for pretraining and finetuning biophysics-informed protein language models.
## Model Details
This repository contains a wrapper that makes METL models easier to use; usage is shown below.
Models are hosted on [Zenodo](https://zenodo.org/doi/10.5281/zenodo.11051644), and the wrapper downloads them when it is used.
### Model Description
The [paper](https://doi.org/10.1101/2024.03.15.585128) describes METL in detail.
The GitHub [repo](https://github.com/gitter-lab/metl) contains further documentation and scripts for training and predicting with METL.
Google Colab notebooks for finetuning publicly available METL models and predicting with them are available [here](https://github.com/gitter-lab/metl/tree/main/notebooks).
### Model Sources
- **Repository:** [METL repo](https://github.com/gitter-lab/metl)
- **Paper:** [METL preprint](https://doi.org/10.1101/2024.03.15.585128)
- **Demo:** [Hugging Face Spaces demo](https://huggingface.co/spaces/gitter-lab/METL_demo)
## How to Get Started with the Model
Use the code below to get started with the model.
Running METL requires the following packages:
```
transformers==4.42.4
numpy>=1.23.2
networkx>=2.6.3
scipy>=1.9.1
biopandas>=0.2.7
```
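The version pins above can be checked before loading the model. The following is a minimal sketch that uses only the standard library. The helper names (`parse_version`, `satisfies`, `check_requirements`) are hypothetical and not part of METL, and the comparison assumes plain numeric versions rather than the full PEP 440 rules.

```python
from importlib.metadata import PackageNotFoundError, version

# Requirements copied from the list above: package -> (operator, version).
REQUIREMENTS = {
    "transformers": ("==", "4.42.4"),
    "numpy": (">=", "1.23.2"),
    "networkx": (">=", "2.6.3"),
    "scipy": (">=", "1.9.1"),
    "biopandas": (">=", "0.2.7"),
}


def parse_version(text):
    """Turn '1.23.2' into (1, 23, 2); anything after the leading digits of a piece is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with '==' or '>=' after zero-padding to equal length."""
    have, need = parse_version(installed), parse_version(required)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have == need if op == "==" else have >= need


def check_requirements(get_version=version):
    """Return one message per missing or mismatched package (empty list if all is well)."""
    problems = []
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op}{required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op}{required}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All listed requirements are satisfied."]:
        print(line)
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment; by default it queries the installed package metadata.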
To run the example, you must first download a PDB file for the GB1 protein structure.
It is provided [here](https://github.com/gitter-lab/metl-pretrained/blob/main/pdbs/2qmt_p.pdb) and in raw format [here](https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb).
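The file can also be fetched from a script. The sketch below uses the raw URL given above and only the standard library; `download_pdb` is a hypothetical helper, not part of METL, and the default destination `2qmt_p.pdb` in the working directory is an assumption chosen to match the example path used later.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen

# Raw-format URL from the paragraph above.
PDB_URL = "https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb"


def download_pdb(url=PDB_URL, dest="2qmt_p.pdb"):
    """Fetch the PDB file to `dest` unless it already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # skip the download if the file is already present
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    partial = dest.with_suffix(dest.suffix + ".part")
    with urlopen(url, timeout=30) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(dest)
    return dest
```

Calling `download_pdb()` with no arguments would place `2qmt_p.pdb` in the current working directory.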
After installing the packages and downloading the file, you can run METL with the following example (assuming the downloaded file is in the same directory as the script):
```python
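from transformers import AutoModel
import torch

# Load the METL wrapper; trust_remote_code is needed because the wrapper code lives in the model repository
metl = AutoModel.from_pretrained('gitter-lab/METL', trust_remote_code=True)

model_id = "metl-l-2m-3d-gb1"  # identifier of the METL model to load
wt = "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # GB1 wild-type sequence
variants = '["T17P,T54F", "V28L,F51A"]'  # two variants, each a comma-separated list of mutations
pdb_path = './2qmt_p.pdb'  # GB1 structure file

metl.load_from_ident(model_id)
metl.eval()

encoded_variants = metl.encoder.encode_variants(wt, variants)

with torch.no_grad():
    predictions = metl(torch.tensor(encoded_variants), pdb_fn=pdb_path)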
```
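The variant strings in the example can be checked against the wild-type sequence before encoding. The sketch below is independent of the METL wrapper: `parse_variant` and `apply_variant` are hypothetical helpers, and the 0-based position convention is inferred from the example itself (residues 17 and 54 of the wild type are `T` only when counting from zero).

```python
import json
import re

WT = "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"
VARIANTS_JSON = '["T17P,T54F", "V28L,F51A"]'

# One mutation: wild-type residue, position, replacement residue (e.g. "T17P").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_variant(variant):
    """Split 'T17P,T54F' into [('T', 17, 'P'), ('T', 54, 'F')]."""
    mutations = []
    for token in variant.split(","):
        match = MUTATION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        old, pos, new = match.groups()
        mutations.append((old, int(pos), new))
    return mutations


def apply_variant(wt, variant):
    """Return the mutated sequence, checking each wild-type residue (0-based positions)."""
    seq = list(wt)
    for old, pos, new in parse_variant(variant):
        if pos >= len(seq) or wt[pos] != old:
            raise ValueError(f"{old}{pos}{new} does not match the wild type")
        seq[pos] = new
    return "".join(seq)


# Decode the JSON-style string into a list of variants and apply each one.
variants = json.loads(VARIANTS_JSON)
for variant in variants:
    print(variant, apply_variant(WT, variant))
```

A mutation whose wild-type residue does not match the sequence raises a `ValueError`, which catches off-by-one position errors before any model is loaded.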
## Citation
Biophysics-based protein language models for protein engineering
Sam Gelman, Bryce Johnson, Chase Freschlin, Sameer D’Costa, Anthony Gitter, Philip A. Romero
bioRxiv 2024.03.15.585128; doi: https://doi.org/10.1101/2024.03.15.585128
## Model Card Contact
For questions and comments about METL, the best way to reach out is by opening a GitHub issue in the [METL repository](https://github.com/gitter-lab/metl/issues).