---
license: mit
tags:
- METL
- biology
- protein
---

# METL

Mutational Effect Transfer Learning (METL) is a framework for pretraining and finetuning biophysics-informed protein language models. 


## Model Details

This repository contains a wrapper that makes METL models easier to use.
Usage instructions for the wrapper are provided below.
Models are hosted on [Zenodo](https://zenodo.org/doi/10.5281/zenodo.11051644) and will be downloaded by this wrapper when used.

### Model Description

METL is discussed in further detail in the [paper](https://doi.org/10.1101/2024.03.15.585128).
The GitHub [repo](https://github.com/gitter-lab/metl) contains more documentation and includes scripts for training and predicting with METL.
Google Colab notebooks for finetuning the publicly available METL models and making predictions with them are also available [here](https://github.com/gitter-lab/metl/tree/main/notebooks).

### Model Sources

- **Repository:** [METL repo](https://github.com/gitter-lab/metl)
- **Paper:** [METL preprint](https://doi.org/10.1101/2024.03.15.585128)
- **Demo:** [Hugging Face Spaces demo](https://huggingface.co/spaces/gitter-lab/METL_demo)

## How to Get Started with the Model

Use the code below to get started with the model.

Running METL requires the following packages:
```
transformers==4.42.4
numpy>=1.23.2
networkx>=2.6.3
scipy>=1.9.1
biopandas>=0.2.7
```
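
One way to install these with pip is shown below. Note that the example later in this section also imports `torch`, which is not in the list above and may need to be installed separately:

```
pip install "transformers==4.42.4" "numpy>=1.23.2" "networkx>=2.6.3" "scipy>=1.9.1" "biopandas>=0.2.7" torch
```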

To run the example, a PDB file for the GB1 protein structure must also be downloaded.
It is provided [here](https://github.com/gitter-lab/metl-pretrained/blob/main/pdbs/2qmt_p.pdb) and in raw format [here](https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb).
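
If you prefer to fetch the structure programmatically, here is a minimal sketch using only the Python standard library and the raw URL above; it saves the file as `2qmt_p.pdb` in the current directory, matching the path used in the example below:

```python
import urllib.request

# Raw URL of the GB1 structure (2qmt_p.pdb) from the metl-pretrained repository
pdb_url = "https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb"

# Download the PDB file into the current working directory
urllib.request.urlretrieve(pdb_url, "2qmt_p.pdb")
```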

After installing those packages and downloading the PDB file, you can run METL with the following code example (assuming the downloaded file is in the same directory as the script):

```python
from transformers import AutoModel
import torch

# Load the METL wrapper; trust_remote_code is required because the model class lives in this repository
metl = AutoModel.from_pretrained('gitter-lab/METL', trust_remote_code=True)

model_id = "metl-l-2m-3d-gb1"                                    # identifier of the pretrained METL model to use
wt = "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"  # wild-type GB1 sequence
variants = ["T17P,T54F", "V28L,F51A"]                            # variants to score, each a comma-separated list of mutations
pdb_path = './2qmt_p.pdb'                                        # path to the downloaded GB1 structure

# Download the model checkpoint from Zenodo (if needed) and load it into the wrapper
metl.load_from_ident(model_id)

metl.eval()

# Encode the variants relative to the wild-type sequence
encoded_variants = metl.encoder.encode_variants(wt, variants)

with torch.no_grad():
    predictions = metl(torch.tensor(encoded_variants), pdb_fn=pdb_path)
```
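
Assuming the forward pass returns a tensor with one row of outputs per input variant, as in the snippet above, you can inspect the results like this:

```python
# One row of model outputs per input variant
print(predictions.shape)
print(predictions.tolist())
```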

## Citation

Biophysics-based protein language models for protein engineering  
Sam Gelman, Bryce Johnson, Chase Freschlin, Sameer D’Costa, Anthony Gitter, Philip A. Romero  
bioRxiv 2024.03.15.585128; doi: https://doi.org/10.1101/2024.03.15.585128

## Model Card Contact

For questions and comments about METL, the best way to reach out is to open a GitHub issue in the [METL repository](https://github.com/gitter-lab/metl/issues).