Spaces: ProteinDesignLab/protpardelle


Simon Duerr committed on Sep 13, 2023

Commit

00aa807

•

1 Parent(s): 0291496

add proteinmpnn


This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

ProteinMPNN +0 -1
ProteinMPNN/LICENSE +21 -0
ProteinMPNN/README.md +111 -0
ProteinMPNN/ca_model_weights/v_48_002.pt +3 -0
ProteinMPNN/ca_model_weights/v_48_010.pt +3 -0
ProteinMPNN/ca_model_weights/v_48_020.pt +3 -0
ProteinMPNN/colab_notebooks/README.md +1 -0
ProteinMPNN/colab_notebooks/ca_only_quickdemo.ipynb +0 -0
ProteinMPNN/colab_notebooks/quickdemo.ipynb +0 -0
ProteinMPNN/colab_notebooks/quickdemo_wAF2.ipynb +612 -0
ProteinMPNN/examples/submit_example_1.sh +28 -0
ProteinMPNN/examples/submit_example_2.sh +34 -0
ProteinMPNN/examples/submit_example_3.sh +27 -0
ProteinMPNN/examples/submit_example_3_score_only.sh +28 -0
ProteinMPNN/examples/submit_example_3_score_only_from_fasta.sh +30 -0
ProteinMPNN/examples/submit_example_4.sh +40 -0
ProteinMPNN/examples/submit_example_4_non_fixed.sh +40 -0
ProteinMPNN/examples/submit_example_5.sh +44 -0
ProteinMPNN/examples/submit_example_6.sh +34 -0
ProteinMPNN/examples/submit_example_7.sh +29 -0
ProteinMPNN/examples/submit_example_8.sh +34 -0
ProteinMPNN/examples/submit_example_pssm.sh +49 -0
ProteinMPNN/helper_scripts/assign_fixed_chains.py +39 -0
ProteinMPNN/helper_scripts/make_bias_AA.py +27 -0
ProteinMPNN/helper_scripts/make_bias_per_res_dict.py +53 -0
ProteinMPNN/helper_scripts/make_fixed_positions_dict.py +59 -0
ProteinMPNN/helper_scripts/make_pos_neg_tied_positions_dict.py +73 -0
ProteinMPNN/helper_scripts/make_pssm_input_dict.py +36 -0
ProteinMPNN/helper_scripts/make_tied_positions_dict.py +61 -0
ProteinMPNN/helper_scripts/other_tools/make_omit_AA.py +39 -0
ProteinMPNN/helper_scripts/other_tools/make_pssm_dict.py +64 -0
ProteinMPNN/helper_scripts/parse_multiple_chains.out +1 -0
ProteinMPNN/helper_scripts/parse_multiple_chains.py +163 -0
ProteinMPNN/helper_scripts/parse_multiple_chains.sh +7 -0
ProteinMPNN/inputs/PDB_complexes/pdbs/3HTN.pdb +0 -0
ProteinMPNN/inputs/PDB_complexes/pdbs/4YOW.pdb +0 -0
ProteinMPNN/inputs/PDB_homooligomers/pdbs/4GYT.pdb +0 -0
ProteinMPNN/inputs/PDB_homooligomers/pdbs/6EHB.pdb +0 -0
ProteinMPNN/inputs/PDB_monomers/pdbs/5L33.pdb +0 -0
ProteinMPNN/inputs/PDB_monomers/pdbs/6MRR.pdb +0 -0
ProteinMPNN/inputs/PSSM_inputs/3HTN.npz +0 -0
ProteinMPNN/inputs/PSSM_inputs/4YOW.npz +0 -0
ProteinMPNN/outputs/example_1_outputs/parsed_pdbs.jsonl +2 -0
ProteinMPNN/outputs/example_1_outputs/seqs/5L33.fa +6 -0
ProteinMPNN/outputs/example_1_outputs/seqs/6MRR.fa +6 -0
ProteinMPNN/outputs/example_2_outputs/assigned_pdbs.jsonl +1 -0
ProteinMPNN/outputs/example_2_outputs/parsed_pdbs.jsonl +0 -0
ProteinMPNN/outputs/example_2_outputs/seqs/3HTN.fa +6 -0
ProteinMPNN/outputs/example_2_outputs/seqs/4YOW.fa +6 -0
ProteinMPNN/outputs/example_3_outputs/seqs/3HTN.fa +6 -0

ProteinMPNN DELETED

	@@ -1 +0,0 @@
-Subproject commit 8907e6671bfbfc92303b5f79c4b5e6ce47cdef57

ProteinMPNN/LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2022 Justas Dauparas
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ProteinMPNN/README.md ADDED

	@@ -0,0 +1,111 @@

+# ProteinMPNN
+![ProteinMPNN](https://docs.google.com/drawings/d/e/2PACX-1vTtnMBDOq8TpHIctUfGN8Vl32x5ISNcPKlxjcQJF2q70PlaH2uFlj2Ac4s3khnZqG1YxppdMr0iTyk-/pub?w=889&h=358)
+Read [ProteinMPNN paper](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1).
+To run ProteinMPNN, clone this GitHub repo and install Python>=3.0, PyTorch, and NumPy.
+Full protein backbone models: `vanilla_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt, v_48_030.pt`, `soluble_model_weights/v_48_010.pt, v_48_020.pt`.
+CA only models: `ca_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt`. Enable flag `--ca_only` to use these models.
+Helper scripts: `helper_scripts` - helper functions to parse PDBs, assign which chains to design, choose which residues to fix, add AA bias, tie residues, etc.
+Code organization:
+* `protein_mpnn_run.py` - the main script to initialize and run the model.
+* `protein_mpnn_utils.py` - utility functions for the main script.
+* `examples/` - simple code examples.
+* `inputs/` - input PDB files for examples
+* `outputs/` - outputs from examples
+* `colab_notebooks/` - Google Colab examples
+* `training/` - code and data to retrain the model
+-----------------------------------------------------------------------------------------------------
+Input flags for `protein_mpnn_run.py`:
+```
+    argparser.add_argument("--suppress_print", type=int, default=0, help="0 for False, 1 for True")
+    argparser.add_argument("--ca_only", action="store_true", default=False, help="Parse CA-only structures and use CA-only models (default: false)")
+    argparser.add_argument("--path_to_model_weights", type=str, default="", help="Path to model weights folder;")
+    argparser.add_argument("--model_name", type=str, default="v_48_020", help="ProteinMPNN model name: v_48_002, v_48_010, v_48_020, v_48_030; v_48_010=version with 48 edges 0.10A noise")
+    argparser.add_argument("--use_soluble_model", action="store_true", default=False, help="Flag to load ProteinMPNN weights trained on soluble proteins only.")
+    argparser.add_argument("--seed", type=int, default=0, help="If set to 0 then a random seed will be picked;")
+    argparser.add_argument("--save_score", type=int, default=0, help="0 for False, 1 for True; save score=-log_prob to npy files")
+    argparser.add_argument("--path_to_fasta", type=str, default="", help="score provided input sequence in a fasta format; e.g. GGGGGG/PPPPS/WWW for chains A, B, C sorted alphabetically and separated by /")
+    argparser.add_argument("--save_probs", type=int, default=0, help="0 for False, 1 for True; save MPNN predicted probabilities per position")
+    argparser.add_argument("--score_only", type=int, default=0, help="0 for False, 1 for True; score input backbone-sequence pairs")
+    argparser.add_argument("--conditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)")
+    argparser.add_argument("--conditional_probs_only_backbone", type=int, default=0, help="0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)")
+    argparser.add_argument("--unconditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output unconditional probabilities p(s_i given backbone) in one forward pass")
+    argparser.add_argument("--backbone_noise", type=float, default=0.00, help="Standard deviation of Gaussian noise to add to backbone atoms")
+    argparser.add_argument("--num_seq_per_target", type=int, default=1, help="Number of sequences to generate per target")
+    argparser.add_argument("--batch_size", type=int, default=1, help="Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory")
+    argparser.add_argument("--max_length", type=int, default=200000, help="Max sequence length")
+    argparser.add_argument("--sampling_temp", type=str, default="0.1", help="A string of temperatures, 0.2 0.25 0.5. Sampling temperature for amino acids. Suggested values 0.1, 0.15, 0.2, 0.25, 0.3. Higher values will lead to more diversity.")
+    argparser.add_argument("--out_folder", type=str, help="Path to a folder to output sequences, e.g. /home/out/")
+    argparser.add_argument("--pdb_path", type=str, default='', help="Path to a single PDB to be designed")
+    argparser.add_argument("--pdb_path_chains", type=str, default='', help="Define which chains need to be designed for a single PDB ")
+    argparser.add_argument("--jsonl_path", type=str, help="Path to a folder with parsed pdb into jsonl")
+    argparser.add_argument("--chain_id_jsonl",type=str, default='', help="Path to a dictionary specifying which chains need to be designed and which ones are fixed, if not specified all chains will be designed.")
+    argparser.add_argument("--fixed_positions_jsonl", type=str, default='', help="Path to a dictionary with fixed positions")
+    argparser.add_argument("--omit_AAs", type=list, default='X', help="Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cysteine.")
+    argparser.add_argument("--bias_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies AA composition bias if needed, e.g. {A: -1.1, F: 0.7} would make A less likely and F more likely.")
+    argparser.add_argument("--bias_by_res_jsonl", default='', help="Path to dictionary with per position bias.")
+    argparser.add_argument("--omit_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies which amino acids need to be omitted from design at specific chain indices")
+    argparser.add_argument("--pssm_jsonl", type=str, default='', help="Path to a dictionary with pssm")
+    argparser.add_argument("--pssm_multi", type=float, default=0.0, help="A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions")
+    argparser.add_argument("--pssm_threshold", type=float, default=0.0, help="A value between -inf + inf to restrict per position AAs")
+    argparser.add_argument("--pssm_log_odds_flag", type=int, default=0, help="0 for False, 1 for True")
+    argparser.add_argument("--pssm_bias_flag", type=int, default=0, help="0 for False, 1 for True")
+    argparser.add_argument("--tied_positions_jsonl", type=str, default='', help="Path to a dictionary with tied positions")
+```
+-----------------------------------------------------------------------------------------------------
+For example, to make a conda environment to run ProteinMPNN:
+* `conda create --name mlfold` - this creates a conda environment called `mlfold`
+* `source activate mlfold` - this activates the environment
+* `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` - this installs PyTorch, following the steps from https://pytorch.org/
+-----------------------------------------------------------------------------------------------------
+These examples are provided in `examples/`:
+* `submit_example_1.sh` - simple monomer example
+* `submit_example_2.sh` - simple multi-chain example
+* `submit_example_3.sh` - directly from the .pdb path
+* `submit_example_3_score_only.sh` - return score only (model's uncertainty)
+* `submit_example_3_score_only_from_fasta.sh` - return score only (model's uncertainty) loading sequence from fasta files
+* `submit_example_4.sh` - fix some residue positions
+* `submit_example_4_non_fixed.sh` - specify which positions to design
+* `submit_example_5.sh` - tie some positions together (symmetry)
+* `submit_example_6.sh` - homooligomer example
+* `submit_example_7.sh` - return sequence unconditional probabilities (PSSM like)
+* `submit_example_8.sh` - add amino acid bias
+* `submit_example_pssm.sh` - use PSSM bias when designing sequences
+-----------------------------------------------------------------------------------------------------
+Output example:
+```
+>3HTN, score=1.1705, global_score=1.2045, fixed_chains=['B'], designed_chains=['A', 'C'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
+>T=0.1, sample=1, score=0.7291, global_score=0.9330, seq_recovery=0.5736
+NMYSYKKIGNKYIVSINNHTEIVKALKKFCEEKNIKSGSVNGIGSIGSVTLKFYNLETKEEELKTFNANFEISNLTGFISMHDNKVFLDLHITIGDENFSALAGHLVSAVVNGTCELIVEDFNELVSTKYNEELGLWLLDFEK/NMYSYKKIGNKYIVSINNHTDIVTAIKKFCEDKKIKSGTINGIGQVKEVTLEFRNFETGEKEEKTFKKQFTISNLTGFISTKDGKVFLDLHITFGDENFSALAGHLISAIVDGKCELIIEDYNEEINVKYNEELGLYLLDFNK
+>T=0.1, sample=2, score=0.7414, global_score=0.9355, seq_recovery=0.6075
+NMYKYKKIGNKYIVSINNHTEIVKAIKEFCKEKNIKSGTINGIGQVGKVTLRFYNPETKEYTEKTFNDNFEISNLTGFISTYKNEVFLHLHITFGKSDFSALAGHLLSAIVNGICELIVEDFKENLSMKYDEKTGLYLLDFEK/NMYKYKKIGNKYVVSINNHTEIVEALKAFCEDKKIKSGTVNGIGQVSKVTLKFFNIETKESKEKTFNKNFEISNLTGFISEINGEVFLHLHITIGDENFSALAGHLLSAVVNGEAILIVEDYKEKVNRKYNEELGLNLLDFNL
+```
+* `score` - negative log probability of the sampled amino acids, averaged over the residues that were designed
+* `global_score` - negative log probability of the sampled/fixed amino acids, averaged over all residues in all chains
+* `fixed_chains` - chains that were not designed (fixed)
+* `designed_chains` - chains that were redesigned
+* `model_name/CA_model_name` - model name that was used to generate results, e.g. `v_48_020`
+* `git_hash` - github version that was used to generate outputs
+* `seed` - random seed
+* `T=0.1` - temperature equal to 0.1 was used to sample sequences
+* `sample` - sequence sample number (1, 2, 3, etc.)
+-----------------------------------------------------------------------------------------------------
+```
+@article{dauparas2022robust,
+  title={Robust deep learning--based protein sequence design using ProteinMPNN},
+  author={Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
+  journal={Science},
+  volume={378},
+  number={6615},
+  pages={49--56},
+  year={2022},
+  publisher={American Association for the Advancement of Science}
+}
+```
+-----------------------------------------------------------------------------------------------------
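
The header lines in the README's output example follow a regular `key=value` layout, so they can be read back with a few lines of Python. The sketch below is only illustrative: `parse_header` and `split_chains` are hypothetical helper names that are not part of ProteinMPNN, and the parser assumes the header format shown in the README excerpt (comma-separated fields, list values in square brackets, chains separated by `/`).

```python
import re


def parse_header(header: str) -> dict:
    """Parse a ProteinMPNN-style FASTA header line into a dict of strings.

    Hypothetical helper: assumes the layout shown in the README output example.
    """
    fields = {}
    body = header.strip().lstrip(">")
    # Split on commas that are not inside [...], so that list values such as
    # designed_chains=['A', 'C'] stay in one piece.
    for part in re.split(r",\s*(?![^\[]*\])", body):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
        else:
            # The first field of the native-sequence header is the bare name.
            fields["name"] = part.strip()
    return fields


def split_chains(sequence: str) -> list:
    """Split a multi-chain sequence line on the '/' chain separator."""
    return sequence.strip().split("/")


native = parse_header(
    ">3HTN, score=1.1705, global_score=1.2045, fixed_chains=['B'], "
    "designed_chains=['A', 'C'], model_name=v_48_020, seed=37"
)
sample = parse_header(
    ">T=0.1, sample=1, score=0.7291, global_score=0.9330, seq_recovery=0.5736"
)
print(native["name"], native["designed_chains"], float(sample["score"]))
```

All values are kept as strings; convert them with `float()` or `ast.literal_eval()` where a number or list is needed.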

ProteinMPNN/ca_model_weights/v_48_002.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec038b44a987d7c8351b6ed887c82a2370d54e45e55a6bdaf508a729cef0340e
+size 6624011

ProteinMPNN/ca_model_weights/v_48_010.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cdb50498d45578d20b271fa7817b8cd8bfde3875ad69dbd3f5e4b5dd3e588301
+size 6624011

ProteinMPNN/ca_model_weights/v_48_020.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f28f40170e21858c5ff31ef50b6e63414ff76dc331b19f85aa8586a12031744a
+size 6624011
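
The three `.pt` entries above are Git LFS pointer files rather than the weights themselves: each records the pointer spec version, a SHA-256 object ID, and the size of the real file in bytes. A minimal sketch for reading those fields follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the pointer text has already had the leading diff `+` markers removed.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file.

    Hypothetical helper: expects lines of the form "<key> <value>".
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version"),
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:ec038b44a987d7c8351b6ed887c82a2370d54e45e55a6bdaf508a729cef0340e
size 6624011
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

Comparing `digest` with a SHA-256 hash of the downloaded file is one way to check that the actual weights, not the pointer, were fetched.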

ProteinMPNN/colab_notebooks/README.md ADDED

	@@ -0,0 +1 @@


+<a href="https://colab.research.google.com/github/dauparas/ProteinMPNN/blob/main/colab_notebooks/quickdemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

ProteinMPNN/colab_notebooks/ca_only_quickdemo.ipynb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/colab_notebooks/quickdemo.ipynb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/colab_notebooks/quickdemo_wAF2.ipynb ADDED

	@@ -0,0 +1,612 @@

+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/dauparas/ProteinMPNN/blob/main/colab_notebooks/quickdemo_wAF2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "AYZebfKn8gef"
+      },
+      "source": [
+        "#ProteinMPNN w/AF2\n",
+        "This notebook is intended as a quick demo, more features to come!\n",
+        "\n",
+        "Examples: \n",
+        "1.   pdb: `6MRR`, homomer: `False`, designed_chain: `A`\n",
+        "2.   pdb: `1X2I`, homomer: `True`, designed_chain: `A,B` \n",
+        "     (for correct symmetric tying, the lengths of the homomer chains should be the same)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Setup ProteinMPNN\n",
+        "import warnings\n",
+        "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
+        "\n",
+        "import json, time, os, sys, glob, re\n",
+        "from google.colab import files\n",
+        "import numpy as np\n",
+        "\n",
+        "if not os.path.isdir(\"ProteinMPNN\"):\n",
+        "  os.system(\"git clone -q https://github.com/dauparas/ProteinMPNN.git\")\n",
+        "\n",
+        "if \"/content/ProteinMPNN\" not in sys.path:\n",
+        "  sys.path.append('/content/ProteinMPNN')\n",
+        "\n",
+        "import matplotlib.pyplot as plt\n",
+        "import shutil\n",
+        "import warnings\n",
+        "import torch\n",
+        "from torch import optim\n",
+        "from torch.utils.data import DataLoader\n",
+        "from torch.utils.data.dataset import random_split, Subset\n",
+        "import copy\n",
+        "import torch.nn as nn\n",
+        "import torch.nn.functional as F\n",
+        "import random\n",
+        "import os.path\n",
+        "from protein_mpnn_utils import loss_nll, loss_smoothed, gather_edges, gather_nodes, gather_nodes_t, cat_neighbors_nodes, _scores, _S_to_seq, tied_featurize, parse_PDB\n",
+        "from protein_mpnn_utils import StructureDataset, StructureDatasetPDB, ProteinMPNN\n",
+        "\n",
+        "device = torch.device(\"cpu\")\n",
+        "#v_48_010=version with 48 edges 0.10A noise\n",
+        "model_name = \"v_48_020\" #@param [\"v_48_002\", \"v_48_010\", \"v_48_020\", \"v_48_030\"]\n",
+        "\n",
+        "\n",
+        "backbone_noise=0.00               # Standard deviation of Gaussian noise to add to backbone atoms\n",
+        "\n",
+        "path_to_model_weights='/content/ProteinMPNN/vanilla_model_weights'          \n",
+        "hidden_dim = 128\n",
+        "num_layers = 3 \n",
+        "model_folder_path = path_to_model_weights\n",
+        "if model_folder_path[-1] != '/':\n",
+        "    model_folder_path = model_folder_path + '/'\n",
+        "checkpoint_path = model_folder_path + f'{model_name}.pt'\n",
+        "\n",
+        "checkpoint = torch.load(checkpoint_path, map_location=device) \n",
+        "print('Number of edges:', checkpoint['num_edges'])\n",
+        "noise_level_print = checkpoint['noise_level']\n",
+        "print(f'Training noise level: {noise_level_print}A')\n",
+        "model = ProteinMPNN(num_letters=21, node_features=hidden_dim, edge_features=hidden_dim, hidden_dim=hidden_dim, num_encoder_layers=num_layers, num_decoder_layers=num_layers, augment_eps=backbone_noise, k_neighbors=checkpoint['num_edges'])\n",
+        "model.to(device)\n",
+        "model.load_state_dict(checkpoint['model_state_dict'])\n",
+        "model.eval()\n",
+        "print(\"Model loaded\")\n",
+        "\n",
+        "def make_tied_positions_for_homomers(pdb_dict_list):\n",
+        "    my_dict = {}\n",
+        "    for result in pdb_dict_list:\n",
+        "        all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...\n",
+        "        tied_positions_list = []\n",
+        "        chain_length = len(result[f\"seq_chain_{all_chain_list[0]}\"])\n",
+        "        for i in range(1,chain_length+1):\n",
+        "            temp_dict = {}\n",
+        "            for j, chain in enumerate(all_chain_list):\n",
+        "                temp_dict[chain] = [i] #needs to be a list\n",
+        "            tied_positions_list.append(temp_dict)\n",
+        "        my_dict[result['name']] = tied_positions_list\n",
+        "    return my_dict\n",
+        "\n",
+        "#########################\n",
+        "def get_pdb(pdb_code=\"\"):\n",
+        "  if pdb_code is None or pdb_code == \"\":\n",
+        "    upload_dict = files.upload()\n",
+        "    pdb_string = upload_dict[list(upload_dict.keys())[0]]\n",
+        "    with open(\"tmp.pdb\",\"wb\") as out: out.write(pdb_string)\n",
+        "    return \"tmp.pdb\"\n",
+        "  else:\n",
+        "    os.system(f\"wget -qnc https://files.rcsb.org/view/{pdb_code}.pdb\")\n",
+        "    return f\"{pdb_code}.pdb\""
+      ],
+      "metadata": {
+        "id": "2nKSlaMlSpcf",
+        "cellView": "form"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "xMVlYh8Fv2of"
+      },
+      "outputs": [],
+      "source": [
+        "#@title #Run ProteinMPNN\n",
+        "\n",
+        "#@markdown #### Input Options\n",
+        "pdb='6MRR' #@param {type:\"string\"}\n",
+        "pdb = pdb.replace(\" \",\"\")\n",
+        "pdb_path = get_pdb(pdb)\n",
+        "#@markdown - pdb code (leave blank to get an upload prompt)\n",
+        "\n",
+        "homomer = False #@param {type:\"boolean\"}\n",
+        "designed_chain = \"A\" #@param {type:\"string\"}\n",
+        "fixed_chain = \"\" #@param {type:\"string\"}\n",
+        "\n",
+        "if designed_chain == \"\":\n",
+        "  designed_chain_list = []\n",
+        "else:\n",
+        "  designed_chain_list = re.sub(\"[^A-Za-z]+\",\",\", designed_chain).split(\",\")\n",
+        "\n",
+        "if fixed_chain == \"\":\n",
+        "  fixed_chain_list = []\n",
+        "else:\n",
+        "  fixed_chain_list = re.sub(\"[^A-Za-z]+\",\",\", fixed_chain).split(\",\")\n",
+        "\n",
+        "chain_list = list(set(designed_chain_list + fixed_chain_list))\n",
+        "\n",
+        "#@markdown - specify which chain(s) to design and which chain(s) to keep fixed. \n",
+        "#@markdown   Use a comma, e.g. `A,B`, to specify more than one chain\n",
+        "\n",
+        "#chain = \"A\" #@param {type:\"string\"}\n",
+        "#pdb_path_chains = chain\n",
+        "##@markdown - Define which chain to redesign\n",
+        "\n",
+        "#@markdown #### Design Options\n",
+        "num_seqs = 8 #@param [\"1\", \"2\", \"4\", \"8\", \"16\", \"32\", \"64\"] {type:\"raw\"}\n",
+        "num_seq_per_target = num_seqs\n",
+        "\n",
+        "#@markdown - Sampling temperature for amino acids, T=0.0 means taking argmax, T>>1.0 means sample randomly.\n",
+        "sampling_temp = \"0.1\" #@param [\"0.0001\", \"0.1\", \"0.15\", \"0.2\", \"0.25\", \"0.3\", \"0.5\"]\n",
+        "\n",
+        "\n",
+        "\n",
+        "save_score=0                      # 0 for False, 1 for True; save score=-log_prob to npy files\n",
+        "save_probs=0                      # 0 for False, 1 for True; save MPNN predicted probabilities per position\n",
+        "score_only=0                      # 0 for False, 1 for True; score input backbone-sequence pairs\n",
+        "conditional_probs_only=0          # 0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)\n",
+        "conditional_probs_only_backbone=0 # 0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)\n",
+        "    \n",
+        "batch_size=1                      # Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory\n",
+        "max_length=20000                  # Max sequence length\n",
+        "    \n",
+        "out_folder='.'                    # Path to a folder to output sequences, e.g. /home/out/\n",
+        "jsonl_path=''                     # Path to a folder with parsed pdb into jsonl\n",
+        "omit_AAs='X'                      # Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cysteine.\n",
+        "   \n",
+        "pssm_multi=0.0                    # A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions\n",
+        "pssm_threshold=0.0                # A value between -inf + inf to restrict per position AAs\n",
+        "pssm_log_odds_flag=0               # 0 for False, 1 for True\n",
+        "pssm_bias_flag=0                   # 0 for False, 1 for True\n",
+        "\n",
+        "\n",
+        "##############################################################\n",
+        "\n",
+        "folder_for_outputs = out_folder\n",
+        "\n",
+        "NUM_BATCHES = num_seq_per_target//batch_size\n",
+        "BATCH_COPIES = batch_size\n",
+        "temperatures = [float(item) for item in sampling_temp.split()]\n",
+        "omit_AAs_list = omit_AAs\n",
+        "alphabet = 'ACDEFGHIKLMNPQRSTVWYX'\n",
+        "\n",
+        "omit_AAs_np = np.array([AA in omit_AAs_list for AA in alphabet]).astype(np.float32)\n",
+        "\n",
+        "chain_id_dict = None\n",
+        "fixed_positions_dict = None\n",
+        "pssm_dict = None\n",
+        "omit_AA_dict = None\n",
+        "bias_AA_dict = None\n",
+        "tied_positions_dict = None\n",
+        "bias_by_res_dict = None\n",
+        "bias_AAs_np = np.zeros(len(alphabet))\n",
+        "\n",
+        "\n",
+        "###############################################################\n",
+        "pdb_dict_list = parse_PDB(pdb_path, input_chain_list=chain_list)\n",
+        "dataset_valid = StructureDatasetPDB(pdb_dict_list, truncate=None, max_length=max_length)\n",
+        "\n",
+        "chain_id_dict = {}\n",
+        "chain_id_dict[pdb_dict_list[0]['name']]= (designed_chain_list, fixed_chain_list)\n",
+        "\n",
+        "print(chain_id_dict)\n",
+        "for chain in chain_list:\n",
+        "  l = len(pdb_dict_list[0][f\"seq_chain_{chain}\"])\n",
+        "  print(f\"Length of chain {chain} is {l}\")\n",
+        "\n",
+        "if homomer:\n",
+        "  tied_positions_dict = make_tied_positions_for_homomers(pdb_dict_list)\n",
+        "else:\n",
+        "  tied_positions_dict = None\n",
+        "\n",
+        "#################################################################\n",
+        "sequences = []\n",
+        "with torch.no_grad():\n",
+        "  print('Generating sequences...')\n",
+        "  for ix, protein in enumerate(dataset_valid):\n",
+        "    score_list = []\n",
+        "    all_probs_list = []\n",
+        "    all_log_probs_list = []\n",
+        "    S_sample_list = []\n",
+        "    batch_clones = [copy.deepcopy(protein) for i in range(BATCH_COPIES)]\n",
+        "    X, S, mask, lengths, chain_M, chain_encoding_all, chain_list_list, visible_list_list, masked_list_list, masked_chain_length_list_list, chain_M_pos, omit_AA_mask, residue_idx, dihedral_mask, tied_pos_list_of_lists_list, pssm_coef, pssm_bias, pssm_log_odds_all, bias_by_res_all, tied_beta = tied_featurize(batch_clones, device, chain_id_dict, fixed_positions_dict, omit_AA_dict, tied_positions_dict, pssm_dict, bias_by_res_dict)\n",
+        "    pssm_log_odds_mask = (pssm_log_odds_all > pssm_threshold).float() #1.0 for true, 0.0 for false\n",
+        "    name_ = batch_clones[0]['name']\n",
+        "\n",
+        "    randn_1 = torch.randn(chain_M.shape, device=X.device)\n",
+        "    log_probs = model(X, S, mask, chain_M*chain_M_pos, residue_idx, chain_encoding_all, randn_1)\n",
+        "    mask_for_loss = mask*chain_M*chain_M_pos\n",
+        "    scores = _scores(S, log_probs, mask_for_loss)\n",
+        "    native_score = scores.cpu().data.numpy()\n",
+        "\n",
+        "    for temp in temperatures:\n",
+        "        for j in range(NUM_BATCHES):\n",
+        "            randn_2 = torch.randn(chain_M.shape, device=X.device)\n",
+        "            if tied_positions_dict is None:\n",
+        "                sample_dict = model.sample(X, randn_2, S, chain_M, chain_encoding_all, residue_idx, mask=mask, temperature=temp, omit_AAs_np=omit_AAs_np, bias_AAs_np=bias_AAs_np, chain_M_pos=chain_M_pos, omit_AA_mask=omit_AA_mask, pssm_coef=pssm_coef, pssm_bias=pssm_bias, pssm_multi=pssm_multi, pssm_log_odds_flag=bool(pssm_log_odds_flag), pssm_log_odds_mask=pssm_log_odds_mask, pssm_bias_flag=bool(pssm_bias_flag), bias_by_res=bias_by_res_all)\n",
+        "                S_sample = sample_dict[\"S\"] \n",
+        "            else:\n",
+        "                sample_dict = model.tied_sample(X, randn_2, S, chain_M, chain_encoding_all, residue_idx, mask=mask, temperature=temp, omit_AAs_np=omit_AAs_np, bias_AAs_np=bias_AAs_np, chain_M_pos=chain_M_pos, omit_AA_mask=omit_AA_mask, pssm_coef=pssm_coef, pssm_bias=pssm_bias, pssm_multi=pssm_multi, pssm_log_odds_flag=bool(pssm_log_odds_flag), pssm_log_odds_mask=pssm_log_odds_mask, pssm_bias_flag=bool(pssm_bias_flag), tied_pos=tied_pos_list_of_lists_list[0], tied_beta=tied_beta, bias_by_res=bias_by_res_all)\n",
+        "                S_sample = sample_dict[\"S\"]\n",
+        "            # Compute scores\n",
+        "            log_probs = model(X, S_sample, mask, chain_M*chain_M_pos, residue_idx, chain_encoding_all, randn_2, use_input_decoding_order=True, decoding_order=sample_dict[\"decoding_order\"])\n",
+        "            mask_for_loss = mask*chain_M*chain_M_pos\n",
+        "            scores = _scores(S_sample, log_probs, mask_for_loss)\n",
+        "            scores = scores.cpu().data.numpy()\n",
+        "            all_probs_list.append(sample_dict[\"probs\"].cpu().data.numpy())\n",
+        "            all_log_probs_list.append(log_probs.cpu().data.numpy())\n",
+        "            S_sample_list.append(S_sample.cpu().data.numpy())\n",
+        "            for b_ix in range(BATCH_COPIES):\n",
+        "                masked_chain_length_list = masked_chain_length_list_list[b_ix]\n",
+        "                masked_list = masked_list_list[b_ix]\n",
+        "                seq_recovery_rate = torch.sum(torch.sum(torch.nn.functional.one_hot(S[b_ix], 21)*torch.nn.functional.one_hot(S_sample[b_ix], 21),axis=-1)*mask_for_loss[b_ix])/torch.sum(mask_for_loss[b_ix])\n",
+        "                seq = _S_to_seq(S_sample[b_ix], chain_M[b_ix])\n",
+        "                score = scores[b_ix]\n",
+        "                score_list.append(score)\n",
+        "                native_seq = _S_to_seq(S[b_ix], chain_M[b_ix])\n",
+        "                if b_ix == 0 and j==0 and temp==temperatures[0]:\n",
+        "                    start = 0\n",
+        "                    end = 0\n",
+        "                    list_of_AAs = []\n",
+        "                    for mask_l in masked_chain_length_list:\n",
+        "                        end += mask_l\n",
+        "                        list_of_AAs.append(native_seq[start:end])\n",
+        "                        start = end\n",
+        "                    native_seq = \"\".join(list(np.array(list_of_AAs)[np.argsort(masked_list)]))\n",
+        "                    l0 = 0\n",
+        "                    for mc_length in list(np.array(masked_chain_length_list)[np.argsort(masked_list)])[:-1]:\n",
+        "                        l0 += mc_length\n",
+        "                        native_seq = native_seq[:l0] + '/' + native_seq[l0:]\n",
+        "                        l0 += 1\n",
+        "                    sorted_masked_chain_letters = np.argsort(masked_list_list[0])\n",
+        "                    print_masked_chains = [masked_list_list[0][i] for i in sorted_masked_chain_letters]\n",
+        "                    sorted_visible_chain_letters = np.argsort(visible_list_list[0])\n",
+        "                    print_visible_chains = [visible_list_list[0][i] for i in sorted_visible_chain_letters]\n",
+        "                    native_score_print = np.format_float_positional(np.float32(native_score.mean()), unique=False, precision=4)\n",
+        "                    line = '>{}, score={}, fixed_chains={}, designed_chains={}, model_name={}\\n{}\\n'.format(name_, native_score_print, print_visible_chains, print_masked_chains, model_name, native_seq)\n",
+        "                    print(line.rstrip())\n",
+        "                start = 0\n",
+        "                end = 0\n",
+        "                list_of_AAs = []\n",
+        "                for mask_l in masked_chain_length_list:\n",
+        "                    end += mask_l\n",
+        "                    list_of_AAs.append(seq[start:end])\n",
+        "                    start = end\n",
+        "\n",
+        "                seq = \"\".join(list(np.array(list_of_AAs)[np.argsort(masked_list)]))\n",
+        "                l0 = 0\n",
+        "                for mc_length in list(np.array(masked_chain_length_list)[np.argsort(masked_list)])[:-1]:\n",
+        "                    l0 += mc_length\n",
+        "                    seq = seq[:l0] + '/' + seq[l0:]\n",
+        "                    l0 += 1\n",
+        "                score_print = np.format_float_positional(np.float32(score), unique=False, precision=4)\n",
+        "                seq_rec_print = np.format_float_positional(np.float32(seq_recovery_rate.detach().cpu().numpy()), unique=False, precision=4)\n",
+        "                line = '>T={}, sample={}, score={}, seq_recovery={}\\n{}\\n'.format(temp,b_ix,score_print,seq_rec_print,seq)\n",
+        "                sequences.append(seq)\n",
+        "                print(line.rstrip())\n",
+        "\n",
+        "\n",
+        "all_probs_concat = np.concatenate(all_probs_list)\n",
+        "all_log_probs_concat = np.concatenate(all_log_probs_list)\n",
+        "S_sample_concat = np.concatenate(S_sample_list)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Predict with AlphaFold2 (with single-sequence input)"
+      ],
+      "metadata": {
+        "id": "5mQ4VLG1dPsd"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Setup AlphaFold\n",
+        "\n",
+        "# import libraries\n",
+        "from IPython.utils import io\n",
+        "import os,sys,re\n",
+        "\n",
+        "if \"af_backprop\" not in sys.path:\n",
+        "  import tensorflow as tf\n",
+        "  import jax\n",
+        "  import jax.numpy as jnp\n",
+        "  import numpy as np\n",
+        "  import matplotlib\n",
+        "  from matplotlib import animation\n",
+        "  import matplotlib.pyplot as plt\n",
+        "  from IPython.display import HTML\n",
+        "  import tqdm.notebook\n",
+        "  TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'\n",
+        "\n",
+        "  with io.capture_output() as captured:\n",
+        "    # install ALPHAFOLD\n",
+        "    if not os.path.isdir(\"af_backprop\"):\n",
+        "      %shell git clone https://github.com/sokrypton/af_backprop.git\n",
+        "      %shell pip -q install biopython dm-haiku ml-collections py3Dmol\n",
+        "      %shell wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py\n",
+        "    if not os.path.isdir(\"params\"):\n",
+        "      %shell mkdir params\n",
+        "      %shell curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params\n",
+        "\n",
+        "  if not os.path.exists(\"MMalign\"):\n",
+        "    # install MMalign\n",
+        "    os.system(\"wget -qnc https://zhanggroup.org/MM-align/bin/module/MMalign.cpp\")\n",
+        "    os.system(\"g++ -static -O3 -ffast-math -o MMalign MMalign.cpp\")\n",
+        "\n",
+        "  def mmalign(pdb_a,pdb_b):\n",
+        "    # pass to MMalign\n",
+        "    output = os.popen(f'./MMalign {pdb_a} {pdb_b}')\n",
+        "    # parse outputs\n",
+        "    parse_float = lambda x: float(x.split(\"=\")[1].split()[0])\n",
+        "    tms = []\n",
+        "    for line in output:\n",
+        "      line = line.rstrip()\n",
+        "      if line.startswith(\"TM-score\"): tms.append(parse_float(line))\n",
+        "    return tms\n",
+        "\n",
+        "  # configure which device to use\n",
+        "  try:\n",
+        "    # check if TPU is available\n",
+        "    import jax.tools.colab_tpu\n",
+        "    jax.tools.colab_tpu.setup_tpu()\n",
+        "    print('Running on TPU')\n",
+        "    DEVICE = \"tpu\"\n",
+        "  except:\n",
+        "    if jax.local_devices()[0].platform == 'cpu':\n",
+        "      print(\"WARNING: no GPU detected, will be using CPU\")\n",
+        "      DEVICE = \"cpu\"\n",
+        "    else:\n",
+        "      print('Running on GPU')\n",
+        "      DEVICE = \"gpu\"\n",
+        "      # disable GPU on tensorflow\n",
+        "      tf.config.set_visible_devices([], 'GPU')\n",
+        "\n",
+        "  # import libraries\n",
+        "  sys.path.append('af_backprop')\n",
+        "  from utils import update_seq, update_aatype, get_plddt, get_pae\n",
+        "  import colabfold as cf\n",
+        "  from alphafold.common import protein as alphafold_protein\n",
+        "  from alphafold.data import pipeline\n",
+        "  from alphafold.model import data, config\n",
+        "  from alphafold.common import residue_constants\n",
+        "  from alphafold.model import model as alphafold_model\n",
+        "\n",
+        "# custom functions\n",
+        "def clear_mem():\n",
+        "  backend = jax.lib.xla_bridge.get_backend()\n",
+        "  for buf in backend.live_buffers(): buf.delete()\n",
+        "\n",
+        "def setup_model(max_len):\n",
+        "  clear_mem()\n",
+        "\n",
+        "  # setup model\n",
+        "  cfg = config.model_config(\"model_3_ptm\")\n",
+        "  cfg.model.num_recycle = 0\n",
+        "  cfg.data.common.num_recycle = 0\n",
+        "  cfg.data.eval.max_msa_clusters = 1\n",
+        "  cfg.data.common.max_extra_msa = 1\n",
+        "  cfg.data.eval.masked_msa_replace_fraction = 0\n",
+        "  cfg.model.global_config.subbatch_size = None\n",
+        "\n",
+        "  # get params\n",
+        "  model_param = data.get_model_haiku_params(model_name=\"model_3_ptm\", data_dir=\".\")\n",
+        "  model_runner = alphafold_model.RunModel(cfg, model_param, is_training=False, recycle_mode=\"none\")\n",
+        "\n",
+        "  model_params = []\n",
+        "  for k in [1,2,3,4,5]:\n",
+        "    if k == 3:\n",
+        "      model_params.append(model_param)\n",
+        "    else:\n",
+        "      params = data.get_model_haiku_params(model_name=f\"model_{k}_ptm\", data_dir=\".\")\n",
+        "      model_params.append({k: params[k] for k in model_runner.params.keys()})\n",
+        "\n",
+        "  seq = \"A\" * max_len\n",
+        "  length = len(seq)\n",
+        "  feature_dict = {\n",
+        "      **pipeline.make_sequence_features(sequence=seq, description=\"none\", num_res=length),\n",
+        "      **pipeline.make_msa_features(msas=[[seq]], deletion_matrices=[[[0]*length]])\n",
+        "  }\n",
+        "  inputs = model_runner.process_features(feature_dict,random_seed=0)\n",
+        "\n",
+        "  def runner(I, params):\n",
+        "    # update sequence\n",
+        "    inputs = I[\"inputs\"]\n",
+        "    inputs.update(I[\"prev\"])\n",
+        "\n",
+        "    seq = jax.nn.one_hot(I[\"seq\"],20)\n",
+        "    update_seq(seq, inputs)\n",
+        "    update_aatype(inputs[\"target_feat\"][...,1:], inputs)\n",
+        "\n",
+        "    # mask prediction\n",
+        "    mask = jnp.arange(inputs[\"residue_index\"].shape[0]) < I[\"length\"]\n",
+        "    inputs[\"seq_mask\"] = inputs[\"seq_mask\"].at[:].set(mask)\n",
+        "    inputs[\"msa_mask\"] = inputs[\"msa_mask\"].at[:].set(mask)\n",
+        "    inputs[\"residue_index\"] = jnp.where(mask, inputs[\"residue_index\"], 0)\n",
+        "\n",
+        "    # get prediction\n",
+        "    key = jax.random.PRNGKey(0)\n",
+        "    outputs = model_runner.apply(params, key, inputs)\n",
+        "\n",
+        "    prev = {\"init_msa_first_row\":outputs['representations']['msa_first_row'][None],\n",
+        "            \"init_pair\":outputs['representations']['pair'][None],\n",
+        "            \"init_pos\":outputs['structure_module']['final_atom_positions'][None]}\n",
+        "    \n",
+        "    aux = {\"final_atom_positions\":outputs[\"structure_module\"][\"final_atom_positions\"],\n",
+        "           \"final_atom_mask\":outputs[\"structure_module\"][\"final_atom_mask\"],\n",
+        "           \"plddt\":get_plddt(outputs),\"pae\":get_pae(outputs),\n",
+        "           \"length\":I[\"length\"], \"seq\":I[\"seq\"], \"prev\":prev,\n",
+        "           \"residue_idx\":inputs[\"residue_index\"][0]}\n",
+        "    return aux\n",
+        "\n",
+        "  return jax.jit(runner), model_params, {\"inputs\":inputs, \"length\":max_len}\n",
+        "\n",
+        "def save_pdb(outs, filename, Ls=None):\n",
+        "  '''save pdb coordinates'''\n",
+        "  p = {\"residue_index\":outs[\"residue_idx\"] + 1,\n",
+        "       \"aatype\":outs[\"seq\"],\n",
+        "       \"atom_positions\":outs[\"final_atom_positions\"],\n",
+        "       \"atom_mask\":outs[\"final_atom_mask\"],\n",
+        "       \"plddt\":outs[\"plddt\"]}\n",
+        "  p = jax.tree_map(lambda x:x[:outs[\"length\"]], p)\n",
+        "  b_factors = 100 * p.pop(\"plddt\")[:,None] * p[\"atom_mask\"]\n",
+        "  p = alphafold_protein.Protein(**p,b_factors=b_factors)\n",
+        "  pdb_lines = alphafold_protein.to_pdb(p)\n",
+        "  with open(filename, 'w') as f:\n",
+        "    f.write(pdb_lines)\n",
+        "  if Ls is not None:\n",
+        "    pdb_lines = cf.read_pdb_renum(filename, Ls)\n",
+        "    with open(filename, 'w') as f:\n",
+        "      f.write(pdb_lines)"
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "4ZBUThXU7yY8"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Run AlphaFold\n",
+        "num_models = 1 #@param [\"1\",\"2\",\"3\",\"4\",\"5\"] {type:\"raw\"}\n",
+        "num_recycles = 1 #@param [\"0\",\"1\",\"2\",\"3\"] {type:\"raw\"}\n",
+        "num_sequences = len(sequences)\n",
+        "outs = []\n",
+        "positions = []\n",
+        "plddts = []\n",
+        "paes = []\n",
+        "LS = []\n",
+        "\n",
+        "with tqdm.notebook.tqdm(total=(num_recycles + 1) * num_models * num_sequences, bar_format=TQDM_BAR_FORMAT) as pbar:\n",
+        "  print(f\"seq_num model_num avg_pLDDT avg_pAE TMscore\")\n",
+        "  for s,ori_sequence in enumerate(sequences):\n",
+        "    Ls = [len(s) for s in ori_sequence.replace(\":\",\"/\").split(\"/\")]\n",
+        "    LS.append(Ls)\n",
+        "    sequence = re.sub(\"[^A-Z]\",\"\",ori_sequence)\n",
+        "    length = len(sequence)\n",
+        "\n",
+        "    # avoid recompiling if length within 25\n",
+        "    if \"max_len\" not in dir() or length > max_len or (max_len - length) > 25:\n",
+        "      max_len = length + 25\n",
+        "      runner, params, I = setup_model(max_len)\n",
+        "\n",
+        "    outs.append([])\n",
+        "    positions.append([])\n",
+        "    plddts.append([])\n",
+        "    paes.append([])\n",
+        "\n",
+        "    r = -1\n",
+        "    # pad sequence to max length\n",
+        "    seq = np.array([residue_constants.restype_order.get(aa,0) for aa in sequence])\n",
+        "    seq = np.pad(seq,[0,max_len-length],constant_values=-1)\n",
+        "    I[\"inputs\"]['residue_index'][:] = cf.chain_break(np.arange(max_len), Ls, length=32)\n",
+        "    I.update({\"seq\":seq, \"length\":length})\n",
+        "    \n",
+        "    # for each model\n",
+        "    for n in range(num_models):\n",
+        "      # restart recycle\n",
+        "      I[\"prev\"] = {'init_msa_first_row': np.zeros([1, max_len, 256]),\n",
+        "                  'init_pair': np.zeros([1, max_len, max_len, 128]),\n",
+        "                  'init_pos': np.zeros([1, max_len, 37, 3])}\n",
+        "      for r in range(num_recycles + 1):\n",
+        "        O = runner(I, params[n])\n",
+        "        O = jax.tree_map(lambda x:np.asarray(x), O)\n",
+        "        I[\"prev\"] = O[\"prev\"]\n",
+        "        pbar.update(1)\n",
+        "      \n",
+        "      positions[-1].append(O[\"final_atom_positions\"][:length])\n",
+        "      plddts[-1].append(O[\"plddt\"][:length])\n",
+        "      paes[-1].append(O[\"pae\"][:length,:length])\n",
+        "      outs[-1].append(O)\n",
+        "      save_pdb(outs[-1][-1], f\"out_seq_{s}_model_{n}.pdb\", Ls=LS[-1])\n",
+        "      tmscores = mmalign(pdb_path, f\"out_seq_{s}_model_{n}.pdb\")\n",
+        "      print(f\"{s} {n}\\t{plddts[-1][-1].mean():.3}\\t{paes[-1][-1].mean():.3}\\t{tmscores[-1]:.3}\")"
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "p2uNokqudTSH"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Display 3D structure {run: \"auto\"}\n",
+        "#@markdown #### select which sequence to show (if more than one designed example)\n",
+        "seq_num = 0 #@param [\"0\",\"1\",\"2\",\"3\",\"4\",\"5\",\"6\",\"7\"] {type:\"raw\"}\n",
+        "assert seq_num < len(outs), f\"ERROR: seq_num ({seq_num}) exceeds number of designed sequences ({num_sequences})\"\n",
+        "model_num = 0 #@param [\"0\",\"1\",\"2\",\"3\",\"4\"] {type:\"raw\"}\n",
+        "assert model_num < len(outs[0]), f\"ERROR: model_num ({model_num}) exceeds number of model params used ({num_models})\"\n",
+        "#@markdown #### options\n",
+        "\n",
+        "color = \"confidence\" #@param [\"chain\", \"confidence\", \"rainbow\"]\n",
+        "if color == \"confidence\": color = \"lDDT\"\n",
+        "show_sidechains = False #@param {type:\"boolean\"}\n",
+        "show_mainchains = False #@param {type:\"boolean\"}\n",
+        "\n",
+        "v = cf.show_pdb(f\"out_seq_{seq_num}_model_{model_num}.pdb\", show_sidechains, show_mainchains, color,\n",
+        "                color_HP=True, size=(800,480), Ls=LS[seq_num])       \n",
+        "v.setHoverable({}, True,\n",
+        "               '''function(atom,viewer,event,container){if(!atom.label){atom.label=viewer.addLabel(\"      \"+atom.resn+\":\"+atom.resi,{position:atom,backgroundColor:'mintcream',fontColor:'black'});}}''',\n",
+        "               '''function(atom,viewer){if(atom.label){viewer.removeLabel(atom.label);delete atom.label;}}''')\n",
+        "v.show()           \n",
+        "if color == \"lDDT\":\n",
+        "  cf.plot_plddt_legend().show()\n",
+        "\n",
+        "# add confidence plots\n",
+        "cf.plot_confidence(plddts[seq_num][model_num]*100, paes[seq_num][model_num], Ls=LS[seq_num]).show()"
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "0TNhcwok8d_w"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "name": "quickdemo_wAF2.ipynb",
+      "provenance": [],
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU",
+    "gpuClass": "standard"
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
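
The "Run AlphaFold" cell in the notebook above derives per-chain lengths from the "/"-separated design and only rebuilds the model when the padded length no longer fits. The following is a minimal stdlib sketch of those two steps, not code from the notebook: `split_chains` and `needs_recompile` are hypothetical helper names, and the 25-residue slack simply mirrors the constant used in the cell.

```python
import re

def split_chains(ori_sequence):
    """Return (per-chain lengths, concatenated sequence) for a '/'- or ':'-separated design."""
    chain_lengths = [len(chain) for chain in ori_sequence.replace(":", "/").split("/")]
    sequence = re.sub("[^A-Z]", "", ori_sequence)  # drop chain separators
    return chain_lengths, sequence

def needs_recompile(length, max_len, slack=25):
    """True if no model is compiled yet, the sequence is too long, or padding would exceed `slack`."""
    return max_len is None or length > max_len or (max_len - length) > slack

chain_lengths, sequence = split_chains("MKV/GSH")
print(chain_lengths, sequence)
```

Padding every input to a shared `max_len` lets sequences of similar length reuse one jitted function instead of triggering a fresh compilation for each new length.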

ProteinMPNN/examples/submit_example_1.sh ADDED

	@@ -0,0 +1,28 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 2
+#SBATCH --output=example_1.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
+output_dir="../outputs/example_1_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_2.sh ADDED

	@@ -0,0 +1,34 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 2
+#SBATCH --output=example_2.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
+output_dir="../outputs/example_2_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
+chains_to_design="A B"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --chain_id_jsonl $path_for_assigned_chains \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_3.sh ADDED

	@@ -0,0 +1,27 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_3.out
+source activate mlfold
+path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
+output_dir="../outputs/example_3_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+chains_to_design="A B"
+python ../protein_mpnn_run.py \
+        --pdb_path $path_to_PDB \
+        --pdb_path_chains "$chains_to_design" \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_3_score_only.sh ADDED

	@@ -0,0 +1,28 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_3_score_only.out
+source activate mlfold
+path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
+output_dir="../outputs/example_3_score_only_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+chains_to_design="A B"
+python ../protein_mpnn_run.py \
+        --pdb_path $path_to_PDB \
+        --pdb_path_chains "$chains_to_design" \
+        --out_folder $output_dir \
+        --num_seq_per_target 10 \
+        --sampling_temp "0.1" \
+        --score_only 1 \
+        --seed 37 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_3_score_only_from_fasta.sh ADDED

	@@ -0,0 +1,30 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_3_from_fasta.out
+source activate mlfold
+path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"
+path_to_fasta="../outputs/example_3_outputs/seqs/3HTN.fa"
+output_dir="../outputs/example_3_score_only_from_fasta_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+chains_to_design="A B"
+python ../protein_mpnn_run.py \
+        --path_to_fasta $path_to_fasta \
+        --pdb_path $path_to_PDB \
+        --pdb_path_chains "$chains_to_design" \
+        --out_folder $output_dir \
+        --num_seq_per_target 5 \
+        --sampling_temp "0.1" \
+        --score_only 1 \
+        --seed 13 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_4.sh ADDED

	@@ -0,0 +1,40 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_4.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
+output_dir="../outputs/example_4_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
+path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
+chains_to_design="A C"
+#Positions are 1-based within each chain (the first residue is 1); they are not PDB residue numbers.
+fixed_positions="1 2 3 4 5 6 7 8 23 25, 10 11 12 13 14 15 16 17 18 19 20 40" #fix (do not design) residues 1-8, 23 and 25 in chain A and residues 10-20 and 40 in chain C
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
+python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --chain_id_jsonl $path_for_assigned_chains \
+        --fixed_positions_jsonl $path_for_fixed_positions \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1
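
The `chains_to_design` and `fixed_positions` strings in the script above pair up by order: the first comma-separated group belongs to the first chain, the second group to the second. As a rough illustration of that convention only, here is a minimal sketch; `parse_position_list` is a hypothetical name and this is not the code of `make_fixed_positions_dict.py`.

```python
def parse_position_list(chains, positions):
    """Map 'A C' and '1 2 3, 10 11' to {'A': [1, 2, 3], 'C': [10, 11]}."""
    chain_ids = chains.split()
    per_chain = [[int(p) for p in group.split()] for group in positions.split(",")]
    if len(chain_ids) != len(per_chain):
        raise ValueError("need exactly one comma-separated position group per chain")
    return dict(zip(chain_ids, per_chain))

print(parse_position_list("A C", "1 2 3 4 5 6 7 8 23 25, 10 11 12 13 14 15 16 17 18 19 20 40"))
```

Checking that the number of groups matches the number of chains catches a missing comma before any design job is submitted.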

ProteinMPNN/examples/submit_example_4_non_fixed.sh ADDED

	@@ -0,0 +1,40 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_4_non_fixed.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
+output_dir="../outputs/example_4_non_fixed_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
+path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
+chains_to_design="A C"
+#Positions are 1-based within each chain (the first residue is 1); they are not PDB residue numbers.
+design_only_positions="1 2 3 4 5 6 7 8 9 10, 3 4 5 6 7 8" #design only these residues (chain A, then chain C); requires the --specify_non_fixed flag
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
+python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$design_only_positions" --specify_non_fixed
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --chain_id_jsonl $path_for_assigned_chains \
+        --fixed_positions_jsonl $path_for_fixed_positions \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1
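
With `--specify_non_fixed`, the listed positions are the ones to design, so the fixed set is everything else in the chain. A minimal sketch of that inversion, assuming 1-based positions as stated in the script's comment; `fixed_from_designable` is a hypothetical helper, not the helper script's actual implementation.

```python
def fixed_from_designable(chain_length, designable):
    """Return the 1-based positions to hold fixed when only `designable` may change."""
    allowed = set(designable)
    return [pos for pos in range(1, chain_length + 1) if pos not in allowed]

# A 12-residue chain where only residues 3-8 may be redesigned.
print(fixed_from_designable(12, [3, 4, 5, 6, 7, 8]))
```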

ProteinMPNN/examples/submit_example_5.sh ADDED

	@@ -0,0 +1,44 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_5.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
+output_dir="../outputs/example_5_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
+path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
+path_for_tied_positions=$output_dir"/tied_pdbs.jsonl"
+chains_to_design="A C"
+fixed_positions="9 10 11 12 13 14 15 16 17 18 19 20 21 22 23, 10 11 18 19 20 22"
+tied_positions="1 2 3 4 5 6 7 8, 1 2 3 4 5 6 7 8" #the two lists must match in length; residue 1 in chain A and residue 1 in chain C are sampled together, and so on
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
+python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"
+python ../helper_scripts/make_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --chain_list "$chains_to_design" --position_list "$tied_positions"
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --chain_id_jsonl $path_for_assigned_chains \
+        --fixed_positions_jsonl $path_for_fixed_positions \
+        --tied_positions_jsonl $path_for_tied_positions \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1
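
The tied-position string in the script above is read column-wise: the i-th entry of each chain's list forms one group of residues that are sampled together. The sketch below illustrates that pairing under the assumption that each group can be represented as a `{chain: [position]}` dictionary; `tie_positions` is a hypothetical name and the output layout is not confirmed by the script itself.

```python
def tie_positions(chains, positions):
    """Pair the i-th position of every chain into one tied group."""
    chain_ids = chains.split()
    groups = [[int(p) for p in group.split()] for group in positions.split(",")]
    if len(chain_ids) != len(groups):
        raise ValueError("need exactly one position group per chain")
    if len({len(group) for group in groups}) != 1:
        raise ValueError("all position groups must have the same length")
    return [{chain: [pos] for chain, pos in zip(chain_ids, column)}
            for column in zip(*groups)]

print(tie_positions("A C", "1 2 3, 1 2 3"))
```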

ProteinMPNN/examples/submit_example_6.sh ADDED

	@@ -0,0 +1,34 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 3
+#SBATCH --output=example_6.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_homooligomers/pdbs/"
+output_dir="../outputs/example_6_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_tied_positions=$output_dir"/tied_pdbs.jsonl"
+path_for_designed_sequences=$output_dir"/temp_0.1"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/make_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --homooligomer 1
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --tied_positions_jsonl $path_for_tied_positions \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.2" \
+        --seed 37 \
+        --batch_size 1
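
For a homo-oligomer, `--homooligomer 1` replaces the explicit position lists: every position is tied across all chains. A minimal sketch of that idea, assuming all chains have equal length and the same `{chain: [position]}` grouping as an illustrative format; `tie_homooligomer` is a hypothetical helper, not the helper script's code.

```python
def tie_homooligomer(chain_lengths):
    """Tie position i across every chain; `chain_lengths` maps chain id -> length."""
    lengths = set(chain_lengths.values())
    if len(lengths) != 1:
        raise ValueError("homo-oligomer chains must all have the same length")
    n_residues = lengths.pop()
    return [{chain: [pos] for chain in chain_lengths}
            for pos in range(1, n_residues + 1)]

print(tie_homooligomer({"A": 3, "B": 3, "C": 3}))
```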

ProteinMPNN/examples/submit_example_7.sh ADDED

	@@ -0,0 +1,29 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 2
+#SBATCH --output=example_7.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
+output_dir="../outputs/example_7_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --out_folder $output_dir \
+        --num_seq_per_target 1 \
+        --sampling_temp "0.1" \
+        --unconditional_probs_only 1 \
+        --seed 37 \
+        --batch_size 1

ProteinMPNN/examples/submit_example_8.sh ADDED

	@@ -0,0 +1,34 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 2
+#SBATCH --output=example_8.out
+source activate mlfold
+folder_with_pdbs="../inputs/PDB_monomers/pdbs/"
+output_dir="../outputs/example_8_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_bias=$output_dir"/bias_pdbs.jsonl"
+#Adding global polar amino acid bias (Doug Tischer)
+AA_list="D E H K N Q R S T W Y"
+bias_list="1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39 1.39"
+python ../helper_scripts/make_bias_AA.py --output_path=$path_for_bias --AA_list="$AA_list" --bias_list="$bias_list"
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --out_folder $output_dir \
+        --bias_AA_jsonl $path_for_bias \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1
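
The bias value 1.39 used above is close to ln(4). If the bias is added to the logits before the softmax (an assumption here, since the script does not show how `protein_mpnn_run.py` applies it, and temperature is ignored), a biased residue type becomes about four times more likely relative to an unbiased one. A small stdlib sketch of that arithmetic, with hypothetical function names:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def biased_probs(logits, bias):
    """Add a per-class bias to the logits, then normalise."""
    return softmax([logit + b for logit, b in zip(logits, bias)])

# Two residue types with equal logits; only the first receives the 1.39 bias.
probs = biased_probs([0.0, 0.0], [1.39, 0.0])
print(probs[0] / probs[1])  # odds ratio equals exp(1.39), roughly 4
```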

ProteinMPNN/examples/submit_example_pssm.sh ADDED

	@@ -0,0 +1,49 @@

+#!/bin/bash
+#SBATCH -p gpu
+#SBATCH --mem=32g
+#SBATCH --gres=gpu:rtx2080:1
+#SBATCH -c 2
+#SBATCH --output=example_pssm.out
+source activate mlfold
+#new_probabilities_using_PSSM = (1-pssm_multi*pssm_coef_gathered[:,None])*probs + pssm_multi*pssm_coef_gathered[:,None]*pssm_bias_gathered
+#probs - predictions from MPNN
+#pssm_bias_gathered - input PSSM bias (must be a probability distribution)
+#pssm_multi - a number between 0.0 (no bias) and 1.0 (no MPNN), set via the flag --pssm_multi; a global value applied equally to all residues
+#pssm_coef_gathered - a number between 0.0 (no bias) and 1.0 (no MPNN), set via ../helper_scripts/make_pssm_input_dict.py; can be adjusted per residue, e.g. to apply the PSSM bias only to specific residues or chains
+pssm_input_path="../inputs/PSSM_inputs"
+folder_with_pdbs="../inputs/PDB_complexes/pdbs/"
+output_dir="../outputs/example_pssm_outputs"
+if [ ! -d $output_dir ]
+then
+    mkdir -p $output_dir
+fi
+path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
+path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
+pssm=$output_dir"/pssm.jsonl"
+chains_to_design="A B"
+python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
+python ../helper_scripts/make_pssm_input_dict.py --jsonl_input_path=$path_for_parsed_chains --PSSM_input_path=$pssm_input_path --output_path=$pssm
+python ../protein_mpnn_run.py \
+        --jsonl_path $path_for_parsed_chains \
+        --chain_id_jsonl $path_for_assigned_chains \
+        --out_folder $output_dir \
+        --num_seq_per_target 2 \
+        --sampling_temp "0.1" \
+        --seed 37 \
+        --batch_size 1 \
+        --pssm_jsonl $pssm \
+        --pssm_multi 0.3 \
+        --pssm_bias_flag 1
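
The mixing formula in the comments of the script above is a convex combination of the MPNN prediction and the PSSM, with weight `pssm_multi * pssm_coef`. The sketch below applies it to a single residue using plain Python lists; `mix_with_pssm` is a hypothetical name and the three-class distributions are made-up illustration values.

```python
def mix_with_pssm(probs, pssm_bias, pssm_multi, pssm_coef):
    """Blend MPNN probabilities with a PSSM distribution for one residue.

    weight = pssm_multi * pssm_coef; 0.0 keeps MPNN only, 1.0 keeps the PSSM only.
    """
    weight = pssm_multi * pssm_coef
    return [(1.0 - weight) * p + weight * b for p, b in zip(probs, pssm_bias)]

mpnn = [0.5, 0.3, 0.2]   # illustrative MPNN probabilities
pssm = [0.1, 0.1, 0.8]   # illustrative PSSM distribution
mixed = mix_with_pssm(mpnn, pssm, pssm_multi=0.3, pssm_coef=1.0)
print(mixed, sum(mixed))
```

Because both inputs sum to one and the weights sum to one, the result is again a probability distribution, which is why the PSSM bias has to be normalised.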

ProteinMPNN/helper_scripts/assign_fixed_chains.py ADDED

	@@ -0,0 +1,39 @@

+import argparse
+def main(args):
+    import json
+    with open(args.input_path, 'r') as json_file:
+        json_list = list(json_file)
+    global_designed_chain_list = []
+    if args.chain_list != '':
+        global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+    my_dict = {}
+    for json_str in json_list:
+        result = json.loads(json_str)
+        all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain'] #['A','B', 'C',...]
+        if len(global_designed_chain_list) > 0:
+            designed_chain_list = global_designed_chain_list
+        else:
+            #manually specify, e.g.
+            designed_chain_list = ["A"]
+        fixed_chain_list = [letter for letter in all_chain_list if letter not in designed_chain_list] #fix/do not redesign these chains
+        my_dict[result['name']]= (designed_chain_list, fixed_chain_list)
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    argparser.add_argument("--chain_list", type=str, default='', help="List of the chains that need to be designed")
+    args = argparser.parse_args()
+    main(args)
+# Output looks like this:
+# {"5TTA": [["A"], ["B"]], "3LIS": [["A"], ["B"]]}
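The chain split performed by assign_fixed_chains.py can be illustrated on an in-memory record. This is a minimal sketch, not part of the commit: the helper name `split_chains` and the toy record are assumptions made for the example.

```python
import json

def split_chains(record, designed=("A",)):
    """Return (designed, fixed) chain lists for one parsed-PDB record.

    Hypothetical helper that mirrors the list comprehensions in main() above.
    """
    # Chain IDs are the last character of keys shaped like 'seq_chain_<ID>'.
    all_chains = [key[-1:] for key in record if key[:9] == "seq_chain"]
    designed = list(designed)
    fixed = [chain for chain in all_chains if chain not in designed]
    return designed, fixed

# Toy record with two chains (sequences are made up for the example).
record = {"name": "toy", "seq_chain_A": "GWSTE", "seq_chain_B": "LEKHR"}
assignment = {record["name"]: split_chains(record)}
print(json.dumps(assignment))  # {"toy": [["A"], ["B"]]}
```

Because chain IDs are taken from the last character of each key, this approach only distinguishes single-character chain names.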

ProteinMPNN/helper_scripts/make_bias_AA.py ADDED

	@@ -0,0 +1,27 @@

+import argparse
+def main(args):
+    import numpy as np
+    import json
+    bias_list = [float(item) for item in args.bias_list.split()]
+    AA_list = [str(item) for item in args.AA_list.split()]
+    my_dict = dict(zip(AA_list, bias_list))
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    argparser.add_argument("--AA_list", type=str, default='', help="List of AAs to be biased")
+    argparser.add_argument("--bias_list", type=str, default='', help="AA bias strengths")
+    args = argparser.parse_args()
+    main(args)
+#e.g. output
+#{"A": -0.01, "G": 0.02}
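The script pairs the two space-separated arguments with `zip`, which silently drops unmatched items when the lists differ in length. The sketch below shows the same pairing with an explicit length check. The function name `make_bias` and the check itself are assumptions added for illustration, not behaviour of the script.

```python
def make_bias(aa_list, bias_list):
    """Pair space-separated amino-acid letters with space-separated bias values.

    Hypothetical helper; the length check is an addition for the example.
    """
    letters = aa_list.split()
    biases = [float(item) for item in bias_list.split()]
    if len(letters) != len(biases):
        raise ValueError("AA_list and bias_list must have the same number of items")
    return dict(zip(letters, biases))

bias = make_bias("A G", "-0.01 0.02")
print(bias)  # {'A': -0.01, 'G': 0.02}
```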

ProteinMPNN/helper_scripts/make_bias_per_res_dict.py ADDED

	@@ -0,0 +1,53 @@

+import argparse
+def main(args):
+    import glob
+    import random
+    import numpy as np
+    import json
+    mpnn_alphabet = 'ACDEFGHIKLMNPQRSTVWYX'
+    mpnn_alphabet_dict = {'A': 0,'C': 1,'D': 2,'E': 3,'F': 4,'G': 5,'H': 6,'I': 7,'K': 8,'L': 9,'M': 10,'N': 11,'P': 12,'Q': 13,'R': 14,'S': 15,'T': 16,'V': 17,'W': 18,'Y': 19,'X': 20}
+    with open(args.input_path, 'r') as json_file:
+        json_list = list(json_file)
+    my_dict = {}
+    for json_str in json_list:
+        result = json.loads(json_str)
+        all_chain_list = [item[-1:] for item in list(result) if item[:10]=='seq_chain_']
+        bias_by_res_dict = {}
+        for chain in all_chain_list:
+            chain_length = len(result[f'seq_chain_{chain}'])
+            bias_per_residue = np.zeros([chain_length, 21])
+            if chain == 'A':
+                residues = [0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
+                amino_acids = [5, 9] #[G, L]
+                for res in residues:
+                    for aa in amino_acids:
+                        bias_per_residue[res, aa] = 100.5
+            if chain == 'C':
+                residues = [0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
+                amino_acids = range(21)[1:] #every alphabet index except 0 (A), i.e. C..Y and X
+                for res in residues:
+                    for aa in amino_acids:
+                        bias_per_residue[res, aa] = -100.5
+            bias_by_res_dict[chain] = bias_per_residue.tolist()
+        my_dict[result['name']] = bias_by_res_dict
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    args = argparser.parse_args()
+    main(args)
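The per-residue bias is a [chain_length, 21] table indexed by 0-based residue position and by position in the MPNN alphabet. The sketch below builds such a table with plain lists instead of NumPy. The helper name `bias_matrix` and its arguments are assumptions for the example.

```python
MPNN_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"

def bias_matrix(chain_length, positions, amino_acids, value):
    """Return a [chain_length][21] nested list with `value` at the chosen cells.

    Hypothetical helper; `positions` are 0-based, as in the script above.
    """
    matrix = [[0.0] * len(MPNN_ALPHABET) for _ in range(chain_length)]
    for pos in positions:
        for letter in amino_acids:
            matrix[pos][MPNN_ALPHABET.index(letter)] = value
    return matrix

# Favour G and L at the first and third residue of a 4-residue chain.
m = bias_matrix(4, positions=[0, 2], amino_acids="GL", value=100.5)
print(m[0][5], m[0][9], m[1][5])  # 100.5 100.5 0.0
```

Looking letters up in the alphabet string avoids hard-coding indices such as 5 and 9.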

ProteinMPNN/helper_scripts/make_fixed_positions_dict.py ADDED

	@@ -0,0 +1,59 @@

+import argparse
+def main(args):
+    import glob
+    import random
+    import numpy as np
+    import json
+    import itertools
+    with open(args.input_path, 'r') as json_file:
+        json_list = list(json_file)
+    fixed_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+    global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+    my_dict = {}
+    if not args.specify_non_fixed:
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+            fixed_position_dict = {}
+            for i, chain in enumerate(global_designed_chain_list):
+                fixed_position_dict[chain] = fixed_list[i]
+            for chain in all_chain_list:
+                if chain not in global_designed_chain_list:
+                    fixed_position_dict[chain] = []
+            my_dict[result['name']] = fixed_position_dict
+    else:
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+            fixed_position_dict = {}
+            for chain in all_chain_list:
+                seq_length = len(result[f'seq_chain_{chain}'])
+                all_residue_list = (np.arange(seq_length)+1).tolist()
+                if chain not in global_designed_chain_list:
+                    fixed_position_dict[chain] = all_residue_list
+                else:
+                    idx = np.argwhere(np.array(global_designed_chain_list) == chain)[0][0]
+                    fixed_position_dict[chain] = list(set(all_residue_list)-set(fixed_list[idx]))
+            my_dict[result['name']] = fixed_position_dict
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+    #e.g. output
+    #{"5TTA": {"A": [1, 2, 3, 7, 8, 9, 22, 25, 33], "B": []}, "3LIS": {"A": [], "B": []}}
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    argparser.add_argument("--chain_list", type=str, default='', help="Space-separated list of the chains that the position lists refer to")
+    argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+    argparser.add_argument("--specify_non_fixed", action="store_true", default=False, help="Allows specifying just residues that need to be designed (default: false)")
+    args = argparser.parse_args()
+    main(args)
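The two modes of make_fixed_positions_dict.py differ only in how the listed positions are interpreted: as the fixed set, or as the designable set whose complement is fixed. The sketch below reproduces both on toy input. The function name `fixed_positions` and its arguments are assumptions, and unlike the script it returns the complement in sorted order.

```python
def fixed_positions(chain_lengths, chains, positions, specify_non_fixed=False):
    """Return {chain: [1-based fixed positions]} for one structure.

    Hypothetical helper: chain_lengths maps chain ID to sequence length,
    chains and positions are parallel lists as parsed from the CLI.
    """
    listed = dict(zip(chains, positions))
    result = {}
    for chain, length in chain_lengths.items():
        if not specify_non_fixed:
            # Listed positions are fixed; unlisted chains get no fixed positions.
            result[chain] = list(listed.get(chain, []))
        elif chain in listed:
            # Listed positions are designable; everything else is fixed.
            designable = set(listed[chain])
            result[chain] = [i for i in range(1, length + 1) if i not in designable]
        else:
            # Chains not mentioned are fixed entirely.
            result[chain] = list(range(1, length + 1))
    return result

lengths = {"A": 5, "B": 3}
print(fixed_positions(lengths, ["A"], [[1, 2]]))
# {'A': [1, 2], 'B': []}
print(fixed_positions(lengths, ["A"], [[1, 2]], specify_non_fixed=True))
# {'A': [3, 4, 5], 'B': [1, 2, 3]}
```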

ProteinMPNN/helper_scripts/make_pos_neg_tied_positions_dict.py ADDED

	@@ -0,0 +1,73 @@

+import argparse
+def main(args):
+    import glob
+    import random
+    import numpy as np
+    import json
+    import itertools
+    with open(args.input_path, 'r') as json_file:
+        json_list = list(json_file)
+    homooligomeric_state = args.homooligomer
+    if homooligomeric_state == 0:
+        tied_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+        global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+        my_dict = {}
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+            tied_positions_list = []
+            for i, pos in enumerate(tied_list[0]):
+                temp_dict = {}
+                for j, chain in enumerate(global_designed_chain_list):
+                    temp_dict[chain] = [tied_list[j][i]] #needs to be a list
+                tied_positions_list.append(temp_dict)
+            my_dict[result['name']] = tied_positions_list
+    else:
+        if args.pos_neg_chain_list:
+            chain_list_input = [[str(item) for item in one.split()] for one in args.pos_neg_chain_list.split(",")]
+            chain_betas_input = [[float(item) for item in one.split()] for one in args.pos_neg_chain_betas.split(",")]
+            chain_list_flat = [item for sublist in chain_list_input for item in sublist]
+            chain_betas_flat = [item for sublist in chain_betas_input for item in sublist]
+            chain_betas_dict = dict(zip(chain_list_flat, chain_betas_flat))
+        else:
+            chain_list_input = None #no groups given: tie all chains of each structure with weight 1.0
+        my_dict = {}
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+            tied_positions_list = []
+            chain_length = len(result[f"seq_chain_{all_chain_list[0]}"])
+            for chains in (chain_list_input or [all_chain_list]): #avoids a NameError when --pos_neg_chain_list is empty
+                for i in range(1,chain_length+1):
+                    temp_dict = {}
+                    for j, chain in enumerate(chains):
+                        if args.pos_neg_chain_list and chain in chain_list_flat:
+                            temp_dict[chain] = [[i], [chain_betas_dict[chain]]]
+                        else:
+                            temp_dict[chain] = [[i], [1.0]] #first list is for residue numbers, second list is for weights for the energy, +ive and -ive design
+                    tied_positions_list.append(temp_dict)
+            my_dict[result['name']] = tied_positions_list
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    argparser.add_argument("--chain_list", type=str, default='', help="Space-separated list of the chains whose positions are tied together")
+    argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+    argparser.add_argument("--homooligomer", type=int, default=0, help="If 0 do not use, if 1 then design homooligomer")
+    argparser.add_argument("--pos_neg_chain_list", type=str, default='', help="Chain lists to be tied together")
+    argparser.add_argument("--pos_neg_chain_betas", type=str, default='', help="Chain beta list for the chain lists provided; 1.0 for the positive design, -0.1 or -0.5 for negative, 0.0 means do not use that chain info")
+    args = argparser.parse_args()
+    main(args)
+#e.g. output
+#{"5TTA": [], "3LIS": [{"A": [1], "B": [1]}, {"A": [2], "B": [2]}, {"A": [3], "B": [3]}, {"A": [4], "B": [4]}, {"A": [5], "B": [5]}, {"A": [6], "B": [6]}, {"A": [7], "B": [7]}, {"A": [8], "B": [8]}, {"A": [9], "B": [9]}, {"A": [10], "B": [10]}, {"A": [11], "B": [11]}, {"A": [12], "B": [12]}, {"A": [13], "B": [13]}, {"A": [14], "B": [14]}, {"A": [15], "B": [15]}, {"A": [16], "B": [16]}, {"A": [17], "B": [17]}, {"A": [18], "B": [18]}, {"A": [19], "B": [19]}, {"A": [20], "B": [20]}, {"A": [21], "B": [21]}, {"A": [22], "B": [22]}, {"A": [23], "B": [23]}, {"A": [24], "B": [24]}, {"A": [25], "B": [25]}, {"A": [26], "B": [26]}, {"A": [27], "B": [27]}, {"A": [28], "B": [28]}, {"A": [29], "B": [29]}, {"A": [30], "B": [30]}, {"A": [31], "B": [31]}, {"A": [32], "B": [32]}, {"A": [33], "B": [33]}, {"A": [34], "B": [34]}, {"A": [35], "B": [35]}, {"A": [36], "B": [36]}, {"A": [37], "B": [37]}, {"A": [38], "B": [38]}, {"A": [39], "B": [39]}, {"A": [40], "B": [40]}, {"A": [41], "B": [41]}, {"A": [42], "B": [42]}, {"A": [43], "B": [43]}, {"A": [44], "B": [44]}, {"A": [45], "B": [45]}, {"A": [46], "B": [46]}, {"A": [47], "B": [47]}, {"A": [48], "B": [48]}, {"A": [49], "B": [49]}, {"A": [50], "B": [50]}, {"A": [51], "B": [51]}, {"A": [52], "B": [52]}, {"A": [53], "B": [53]}, {"A": [54], "B": [54]}, {"A": [55], "B": [55]}, {"A": [56], "B": [56]}, {"A": [57], "B": [57]}, {"A": [58], "B": [58]}, {"A": [59], "B": [59]}, {"A": [60], "B": [60]}, {"A": [61], "B": [61]}, {"A": [62], "B": [62]}, {"A": [63], "B": [63]}, {"A": [64], "B": [64]}, {"A": [65], "B": [65]}, {"A": [66], "B": [66]}, {"A": [67], "B": [67]}, {"A": [68], "B": [68]}, {"A": [69], "B": [69]}, {"A": [70], "B": [70]}, {"A": [71], "B": [71]}, {"A": [72], "B": [72]}, {"A": [73], "B": [73]}, {"A": [74], "B": [74]}, {"A": [75], "B": [75]}, {"A": [76], "B": [76]}, {"A": [77], "B": [77]}, {"A": [78], "B": [78]}, {"A": [79], "B": [79]}, {"A": [80], "B": [80]}, {"A": [81], "B": [81]}, {"A": [82], "B": [82]}, {"A": [83], "B": [83]}, 
+#{"A": [84], "B": [84]}, {"A": [85], "B": [85]}, {"A": [86], "B": [86]}, {"A": [87], "B": [87]}, {"A": [88], "B": [88]}, {"A": [89], "B": [89]}, {"A": [90], "B": [90]}, {"A": [91], "B": [91]}, {"A": [92], "B": [92]}, {"A": [93], "B": [93]}, {"A": [94], "B": [94]}, {"A": [95], "B": [95]}, {"A": [96], "B": [96]}]}
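In the homooligomer branch of make_pos_neg_tied_positions_dict.py, each tied entry carries a residue number and a per-chain weight (beta), so the shape differs from the plain example output shown in the comment. The sketch below builds that shape for toy input. The helper name `tie_with_betas` and its arguments are assumptions for the example.

```python
def tie_with_betas(chain_length, chain_groups, betas):
    """Return tied-position entries shaped {chain: [[residue], [beta]]}.

    Hypothetical helper: chain_groups is a list of chain-ID lists, and
    betas maps chain ID to weight. Chains without a beta default to 1.0.
    """
    tied = []
    for chains in chain_groups:
        for i in range(1, chain_length + 1):  # residue numbers are 1-based
            tied.append({chain: [[i], [betas.get(chain, 1.0)]] for chain in chains})
    return tied

entries = tie_with_betas(2, [["A", "B"]], {"A": 1.0, "B": -0.5})
print(entries[0])  # {'A': [[1], [1.0]], 'B': [[1], [-0.5]]}
```

Going by the help text, a positive beta marks positive design, a negative beta marks negative design, and 0.0 means the chain's information is not used.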

ProteinMPNN/helper_scripts/make_pssm_input_dict.py ADDED

	@@ -0,0 +1,36 @@

+import argparse
+def main(args):
+    import json
+    import numpy as np
+    with open(args.jsonl_input_path, 'r') as json_file:
+        json_list = list(json_file)
+    my_dict = {}
+    for json_str in json_list:
+        result = json.loads(json_str)
+        all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+        path_to_PSSM = args.PSSM_input_path+"/"+result['name'] + ".npz"
+        print(path_to_PSSM)
+        pssm_input = np.load(path_to_PSSM)
+        pssm_dict = {}
+        for chain in all_chain_list:
+            pssm_dict[chain] = {}
+            pssm_dict[chain]['pssm_coef'] = pssm_input[chain+'_coef'].tolist() #[L] per position coefficient to trust PSSM; 0.0 - do not use it; 1.0 - just use PSSM only
+            pssm_dict[chain]['pssm_bias'] = pssm_input[chain+'_bias'].tolist() #[L,21] probability (sums up to 1.0 over alphabet of size 21) from PSSM
+            pssm_dict[chain]['pssm_log_odds'] = pssm_input[chain+'_odds'].tolist() #[L,21] log_odds ratios coming from PSSM; optional/not needed
+        my_dict[result['name']] = pssm_dict
+    #Write output to:
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--PSSM_input_path", type=str, help="Path to PSSMs saved as npz files.")
+    argparser.add_argument("--jsonl_input_path", type=str, help="Path where to load .jsonl dictionary of parsed pdbs.")
+    argparser.add_argument("--output_path", type=str, help="Path where to save .jsonl dictionary with PSSM bias.")
+    args = argparser.parse_args()
+    main(args)

ProteinMPNN/helper_scripts/make_tied_positions_dict.py ADDED

	@@ -0,0 +1,61 @@

+import argparse
+def main(args):
+    import glob
+    import random
+    import numpy as np
+    import json
+    import itertools
+    with open(args.input_path, 'r') as json_file:
+        json_list = list(json_file)
+    homooligomeric_state = args.homooligomer
+    if homooligomeric_state == 0:
+        tied_list = [[int(item) for item in one.split()] for one in args.position_list.split(",")]
+        global_designed_chain_list = [str(item) for item in args.chain_list.split()]
+        my_dict = {}
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+            tied_positions_list = []
+            for i, pos in enumerate(tied_list[0]):
+                temp_dict = {}
+                for j, chain in enumerate(global_designed_chain_list):
+                    temp_dict[chain] = [tied_list[j][i]] #needs to be a list
+                tied_positions_list.append(temp_dict)
+            my_dict[result['name']] = tied_positions_list
+    else:
+        my_dict = {}
+        for json_str in json_list:
+            result = json.loads(json_str)
+            all_chain_list = sorted([item[-1:] for item in list(result) if item[:9]=='seq_chain']) #A, B, C, ...
+            tied_positions_list = []
+            chain_length = len(result[f"seq_chain_{all_chain_list[0]}"])
+            for i in range(1,chain_length+1):
+                temp_dict = {}
+                for j, chain in enumerate(all_chain_list):
+                    temp_dict[chain] = [i] #needs to be a list
+                tied_positions_list.append(temp_dict)
+            my_dict[result['name']] = tied_positions_list
+    with open(args.output_path, 'w') as f:
+        f.write(json.dumps(my_dict) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to the parsed PDBs")
+    argparser.add_argument("--output_path", type=str, help="Path to the output dictionary")
+    argparser.add_argument("--chain_list", type=str, default='', help="Space-separated list of the chains whose positions are tied together")
+    argparser.add_argument("--position_list", type=str, default='', help="Position lists, e.g. 11 12 14 18, 1 2 3 4 for first chain and the second chain")
+    argparser.add_argument("--homooligomer", type=int, default=0, help="If 0 do not use, if 1 then design homooligomer")
+    args = argparser.parse_args()
+    main(args)
+#e.g. output
+#{"5TTA": [], "3LIS": [{"A": [1], "B": [1]}, {"A": [2], "B": [2]}, {"A": [3], "B": [3]}, {"A": [4], "B": [4]}, {"A": [5], "B": [5]}, {"A": [6], "B": [6]}, {"A": [7], "B": [7]}, {"A": [8], "B": [8]}, {"A": [9], "B": [9]}, {"A": [10], "B": [10]}, {"A": [11], "B": [11]}, {"A": [12], "B": [12]}, {"A": [13], "B": [13]}, {"A": [14], "B": [14]}, {"A": [15], "B": [15]}, {"A": [16], "B": [16]}, {"A": [17], "B": [17]}, {"A": [18], "B": [18]}, {"A": [19], "B": [19]}, {"A": [20], "B": [20]}, {"A": [21], "B": [21]}, {"A": [22], "B": [22]}, {"A": [23], "B": [23]}, {"A": [24], "B": [24]}, {"A": [25], "B": [25]}, {"A": [26], "B": [26]}, {"A": [27], "B": [27]}, {"A": [28], "B": [28]}, {"A": [29], "B": [29]}, {"A": [30], "B": [30]}, {"A": [31], "B": [31]}, {"A": [32], "B": [32]}, {"A": [33], "B": [33]}, {"A": [34], "B": [34]}, {"A": [35], "B": [35]}, {"A": [36], "B": [36]}, {"A": [37], "B": [37]}, {"A": [38], "B": [38]}, {"A": [39], "B": [39]}, {"A": [40], "B": [40]}, {"A": [41], "B": [41]}, {"A": [42], "B": [42]}, {"A": [43], "B": [43]}, {"A": [44], "B": [44]}, {"A": [45], "B": [45]}, {"A": [46], "B": [46]}, {"A": [47], "B": [47]}, {"A": [48], "B": [48]}, {"A": [49], "B": [49]}, {"A": [50], "B": [50]}, {"A": [51], "B": [51]}, {"A": [52], "B": [52]}, {"A": [53], "B": [53]}, {"A": [54], "B": [54]}, {"A": [55], "B": [55]}, {"A": [56], "B": [56]}, {"A": [57], "B": [57]}, {"A": [58], "B": [58]}, {"A": [59], "B": [59]}, {"A": [60], "B": [60]}, {"A": [61], "B": [61]}, {"A": [62], "B": [62]}, {"A": [63], "B": [63]}, {"A": [64], "B": [64]}, {"A": [65], "B": [65]}, {"A": [66], "B": [66]}, {"A": [67], "B": [67]}, {"A": [68], "B": [68]}, {"A": [69], "B": [69]}, {"A": [70], "B": [70]}, {"A": [71], "B": [71]}, {"A": [72], "B": [72]}, {"A": [73], "B": [73]}, {"A": [74], "B": [74]}, {"A": [75], "B": [75]}, {"A": [76], "B": [76]}, {"A": [77], "B": [77]}, {"A": [78], "B": [78]}, {"A": [79], "B": [79]}, {"A": [80], "B": [80]}, {"A": [81], "B": [81]}, {"A": [82], "B": [82]}, {"A": [83], "B": [83]}, 
+#{"A": [84], "B": [84]}, {"A": [85], "B": [85]}, {"A": [86], "B": [86]}, {"A": [87], "B": [87]}, {"A": [88], "B": [88]}, {"A": [89], "B": [89]}, {"A": [90], "B": [90]}, {"A": [91], "B": [91]}, {"A": [92], "B": [92]}, {"A": [93], "B": [93]}, {"A": [94], "B": [94]}, {"A": [95], "B": [95]}, {"A": [96], "B": [96]}]}
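When `--homooligomer` is 0, make_tied_positions_dict.py pairs the i-th position of every comma-separated list, one list per chain. The sketch below shows that parsing and pairing on toy input. The function name `tie_positions` is an assumption, and it uses `zip`, which stops at the shortest list, where the script would raise an IndexError on lists shorter than the first one.

```python
def tie_positions(chain_list, position_list):
    """Tie the i-th listed position of each chain together.

    Hypothetical helper: chain_list is space-separated chain IDs,
    position_list holds one comma-separated group of positions per chain.
    """
    chains = chain_list.split()
    per_chain = [[int(tok) for tok in part.split()] for part in position_list.split(",")]
    tied = []
    for column in zip(*per_chain):
        # Each value is wrapped in a list, matching the script's output format.
        tied.append({chain: [pos] for chain, pos in zip(chains, column)})
    return tied

tied = tie_positions("A B", "1 2 3, 4 5 6")
print(tied)  # [{'A': [1], 'B': [4]}, {'A': [2], 'B': [5]}, {'A': [3], 'B': [6]}]
```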

ProteinMPNN/helper_scripts/other_tools/make_omit_AA.py ADDED

	@@ -0,0 +1,39 @@

+import glob
+import random
+import numpy as np
+import json
+import itertools
+#MODIFY this path
+with open('/home/justas/projects/lab_github/mpnn/data/pdbs.jsonl', 'r') as json_file:
+    json_list = list(json_file)
+my_dict = {}
+for json_str in json_list:
+    result = json.loads(json_str)
+    all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+    fixed_position_dict = {}
+    print(result['name'])
+    if result['name'] == '5TTA':
+        for chain in all_chain_list:
+            if chain == 'A':
+                fixed_position_dict[chain] = [
+                    [[int(item) for item in list(itertools.chain(list(np.arange(1,4)), list(np.arange(7,10)), [22, 25, 33]))], 'GPL'],
+                    [[int(item) for item in list(itertools.chain([40, 41, 42, 43]))], 'WC'],
+                    [[int(item) for item in list(itertools.chain(list(np.arange(50,150))))], 'ACEFGHIKLMNRSTVWYX'],
+                    [[int(item) for item in list(itertools.chain(list(np.arange(160,200))))], 'FGHIKLPQDMNRSTVWYX']]
+            else:
+                fixed_position_dict[chain] = []
+    else:
+        for chain in all_chain_list:
+            fixed_position_dict[chain] = []
+    my_dict[result['name']] = fixed_position_dict
+#MODIFY this path
+with open('/home/justas/projects/lab_github/mpnn/data/omit_AA.jsonl', 'w') as f:
+    f.write(json.dumps(my_dict) + '\n')
+print('Finished')
+#e.g. output
+#{"5TTA": {"A": [[[1, 2, 3, 7, 8, 9, 22, 25, 33], "GPL"], [[40, 41, 42, 43], "WC"], [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149], "ACEFGHIKLMNRSTVWYX"], [[160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199], "FGHIKLPQDMNRSTVWYX"]], "B": []}, "3LIS": {"A": [], "B": []}}
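Each omit_AA entry is a pair of a position list and a string of one-letter codes. The script assembles the position list from half-open `np.arange` ranges plus explicit positions. The sketch below builds one such entry with the built-in `range`. The helper name `omit_entry` and its arguments are assumptions for the example.

```python
def omit_entry(ranges, extra_positions, letters):
    """Build one [positions, letters] pair for the omit_AA dictionary.

    Hypothetical helper: ranges are (start, stop) pairs, stop excluded,
    which matches how np.arange is used in the script above.
    """
    positions = [i for start, stop in ranges for i in range(start, stop)]
    positions += list(extra_positions)
    return [positions, letters]

entry = omit_entry([(1, 4), (7, 10)], [22, 25, 33], "GPL")
print(entry)  # [[1, 2, 3, 7, 8, 9, 22, 25, 33], 'GPL']
```

The result matches the first entry for chain A in the example output above.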

ProteinMPNN/helper_scripts/other_tools/make_pssm_dict.py ADDED

	@@ -0,0 +1,64 @@

+import pandas as pd
+import numpy as np
+import glob
+import random
+import json
+def softmax(x, T):
+    return np.exp(x/T)/np.sum(np.exp(x/T), -1, keepdims=True)
+def parse_pssm(path):
+    data = pd.read_csv(path, skiprows=2)
+    floats_list_list = []
+    for i in range(data.values.shape[0]):
+        str1 = data.values[i][0][4:]
+        floats_list = []
+        for item in str1.split():
+            floats_list.append(float(item))
+        floats_list_list.append(floats_list)
+    np_lines = np.array(floats_list_list)
+    return np_lines
+np_lines = parse_pssm('/home/swang523/RLcage/capsid/monomersfordesign/8-16-21/pssm_rainity_final_8-16-21_int/build_0.2089_0.98_0.4653_19_2.00_0.005745.pssm')
+mpnn_alphabet = 'ACDEFGHIKLMNPQRSTVWYX'
+input_alphabet = 'ARNDCQEGHILKMFPSTWYV'
+permutation_matrix = np.zeros([20,21])
+for i in range(20):
+    letter1 = input_alphabet[i]
+    for j in range(21):
+        letter2 = mpnn_alphabet[j]
+        if letter1 == letter2:
+            permutation_matrix[i,j]=1.
+pssm_log_odds = np_lines[:,:20] @ permutation_matrix
+pssm_probs = np_lines[:,20:40] @ permutation_matrix
+X_mask = np.concatenate([np.zeros([1,20]), np.ones([1,1])], -1)
+#Load parsed PDBs:
+with open('/home/justas/projects/cages/parsed/test.jsonl', 'r') as json_file:
+    json_list = list(json_file)
+my_dict = {}
+for json_str in json_list:
+    result = json.loads(json_str)
+    all_chain_list = [item[-1:] for item in list(result) if item[:9]=='seq_chain']
+    pssm_dict = {}
+    for chain in all_chain_list:
+        pssm_dict[chain] = {}
+        pssm_dict[chain]['pssm_coef'] = (np.ones(len(result[f'seq_chain_{chain}']))).tolist() #a number between 0.0 and 1.0 specifying how much weight to put on the PSSM, can be adjusted later as a flag
+        pssm_dict[chain]['pssm_bias'] = (softmax(pssm_log_odds-X_mask*1e8, 1.0)).tolist() #PSSM like, [length, 21] such that sum over the last dimension adds up to 1.0
+        pssm_dict[chain]['pssm_log_odds'] = (pssm_log_odds).tolist()
+    my_dict[result['name']] = pssm_dict
+#Write output to:
+with open('/home/justas/projects/lab_github/mpnn/data/pssm_dict.jsonl', 'w') as f:
+    f.write(json.dumps(my_dict) + '\n')
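The core of make_pssm_dict.py is a column reordering from the PSSM file's alphabet to the MPNN alphabet, followed by a softmax that gives the extra X column zero probability. The sketch below does both for a single row without NumPy. The helper names `reorder_row` and `softmax_masked` are assumptions, and subtracting the row maximum is a stability tweak the script does not use.

```python
import math

MPNN_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
INPUT_ALPHABET = "ARNDCQEGHILKMFPSTWYV"

def reorder_row(row):
    """Map 20 values in INPUT_ALPHABET order to 21 values in MPNN_ALPHABET order.

    Hypothetical helper; the X column has no source value and is set to 0.0,
    which is what multiplying by the permutation matrix produces.
    """
    by_letter = dict(zip(INPUT_ALPHABET, row))
    return [by_letter.get(letter, 0.0) for letter in MPNN_ALPHABET]

def softmax_masked(row, temperature=1.0):
    """Softmax over the 20 standard residues; the trailing X column gets 0.0."""
    scaled = [value / temperature for value in row[:20]]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps] + [0.0]

log_odds = reorder_row([float(i) for i in range(20)])
probs = softmax_masked(log_odds)
print(log_odds[1], probs[20])  # 4.0 0.0
```

Here `log_odds[1]` is the value for C, which sits at index 4 in the input alphabet.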

ProteinMPNN/helper_scripts/parse_multiple_chains.out ADDED

	@@ -0,0 +1 @@


+Successfully finished: 2 pdbs

ProteinMPNN/helper_scripts/parse_multiple_chains.py ADDED

	@@ -0,0 +1,163 @@

+import argparse
+def main(args):
+    import numpy as np
+    import os, time, gzip, json
+    import glob
+    folder_with_pdbs_path = args.input_path
+    save_path = args.output_path
+    ca_only = args.ca_only
+    alpha_1 = list("ARNDCQEGHILKMFPSTWYV-")
+    states = len(alpha_1)
+    alpha_3 = ['ALA','ARG','ASN','ASP','CYS','GLN','GLU','GLY','HIS','ILE',
+               'LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL','GAP']
+    aa_1_N = {a:n for n,a in enumerate(alpha_1)}
+    aa_3_N = {a:n for n,a in enumerate(alpha_3)}
+    aa_N_1 = {n:a for n,a in enumerate(alpha_1)}
+    aa_1_3 = {a:b for a,b in zip(alpha_1,alpha_3)}
+    aa_3_1 = {b:a for a,b in zip(alpha_1,alpha_3)}
+    def AA_to_N(x):
+      # ["ARND"] -> [[0,1,2,3]]
+      x = np.array(x);
+      if x.ndim == 0: x = x[None]
+      return [[aa_1_N.get(a, states-1) for a in y] for y in x]
+    def N_to_AA(x):
+      # [[0,1,2,3]] -> ["ARND"]
+      x = np.array(x);
+      if x.ndim == 1: x = x[None]
+      return ["".join([aa_N_1.get(a,"-") for a in y]) for y in x]
+    def parse_PDB_biounits(x, atoms=['N','CA','C'], chain=None):
+      '''
+      input:  x = PDB filename
+              atoms = atoms to extract (optional)
+      output: (length, atoms, coords=(x,y,z)), sequence
+      '''
+      xyz,seq,min_resn,max_resn = {},{},1e6,-1e6
+      for line in open(x,"rb"):
+        line = line.decode("utf-8","ignore").rstrip()
+        if line[:6] == "HETATM" and line[17:17+3] == "MSE":
+          line = line.replace("HETATM","ATOM  ")
+          line = line.replace("MSE","MET")
+        if line[:4] == "ATOM":
+          ch = line[21:22]
+          if ch == chain or chain is None:
+            atom = line[12:12+4].strip()
+            resi = line[17:17+3]
+            resn = line[22:22+5].strip()
+            x,y,z = [float(line[i:(i+8)]) for i in [30,38,46]]
+            if resn[-1].isalpha():
+                resa,resn = resn[-1],int(resn[:-1])-1
+            else:
+                resa,resn = "",int(resn)-1
+    #         resn = int(resn)
+            if resn < min_resn:
+                min_resn = resn
+            if resn > max_resn:
+                max_resn = resn
+            if resn not in xyz:
+                xyz[resn] = {}
+            if resa not in xyz[resn]:
+                xyz[resn][resa] = {}
+            if resn not in seq:
+                seq[resn] = {}
+            if resa not in seq[resn]:
+                seq[resn][resa] = resi
+            if atom not in xyz[resn][resa]:
+              xyz[resn][resa][atom] = np.array([x,y,z])
+      # convert to numpy arrays, fill in missing values
+      seq_,xyz_ = [],[]
+      try:
+          for resn in range(min_resn,max_resn+1):
+            if resn in seq:
+              for k in sorted(seq[resn]): seq_.append(aa_3_N.get(seq[resn][k],20))
+            else: seq_.append(20)
+            if resn in xyz:
+              for k in sorted(xyz[resn]):
+                for atom in atoms:
+                  if atom in xyz[resn][k]: xyz_.append(xyz[resn][k][atom])
+                  else: xyz_.append(np.full(3,np.nan))
+            else:
+              for atom in atoms: xyz_.append(np.full(3,np.nan))
+          return np.array(xyz_).reshape(-1,len(atoms),3), N_to_AA(np.array(seq_))
+      except TypeError:
+          return 'no_chain', 'no_chain'
+    pdb_dict_list = []
+    c = 0
+    if folder_with_pdbs_path[-1]!='/':
+        folder_with_pdbs_path = folder_with_pdbs_path+'/'
+    init_alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G','H', 'I', 'J','K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T','U', 'V','W','X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g','h', 'i', 'j','k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't','u', 'v','w','x', 'y', 'z']
+    extra_alphabet = [str(item) for item in list(np.arange(300))]
+    chain_alphabet = init_alphabet + extra_alphabet
+    biounit_names = glob.glob(folder_with_pdbs_path+'*.pdb')
+    for biounit in biounit_names:
+        my_dict = {}
+        s = 0
+        concat_seq = ''
+        concat_N = []
+        concat_CA = []
+        concat_C = []
+        concat_O = []
+        concat_mask = []
+        coords_dict = {}
+        for letter in chain_alphabet:
+            if ca_only:
+                backbone_atoms = ['CA']
+            else:
+                backbone_atoms = ['N', 'CA', 'C', 'O']
+            xyz, seq = parse_PDB_biounits(biounit, atoms=backbone_atoms, chain=letter)
+            if type(xyz) != str:
+                concat_seq += seq[0]
+                my_dict['seq_chain_'+letter]=seq[0]
+                coords_dict_chain = {}
+                if ca_only:
+                    coords_dict_chain['CA_chain_'+letter]=xyz.tolist()
+                else:
+                    coords_dict_chain['N_chain_' + letter] = xyz[:, 0, :].tolist()
+                    coords_dict_chain['CA_chain_' + letter] = xyz[:, 1, :].tolist()
+                    coords_dict_chain['C_chain_' + letter] = xyz[:, 2, :].tolist()
+                    coords_dict_chain['O_chain_' + letter] = xyz[:, 3, :].tolist()
+                my_dict['coords_chain_'+letter]=coords_dict_chain
+                s += 1
+        fi = biounit.rfind("/")
+        my_dict['name']=biounit[(fi+1):-4]
+        my_dict['num_of_chains'] = s
+        my_dict['seq'] = concat_seq
+        if s < len(chain_alphabet):
+            pdb_dict_list.append(my_dict)
+            c+=1
+    with open(save_path, 'w') as f:
+        for entry in pdb_dict_list:
+            f.write(json.dumps(entry) + '\n')
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    argparser.add_argument("--input_path", type=str, help="Path to a folder with pdb files, e.g. /home/my_pdbs/")
+    argparser.add_argument("--output_path", type=str, help="Path where to save .jsonl dictionary of parsed pdbs")
+    argparser.add_argument("--ca_only", action="store_true", default=False, help="parse a CA-only structure, reading only the CA atom of each residue (default: false)")
+    args = argparser.parse_args()
+    main(args)
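parse_PDB_biounits reads ATOM records by fixed column slices rather than by splitting on whitespace. The sketch below applies the same slices to a single line. The helper names `make_atom_line` and `parse_atom_line` are assumptions, the test line is synthetic, and unlike the script the residue number is returned as written in the file rather than reduced by one.

```python
def make_atom_line(serial, atom, resname, chain, resnum, icode, x, y, z):
    """Format a synthetic ATOM record with the standard PDB column layout."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
        serial, atom, resname, chain, resnum, icode, x, y, z)

def parse_atom_line(line):
    """Parse one ATOM record using the column slices from the script above."""
    if line[:4] != "ATOM":
        return None
    res_field = line[22:27].strip()  # residue number plus optional insertion code
    insertion = ""
    if res_field and res_field[-1].isalpha():
        res_field, insertion = res_field[:-1], res_field[-1]
    return {
        "atom": line[12:16].strip(),
        "resname": line[17:20],
        "chain": line[21:22],
        "resnum": int(res_field),
        "insertion": insertion,
        "xyz": tuple(float(line[i:i + 8]) for i in (30, 38, 46)),
    }

plain = parse_atom_line(make_atom_line(1, " N", "GLY", "A", 1, "", -15.113, 4.641, 12.533))
inserted = parse_atom_line(make_atom_line(2, " CA", "TRP", "B", 52, "A", 0.0, 1.5, -2.25))
print(plain["atom"], plain["chain"], plain["resnum"], plain["xyz"])
```

Fixed slices keep working when adjacent fields run together, for example a four-digit residue number directly after the chain ID, where whitespace splitting would fail.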

ProteinMPNN/helper_scripts/parse_multiple_chains.sh ADDED

	@@ -0,0 +1,7 @@

+#!/bin/bash
+#SBATCH --mem=32g
+#SBATCH -c 2
+#SBATCH --output=parse_multiple_chains.out
+source activate mlfold
+python parse_multiple_chains.py --input_path='../PDB_complexes/pdbs/' --output_path='../PDB_complexes/parsed_pdbs.jsonl'

ProteinMPNN/inputs/PDB_complexes/pdbs/3HTN.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PDB_complexes/pdbs/4YOW.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PDB_homooligomers/pdbs/4GYT.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PDB_homooligomers/pdbs/6EHB.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PDB_monomers/pdbs/5L33.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PDB_monomers/pdbs/6MRR.pdb ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/inputs/PSSM_inputs/3HTN.npz ADDED

Binary file (148 kB). View file

ProteinMPNN/inputs/PSSM_inputs/4YOW.npz ADDED

Binary file (240 kB). View file

ProteinMPNN/outputs/example_1_outputs/parsed_pdbs.jsonl ADDED

	@@ -0,0 +1,2 @@


1	+ {"seq_chain_A": "GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE", "coords_chain_A": {"N_chain_A": [[-15.113, 4.641, 12.533], [-13.275, 3.42, 10.93], [-10.741, 1.675, 9.445], [-7.432, 1.448, 9.871], [-5.644, -0.548, 8.854], [-7.205, -1.96, 6.899], [-7.793, 0.183, 5.237], [-5.26, 0.685, 4.162], [-4.958, -1.588, 2.516], [-7.177, -1.19, 0.885], [-6.291, 1.085, -0.528], [-4.142, -0.079, -1.945], [-5.518, -1.938, -3.517], [-7.137, -0.17, -5.019], [-5.151, 1.232, -6.404], [-4.087, -0.835, -8.02], [-6.444, -1.411, -9.542], [-6.711, 1.026, -10.874], [-4.295, 1.105, -12.41], [-4.894, -1.039, -14.08], [-7.091, -0.121, -15.482], [-9.668, -0.781, -14.596], [-12.955, -0.348, -13.741], [-15.301, -1.272, -12.622], [-15.546, -2.334, -10.044], [-16.758, -2.673, -6.77], [-16.165, -1.622, -3.469], [-16.378, -2.183, -0.036], [-15.967, -1.433, 3.409], [-16.551, -2.631, 6.58], [-16.843, -2.78, 9.914], [-14.464, -4.361, 10.3], [-14.612, -6.107, 8.198], [-13.921, -6.735, 4.835], [-15.439, -6.161, 1.804], [-15.834, -7.017, -1.536], [-17.223, -6.523, -4.647], [-17.77, -7.513, -7.955], [-18.767, -7.543, -11.345], [-17.695, -9.931, -13.085], [-14.398, -10.398, -14.406], [-10.967, -10.184, -15.446], [-8.395, -12.722, -15.59], [-6.205, -11.462, -14.409], [-7.298, -9.643, -12.523], [-8.704, -11.528, -10.949], [-6.649, -13.146, -9.873], [-5.274, -11.233, -8.368], [-7.233, -10.585, -6.439], [-7.641, -13.078, -5.203], [-5.146, -13.427, -3.951], [-5.083, -11.169, -2.282], [-7.277, -11.766, -0.57], [-6.278, -14.12, 0.645], [-4.045, -13.155, 2.06], [-5.234, -11.366, 3.89], [-7.155, -12.951, 5.25], [-5.415, -14.589, 6.647], [-4.611, -12.942, 8.681], [-6.891, -12.779, 10.155], [-9.004, -11.198, 9.278], [-12.305, -10.775, 7.928], [-13.164, -11.51, 4.65], [-15.046, -10.898, 1.956], [-15.534, -11.745, -1.147], [-17.085, -11.4, -4.207], [-17.565, -12.423, -7.43], [-19.448, -11.812, -10.016]], "CA_chain_A": [[-15.455, 3.353, 11.854], [-12.239, 3.522, 9.924], [-9.735, 0.662, 9.74], 
[-6.128, 1.8, 9.322], [-5.074, -1.624, 8.054], [-7.991, -2.219, 5.697], [-7.623, 1.317, 4.337], [-4.025, 0.475, 3.411], [-5.233, -2.549, 1.457], [-8.065, -0.527, -0.059], [-5.465, 1.902, -1.408], [-3.396, -0.941, -2.853], [-6.467, -2.459, -4.49], [-7.527, 0.902, -5.927], [-4.022, 1.506, -7.283], [-4.098, -1.901, -9.02], [-7.565, -1.214, -10.455], [-6.381, 2.179, -11.705], [-3.302, 0.671, -13.388], [-5.533, -1.961, -15.007], [-8.251, 0.462, -16.125], [-10.865, -1.492, -14.176], [-13.808, 0.635, -13.093], [-16.571, -1.877, -12.233], [-15.4, -3.01, -8.758], [-17.257, -1.959, -5.603], [-15.423, -1.976, -2.265], [-16.955, -1.707, 1.214], [-15.316, -1.837, 4.653], [-17.596, -2.568, 7.598], [-16.558, -3.484, 11.165], [-13.399, -5.329, 10.144], [-14.87, -6.942, 7.044], [-13.72, -6.182, 3.503], [-16.342, -6.715, 0.811], [-15.531, -6.62, -2.905], [-18.17, -7.079, -5.604], [-17.534, -7.115, -9.33], [-19.369, -8.381, -12.371], [-16.73, -10.481, -13.989], [-13.092, -9.792, -14.467], [-9.913, -10.939, -16.082], [-7.519, -13.504, -14.741], [-5.339, -10.574, -13.637], [-8.066, -9.217, -11.357], [-9.037, -12.676, -10.118], [-5.482, -13.557, -9.098], [-5.007, -10.2, -7.373], [-8.209, -10.695, -5.362], [-7.356, -14.301, -4.461], [-3.977, -13.177, -3.117], [-5.501, -10.237, -1.239], [-8.087, -12.495, 0.396], [-5.447, -15.036, 1.423], [-3.208, -12.371, 2.957], [-6.035, -10.78, 4.959], [-7.726, -14.004, 6.078], [-4.371, -15.076, 7.539], [-4.587, -12.019, 9.803], [-8.15, -12.851, 10.866], [-10.051, -10.372, 8.703], [-13.312, -11.38, 7.069], [-13.029, -10.997, 3.289], [-16.162, -11.398, 1.161], [-15.314, -11.41, -2.547], [-18.077, -11.972, -5.106], [-17.355, -12.096, -8.833], [-20.724, -12.228, -10.578]], "C_chain_A": [[-14.525, 3.068, 10.696], [-11.128, 2.581, 10.337], [-8.423, 1.057, 9.074], [-5.594, 0.705, 8.401], [-5.884, -1.859, 6.782], [-7.943, -1.043, 4.732], [-6.325, 1.21, 3.548], [-4.256, -0.489, 2.257], [-6.223, -2.003, 0.447], [-7.273, 0.337, -1.032], [-4.696, 1.044, -2.401], 
[-4.311, -1.534, -3.911], [-6.855, -1.387, -5.493], [-6.396, 1.25, -6.885], [-3.907, 0.448, -8.373], [-5.226, -1.69, -10.026], [-7.297, -0.052, -11.4], [-5.369, 1.804, -12.788], [-3.949, -0.183, -14.469], [-6.661, -1.332, -15.813], [-9.553, -0.226, -15.794], [-11.707, -0.553, -13.334], [-15.118, 0.051, -12.584], [-16.509, -2.642, -10.91], [-15.935, -2.111, -7.648], [-16.587, -2.519, -4.359], [-16.155, -1.378, -1.072], [-16.124, -2.248, 2.369], [-16.312, -1.604, 5.773], [-17.173, -3.42, 8.787], [-15.579, -4.644, 10.974], [-13.57, -6.281, 8.988], [-14.622, -6.128, 5.784], [-14.64, -6.92, 2.547], [-15.981, -6.143, -0.547], [-16.543, -7.305, -3.811], [-17.806, -6.597, -6.999], [-18.266, -8.076, -10.237], [-18.258, -8.767, -13.326], [-15.43, -9.727, -13.946], [-12.137, -10.73, -15.157], [-9.201, -11.812, -15.06], [-6.628, -12.616, -13.888], [-6.046, -10.077, -12.38], [-8.321, -10.379, -10.406], [-7.846, -13.1, -9.277], [-5.133, -12.518, -8.043], [-5.947, -10.345, -6.178], [-8.016, -11.971, -4.552], [-6.204, -14.094, -3.491], [-4.309, -12.212, -1.986], [-6.31, -10.947, -0.159], [-7.225, -13.41, 1.257], [-4.622, -14.283, 2.46], [-4.01, -11.836, 4.136], [-6.693, -11.844, 5.834], [-6.69, -14.56, 7.045], [-4.246, -14.212, 8.781], [-5.907, -11.982, 10.559], [-9.234, -11.922, 10.372], [-11.067, -11.248, 7.98], [-13.236, -10.706, 5.708], [-14.076, -11.68, 2.422], [-15.976, -10.869, -0.254], [-16.369, -12.137, -3.367], [-17.771, -11.49, -6.511], [-18.538, -12.681, -9.594], [-20.658, -12.365, -12.09]], "O_chain_A": [[-14.897, 2.519, 9.662], [-10.68, 2.634, 11.485], [-8.304, 0.991, 7.855], [-5.143, 0.977, 7.279], [-5.323, -1.971, 5.685], [-8.0, -1.245, 3.513], [-6.273, 1.603, 2.377], [-3.814, -0.247, 1.129], [-6.118, -2.31, -0.74], [-7.536, 0.331, -2.241], [-4.583, 1.398, -3.577], [-3.94, -1.609, -5.083], [-6.892, -1.646, -6.703], [-6.638, 1.55, -8.059], [-3.651, 0.78, -9.537], [-5.01, -1.776, -11.239], [-7.634, -0.111, -12.591], [-5.549, 2.134, -13.966], [-3.596, -0.085, 
-15.651], [-7.156, -1.972, -16.745], [-10.461, -0.263, -16.626], [-11.246, -0.047, -12.305], [-15.961, 0.814, -12.103], [-17.363, -3.495, -10.656], [-15.586, -0.928, -7.571], [-16.44, -3.737, -4.225], [-16.472, -0.181, -1.077], [-15.638, -3.382, 2.317], [-16.864, -0.507, 5.887], [-17.148, -4.645, 8.685], [-15.802, -5.764, 11.436], [-12.747, -7.189, 8.815], [-15.045, -4.971, 5.688], [-14.627, -8.155, 2.491], [-15.862, -4.925, -0.694], [-16.72, -8.526, -3.736], [-17.604, -5.4, -7.224], [-18.383, -9.265, -9.953], [-17.93, -8.033, -14.267], [-15.345, -8.577, -13.499], [-12.441, -11.902, -15.409], [-9.352, -11.668, -13.836], [-6.32, -12.97, -12.748], [-5.482, -10.097, -11.272], [-8.202, -10.239, -9.186], [-7.993, -13.366, -8.078], [-4.776, -12.872, -6.917], [-5.52, -10.263, -5.018], [-8.189, -11.956, -3.329], [-6.257, -14.545, -2.341], [-3.883, -12.411, -0.84], [-6.071, -10.762, 1.042], [-7.393, -13.461, 2.478], [-4.512, -14.709, 3.621], [-3.538, -11.859, 5.278], [-6.787, -11.672, 7.055], [-7.035, -14.941, 8.167], [-3.788, -14.689, 9.822], [-6.022, -11.231, 11.533], [-10.311, -11.879, 10.981], [-10.746, -12.329, 7.484], [-13.222, -9.474, 5.623], [-14.005, -12.897, 2.196], [-16.231, -9.692, -0.528], [-16.5, -13.357, -3.266], [-17.731, -10.283, -6.76], [-18.639, -13.904, -9.761], [-21.628, -12.801, -12.712]]}, "name": "6MRR", "num_of_chains": 1, "seq": "GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE"}
2	+ {"seq_chain_A": "HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV", "coords_chain_A": {"N_chain_A": [[37.0, 18.222, 51.819], [35.18, 19.045, 54.805], [33.142, 21.39, 56.357], [32.697, 22.256, 59.882], [30.075, 22.366, 60.868], [28.465, 21.048, 58.967], [29.669, 18.568, 59.079], [29.059, 17.634, 61.702], [26.271, 17.24, 61.58], [26.225, 15.306, 59.622], [27.541, 13.181, 60.918], [25.603, 12.501, 62.842], [23.621, 11.465, 61.194], [25.073, 9.367, 60.115], [25.367, 7.722, 62.376], [22.785, 6.789, 62.655], [22.499, 5.42, 60.214], [24.449, 3.414, 60.569], [23.344, 2.25, 62.7], [24.374, 2.554, 65.225], [24.763, 2.964, 68.494], [26.944, 3.77, 70.035], [28.442, 5.552, 68.362], [26.446, 7.553, 68.106], [26.246, 8.499, 70.748], [28.563, 9.948, 71.018], [28.108, 12.096, 69.352], [25.861, 13.648, 70.164], [24.504, 16.068, 72.578], [23.777, 16.707, 76.082], [21.518, 18.351, 75.963], [20.455, 18.057, 73.397], [17.593, 17.855, 71.366], [15.114, 15.757, 69.978], [12.531, 15.031, 67.686], [10.309, 12.633, 66.571], [7.796, 11.95, 64.523], [5.869, 12.982, 66.497], [6.728, 11.686, 68.914], [8.927, 12.6, 71.49], [12.386, 13.091, 72.157], [15.152, 13.567, 74.34], [18.607, 14.262, 74.992], [20.249, 11.415, 76.286], [18.136, 9.735, 75.635], [18.012, 9.917, 72.876], [20.144, 8.337, 72.204], [19.126, 5.939, 73.133], [17.028, 5.681, 71.32], [18.585, 4.982, 69.05], [19.592, 2.533, 69.856], [17.261, 1.04, 69.728], [16.741, 1.045, 67.012], [18.819, -0.538, 66.081], [18.18, -2.888, 67.459], [15.617, -3.575, 66.242], [16.392, -4.089, 63.726], [16.097, -2.152, 61.8], [16.701, -0.407, 58.831], [18.613, 2.321, 57.804], [19.364, 4.606, 55.165], [21.224, 7.164, 53.799], [20.201, 9.35, 51.263], [20.133, 11.886, 49.855], [20.945, 15.377, 50.442], [20.462, 18.834, 51.494], [21.245, 22.207, 52.499], [18.911, 23.991, 54.2], [17.94, 27.07, 54.88], [18.965, 27.602, 58.12], [19.227, 24.935, 58.758], [19.98, 21.458, 58.206], [18.725, 19.353, 
55.737], [17.945, 16.136, 55.375], [17.279, 13.513, 53.17], [16.307, 10.372, 53.735], [16.304, 6.85, 53.277], [14.36, 4.368, 54.732], [14.218, 1.096, 55.78], [12.396, -1.822, 56.732], [11.509, -5.193, 57.372], [10.625, -5.836, 54.804], [9.326, -3.567, 53.956], [9.713, -0.36, 52.528], [10.495, 2.99, 53.244], [10.35, 6.575, 52.736], [11.862, 9.008, 54.514], [12.232, 12.164, 56.203], [14.836, 14.014, 57.563], [15.572, 16.566, 59.658], [18.051, 18.728, 60.656], [19.208, 20.958, 63.088], [22.156, 22.563, 63.663], [23.645, 24.166, 66.175], [26.134, 26.306, 67.255], [28.264, 24.429, 66.79], [27.157, 22.089, 67.871], [24.652, 19.433, 68.054], [21.475, 20.587, 67.962], [18.615, 20.547, 67.532], [16.573, 18.007, 66.237], [14.476, 16.857, 63.707], [12.331, 14.421, 62.706], [10.453, 13.297, 60.044], [7.846, 11.327, 58.803], [5.177, 10.579, 57.057]], "CA_chain_A": [[36.936, 18.773, 53.168], [33.829, 19.307, 55.268], [33.003, 22.335, 57.475], [32.383, 21.616, 61.147], [28.63, 22.278, 61.041], [27.969, 19.998, 58.095], [30.255, 17.336, 59.605], [28.319, 17.193, 62.883], [24.978, 16.74, 61.124], [26.544, 14.088, 58.891], [27.832, 12.133, 61.893], [24.312, 12.112, 63.413], [23.007, 10.631, 60.175], [25.91, 8.164, 60.045], [25.045, 6.895, 63.536], [21.519, 6.158, 62.308], [22.821, 4.501, 59.135], [25.19, 2.299, 61.114], [22.592, 1.729, 63.824], [25.424, 2.426, 66.209], [24.548, 3.825, 69.667], [28.216, 4.325, 70.466], [28.703, 6.763, 67.587], [25.452, 8.572, 68.459], [26.576, 9.042, 72.062], [29.65, 10.906, 70.821], [27.638, 13.226, 68.553], [24.834, 14.469, 70.801], [24.761, 16.77, 73.826], [22.683, 16.47, 77.021], [20.351, 19.16, 75.595], [19.926, 17.722, 72.07], [16.203, 17.421, 71.347], [14.694, 15.122, 68.734], [11.136, 14.692, 67.52], [10.007, 11.689, 65.501], [6.346, 11.777, 64.428], [5.194, 13.191, 67.767], [7.53, 11.223, 70.04], [10.037, 13.505, 71.773], [13.571, 12.523, 72.816], [16.186, 14.512, 74.768], [19.828, 13.667, 75.507], [20.521, 9.981, 76.157], [17.012, 9.105, 
74.937], [18.376, 9.832, 71.466], [20.98, 7.138, 72.128], [18.19, 4.839, 73.298], [16.414, 5.594, 69.984], [19.465, 4.246, 68.135], [19.602, 1.154, 70.325], [16.045, 0.407, 69.247], [16.922, 0.85, 65.574], [19.706, -1.684, 66.003], [17.363, -4.009, 67.906], [14.558, -3.966, 65.32], [17.02, -4.306, 62.438], [15.647, -1.091, 60.918], [17.816, 0.015, 57.986], [18.627, 3.687, 57.284], [20.338, 4.921, 54.124], [21.169, 8.544, 53.342], [20.076, 9.473, 49.814], [20.579, 13.205, 49.457], [20.616, 16.466, 51.352], [20.718, 20.236, 51.207], [21.091, 23.102, 53.628], [17.708, 24.807, 54.042], [18.051, 28.065, 55.926], [20.011, 27.214, 59.057], [19.161, 23.525, 59.1], [20.28, 20.496, 57.163], [17.581, 18.51, 55.443], [18.39, 14.859, 54.825], [16.355, 12.518, 52.643], [16.831, 9.069, 54.073], [15.447, 5.708, 53.01], [14.308, 3.533, 55.936], [13.632, -0.203, 55.453], [12.292, -2.933, 57.665], [10.532, -6.258, 57.193], [10.11, -5.81, 53.444], [8.72, -2.286, 53.633], [10.651, 0.732, 52.332], [9.888, 4.302, 53.432], [11.125, 7.801, 52.577], [11.761, 9.852, 55.689], [12.975, 13.407, 56.146], [15.508, 14.281, 58.831], [15.997, 17.954, 59.618], [18.953, 18.92, 61.785], [19.776, 22.273, 63.355], [23.418, 22.556, 64.378], [23.916, 25.488, 66.721], [27.402, 26.364, 67.981], [28.974, 23.169, 66.639], [26.507, 20.955, 68.531], [23.321, 19.08, 67.562], [20.524, 21.296, 68.816], [17.262, 20.046, 67.347], [16.415, 17.077, 65.126], [13.085, 16.599, 63.409], [12.153, 13.313, 61.776], [9.088, 13.272, 59.537], [7.438, 10.301, 57.858], [3.735, 10.392, 57.071]], "C_chain_A": [[35.516, 19.181, 53.529], [33.835, 20.246, 56.472], [32.424, 21.668, 58.723], [30.875, 21.424, 61.352], [28.045, 21.144, 60.221], [28.428, 18.625, 58.602], [29.459, 16.766, 60.777], [26.977, 16.58, 62.491], [25.152, 15.408, 60.402], [26.83, 12.921, 59.83], [26.562, 11.609, 62.585], [23.641, 11.12, 62.474], [23.742, 9.311, 60.031], [25.597, 7.187, 61.182], [23.732, 6.145, 63.321], [21.733, 5.074, 61.25], [23.514, 3.243, 
59.644], [24.422, 1.619, 62.24], [23.493, 1.584, 65.033], [25.271, 3.432, 67.346], [25.829, 4.473, 70.157], [28.568, 5.591, 69.685], [27.729, 7.873, 67.973], [25.765, 9.271, 69.778], [27.63, 10.133, 71.946], [29.225, 12.164, 70.07], [26.725, 14.189, 69.313], [25.32, 15.147, 72.075], [23.586, 16.509, 74.765], [21.4, 17.227, 76.67], [19.883, 19.001, 74.14], [18.479, 17.252, 72.148], [15.814, 16.879, 69.984], [13.215, 14.791, 68.796], [11.019, 13.732, 66.357], [8.511, 11.383, 65.495], [5.617, 11.906, 65.766], [5.97, 12.779, 69.006], [8.693, 12.188, 70.248], [11.14, 12.773, 72.507], [14.571, 13.629, 73.143], [17.412, 13.767, 75.307], [19.981, 12.179, 75.234], [19.346, 9.184, 75.593], [17.272, 8.974, 73.437], [19.264, 8.61, 71.249], [20.105, 5.892, 72.234], [17.42, 4.6, 71.996], [17.271, 4.767, 69.027], [19.569, 2.782, 68.557], [18.408, 0.378, 69.783], [16.076, 0.161, 67.749], [17.732, -0.411, 65.332], [18.974, -2.968, 66.397], [16.379, -4.492, 66.836], [15.096, -4.299, 63.928], [16.594, -3.31, 61.369], [16.837, -0.53, 60.144], [17.671, 1.442, 57.459], [19.7, 3.886, 56.222], [20.172, 6.366, 53.668], [21.132, 8.588, 51.81], [20.646, 10.787, 49.314], [20.267, 14.245, 50.522], [21.046, 17.824, 50.859], [20.686, 21.006, 52.51], [19.918, 24.019, 53.329], [17.639, 25.811, 55.176], [19.214, 27.689, 56.819], [19.807, 25.792, 59.584], [19.526, 22.664, 57.913], [19.11, 19.552, 56.986], [18.028, 17.274, 54.701], [17.2, 14.029, 54.397], [16.961, 11.15, 52.89], [15.812, 8.012, 53.681], [15.513, 4.779, 54.22], [13.552, 2.243, 55.625], [13.475, -1.054, 56.689], [11.196, -3.906, 57.263], [9.861, -6.201, 55.824], [9.414, -4.523, 53.038], [9.757, -1.167, 53.588], [9.951, 2.083, 52.441], [10.884, 5.43, 53.152], [10.835, 8.756, 53.723], [12.571, 11.114, 55.473], [13.554, 13.659, 57.513], [16.084, 15.679, 58.809], [16.836, 18.239, 60.853], [19.393, 20.364, 61.913], [21.031, 22.116, 64.2], [23.671, 23.966, 64.867], [25.19, 25.419, 67.548], [28.163, 25.044, 67.963], [28.334, 21.936, 67.267], 
[25.122, 20.677, 67.946], [22.274, 19.664, 68.485], [19.119, 20.714, 68.743], [17.234, 19.151, 66.116], [14.943, 16.776, 64.944], [12.96, 15.527, 62.351], [10.703, 13.266, 61.349], [8.891, 12.117, 58.572], [5.949, 10.026, 57.982], [3.267, 9.765, 55.77]], "O_chain_A": [[34.75, 19.627, 52.679], [34.466, 19.951, 57.486], [31.745, 20.64, 58.632], [30.444, 20.43, 61.936], [27.223, 20.37, 60.71], [27.666, 17.644, 58.56], [29.228, 15.556, 60.851], [26.587, 15.54, 63.01], [24.344, 14.49, 60.563], [26.43, 11.793, 59.561], [26.466, 10.42, 62.876], [23.133, 10.086, 62.899], [23.11, 8.253, 59.861], [25.565, 5.978, 60.976], [23.587, 5.004, 63.732], [21.208, 3.961, 61.365], [23.214, 2.143, 59.19], [24.797, 0.534, 62.681], [23.414, 0.602, 65.768], [25.628, 4.597, 67.202], [25.794, 5.601, 70.653], [28.946, 6.599, 70.283], [28.147, 9.011, 68.154], [25.567, 10.478, 69.921], [27.585, 11.13, 72.657], [29.904, 13.182, 70.145], [26.795, 15.401, 69.13], [26.4, 14.846, 72.582], [22.509, 16.12, 74.303], [20.32, 16.76, 77.023], [19.016, 19.744, 73.693], [18.171, 16.339, 72.914], [16.109, 17.485, 68.964], [12.702, 14.379, 69.847], [11.564, 13.984, 65.281], [8.025, 10.649, 66.357], [4.861, 11.02, 66.147], [5.876, 13.441, 70.042], [9.374, 12.559, 69.286], [10.86, 11.923, 73.362], [14.807, 14.512, 72.329], [17.271, 12.751, 75.992], [19.846, 11.725, 74.084], [19.535, 8.068, 75.133], [16.802, 8.045, 72.793], [19.141, 7.904, 70.238], [20.301, 4.917, 71.505], [17.186, 3.453, 71.614], [16.744, 3.928, 68.26], [19.568, 1.88, 67.714], [18.527, -0.796, 69.411], [15.514, -0.823, 67.273], [17.375, -1.244, 64.501], [19.117, -3.994, 65.744], [16.312, -5.687, 66.557], [14.345, -4.733, 63.062], [16.717, -3.592, 60.189], [17.874, -0.25, 60.724], [16.729, 1.724, 56.722], [20.833, 3.459, 56.399], [19.101, 6.761, 53.213], [21.922, 7.924, 51.14], [21.54, 10.807, 48.463], [19.441, 14.03, 51.403], [21.88, 17.95, 49.948], [20.154, 20.521, 53.506], [19.91, 24.707, 52.301], [17.308, 25.452, 56.305], [20.323, 27.467, 
56.337], [20.162, 25.477, 60.718], [19.425, 23.084, 56.754], [18.544, 19.031, 57.959], [18.427, 17.355, 53.535], [16.231, 13.89, 55.157], [18.022, 10.811, 52.366], [14.608, 8.257, 53.699], [16.601, 4.461, 54.699], [12.391, 2.3, 55.229], [14.338, -1.043, 57.557], [10.101, -3.503, 56.86], [8.668, -6.468, 55.697], [8.959, -4.394, 51.905], [10.569, -1.042, 54.499], [8.934, 2.298, 51.817], [12.103, 5.252, 53.305], [9.722, 9.241, 53.893], [13.499, 11.135, 54.676], [12.845, 13.511, 58.513], [16.983, 15.964, 58.026], [16.401, 17.994, 61.976], [19.882, 20.939, 60.945], [20.986, 21.581, 65.294], [23.86, 24.873, 64.065], [25.32, 24.562, 68.418], [28.641, 24.59, 68.998], [28.902, 20.843, 67.213], [24.492, 21.583, 67.413], [22.176, 19.279, 69.665], [18.494, 20.423, 69.772], [17.819, 19.495, 65.079], [14.236, 16.503, 65.911], [13.432, 15.7, 61.239], [9.814, 13.225, 62.192], [9.69, 11.926, 57.664], [5.503, 9.336, 58.905], [3.96, 9.844, 54.754]]}, "name": "5L33", "num_of_chains": 1, "seq": "HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV"}

ProteinMPNN/outputs/example_1_outputs/seqs/5L33.fa ADDED

	@@ -0,0 +1,6 @@

+>5L33, score=1.5874, global_score=1.5874, fixed_chains=[], designed_chains=['A'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+HMPEEEKAARLFIEALEKGDPELMRKVISPDTRMEDNGREFTGDEVVEYVKEIQKRGEQWHLRRYTKEGNSWRFEVQVDNNGQTEQWEVQIEVRNGRIKRVTITHV
+>T=0.1, sample=1, score=0.8221, global_score=0.8221, seq_recovery=0.5094
+MINEEEKKALDFIEALEKADPELMKKVIEPDTKMEVNGKKYEGEEIVEFVKKLKEEGVKYKLLSYKKEGNKYVFEVEKSKNGVTKKITIEIEVENGKVKKIVITEK
+>T=0.1, sample=2, score=0.8356, global_score=0.8356, seq_recovery=0.4434
+SINEEEQKALDYIKALEKADPELMKKVITPDTKMTVNGKEYEGEEIVEYVKELKERGIKYKLLSYKKEGDKYVFTVERSENGKTYTITIEVKVKDGKVEEIVIKEE
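
The header lines in the file above are comma-separated `key=value` fields, and the first record is led by the bare structure name. Below is a minimal sketch of reading such a header into a dictionary, assuming this layout holds for other outputs. `parse_mpnn_header` is a hypothetical helper written for illustration and is not part of ProteinMPNN; all values are kept as strings.

```python
import re


def parse_mpnn_header(header: str) -> dict:
    """Split one FASTA header line into a dict of string fields.

    Hypothetical helper: assumes the ">name, key=value, ..." layout
    seen in the .fa files of this diff.
    """
    fields = {}
    body = header.lstrip(">").strip()
    # Split only at commas that are followed by "key=", so list values
    # such as ['A', 'B'] stay in one piece.
    for part in re.split(r",\s*(?=\w+=)", body):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
        else:
            # A leading token without "=" is the structure name.
            fields["name"] = part.strip()
    return fields


native = parse_mpnn_header(
    ">5L33, score=1.5874, global_score=1.5874, fixed_chains=[], "
    "designed_chains=['A'], model_name=v_48_020, seed=37"
)
sample = parse_mpnn_header(
    ">T=0.1, sample=1, score=0.8221, global_score=0.8221, seq_recovery=0.5094"
)
print(native["name"], native["designed_chains"], sample["seq_recovery"])
```

Numeric fields can be converted by the caller, for example `float(sample["score"])`.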

ProteinMPNN/outputs/example_1_outputs/seqs/6MRR.fa ADDED

	@@ -0,0 +1,6 @@

+>6MRR, score=1.4683, global_score=1.4683, fixed_chains=[], designed_chains=['A'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
+>T=0.1, sample=1, score=0.9617, global_score=0.9617, seq_recovery=0.5000
+GMDEELEKYVKELKAFLKEKGINNVEIKIENGTLTIKMNGASKETREFLEKLKKELEEKGYKVNIEIS
+>T=0.1, sample=2, score=0.9513, global_score=0.9513, seq_recovery=0.4853
+GKDEELEKYVKELKKFLKEKGINNVKIEVKDGTLTIEMKGCSKETKDFLKKLKKELEKKGYKVNIKIY

ProteinMPNN/outputs/example_2_outputs/assigned_pdbs.jsonl ADDED

	@@ -0,0 +1 @@


1	+ {"3HTN": [["A", "B"], ["C"]], "4YOW": [["A", "B"], ["C", "D", "E", "F"]]}

ProteinMPNN/outputs/example_2_outputs/parsed_pdbs.jsonl ADDED

The diff for this file is too large to render. See raw diff

ProteinMPNN/outputs/example_2_outputs/seqs/3HTN.fa ADDED

	@@ -0,0 +1,6 @@

+>3HTN, score=1.4405, global_score=1.4946, fixed_chains=['C'], designed_chains=['A', 'B'], CA_model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKXXXXDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
+>T=0.1, sample=1, score=0.8450, global_score=1.0949, seq_recovery=0.5071
+KLYSYKEIGNKYIVSINVGTDLVEALKKFCEEKNIKSGTINGIGEVSKLTLKFYDFETKETELKTFEGNFTISNLTGLIYTYNGKIFLHLHVTFGDEDFSALAGHLVSATVLQEALLKVENYNENITAKFDEKLGLYLLDFNS/MSYKYKKIGNKYLVSINIGKDLVESLKEFVKEKNIKSGTINGIGGVSEVTLRFFDPEXXXXKERTFKGLFDISNLTGFISTKDGEPFLHLHATFGDEDFSALAGHLVSAKVSTGAELLVENYNVELTRKYDEKLGVYLLDFNA
+>T=0.1, sample=2, score=0.8471, global_score=1.0996, seq_recovery=0.5000
+MLYDYKKIGNKYFVKVNVDQDLVEALKEFCEELGIKSGTINGIGEVSEVTLRFFDFETKESVDKTFKEPFTISNLTGLISTYNGKIHLHLHITFSDKEFSALAGHLVSAKVLQEALLIVEDYGENITRKYDKETGLLLLDFNS/MLYKYKKIGNKYLIEINIGKDLVEALKEFVEEKNIKAGTINGIGMVEEVTLEYYDPKXXXXEKKTFEGLFEISNLTGFIYTKDGKPVLHLHVTFGDEDFSALAGHLVSAKVLGEAELLVEDYNVELTVKYDEERGEDLLDFNS

ProteinMPNN/outputs/example_2_outputs/seqs/4YOW.fa ADDED

	@@ -0,0 +1,6 @@

+>4YOW, score=1.3574, global_score=1.3913, fixed_chains=['C', 'D', 'E', 'F'], designed_chains=['A', 'B'], CA_model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+MRIVAADTGGAVLDESFQPVGLIATVAVLVEKPYKTSKRFLVKYADPYNYDLSGRQAIRDEIELAIELAREVSPDVIHLNSTLGGIEVRKLDESTIDALQISDRGKEIWKELSKDLQPLAKKFWEETGIEIIAIGKSSVPVRIAEIYAGIFSVKWALDNVKEKGGLLVGLPRYMEVEIKKDKIIGKSLDPREGGLYGEVKTEVPQGIKWELYPNPLVRRFMVFEITS/XXXX
+>T=0.1, sample=1, score=0.8241, global_score=1.2059, seq_recovery=0.5154
+MKIVASDAGGYLLDEELKPIGRIAVVAVLVEKPFTSAKEYKVEYLDPEKYNLEGNDDLIKEFELAVELAKKYKPDVILLDLNLGGVELSELNPEVIEKLQISEETKEFLIKLSEILSPKAKEFKKETGIPILLAGGNSTAVKIAELLASAAAVKWALENVKEKGKLLIGLERAVEIEIEEDKIRARDLDPRYGGLYAEIDIKIPEGLKYEQYPNPFKPGEMVFEIEK/XXXX
+>T=0.1, sample=2, score=0.8195, global_score=1.2174, seq_recovery=0.5419
+MKIVAADAGGYLVDEDLKPIGRIAVVAVLVEKPFTSSKVYKVKYIDPEKADLNGNEDLRLELELAIELAKEYKPDIILLDLNLGGVELSELNEETIKKLQISEEAKKKLIELSKELSPLAKKFKEETGIPILLAGDNSVPVHIAEILASAEAVKWALENVKEKGEVKVLLHESVSIEIEEDKIKARSLDPRLGGLEAEIEIKIPEGIEYEQEPNPFRPHHMVFTAKV/XXXX

ProteinMPNN/outputs/example_3_outputs/seqs/3HTN.fa ADDED

	@@ -0,0 +1,6 @@

+>3HTN, score=1.1550, global_score=1.1955, fixed_chains=['C'], designed_chains=['A', 'B'], model_name=v_48_020, git_hash=015ff820b9b5741ead6ba6795258f35a9c15e94b, seed=37
+NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKTKAYDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER/NMYSYKKIGNKYIVSINNHTEIVKALNAFCKEKGILSGSINGIGAIGELTLRFFNPKXXXXDDKTFREQMEISNLTGNISSMNEQVYLHLHITVGRSDYSALAGHLLSAIQNGAGEFVVEDYSERISRTYNPDLGLNIYDFER
+>T=0.1, sample=1, score=0.7339, global_score=0.9189, seq_recovery=0.5390
+KLYDYEKIGNKYIVSIYNNTDIVKALKKFCEEKNIKSGTVNGIGQVKEVTLKFYNFETKESEEKTFKKNFTISNLTGFISEHDGKIFLDLHITFGDENFSALAGHLVSAIVNGECKLVIEDYKEKVSTKYDEELGLWLLDFNK/ETYKYKKIGNKYLVSINNGKDLVDSIKKFCKDKKIKSGTVNGIGSISKLTLEFFDPDXXXXKTKTLEKNLEISNLTGFISTKDGEVFLDLHITIGDENFSALAGHLISAIVNGIAELKIEDYNKEINVKYDEKLGLYLLDFNK
+>T=0.1, sample=2, score=0.7064, global_score=0.9034, seq_recovery=0.5993
+HMYEYKKIGNKYIVSVKNNTELVEALKAFCEEKKIKSGTVNGIGQVKSVTLRFYDFKTKTSKDTTFNQNLEISNLTGFISEYNNKVFLDLHITFGDSNFSALAGHLLSAVVGGEAIFVVEDYKEKISRKYDEKLGLYLLDFNK/NMYKYKKIGNKYIVSINNGKNLVKALKKFCEDKNIKSGTINGIGMISKVTLYFFDPEXXXXTTKTFNELLEISNLTGFISEKNGKVFLHLHITIGDSNFSALAGHLIDAVVNGIAEVIVEDFNEKINVKYNEETGLWLLDFNK