Commit 05f403c by gravelcompbio
Parent(s): 31a2e39
Update README.md
README.md
CHANGED
@@ -9,10 +9,7 @@ app_file: app.py
|
|
9 |
pinned: false
|
10 |
license: cc-by-nc-nd-4.0
|
11 |
---
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
<!-- This Github was Made By Nathan Gravel and tested with help of Mariah Salcedo-->
|
16 |
|
17 |
# Phosformer-ST <img src="https://github.com/gravelCompBio/Phosformer-ST/assets/75225868/f375e377-b639-4b8c-9792-6d8e5e9e6c39" width="60">
|
18 |
|
@@ -24,9 +21,6 @@ license: cc-by-nc-nd-4.0
|
|
24 |
|
25 |
|
26 |
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
|
31 |
|
32 |
|
@@ -34,14 +28,7 @@ license: cc-by-nc-nd-4.0
|
|
34 |
|
35 |
|
36 |
|
37 |
-
This repository contains the code to run Phosformer-ST locally
|
38 |
-
|
39 |
-
uncovers the kinase-substrate interaction landscape" . This readme should also give you the specific versions for all packages used to run Phosformer-ST in a local environment.
|
40 |
-
|
41 |
-
The model was created by Zhongliang Zhou and Wayland Yeung. The Phos-ST webtool is available at https://phosformer.netlify.app/ and was created by Saber Soleymani.
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
</br>
|
46 |
|
47 |
|
@@ -76,60 +63,45 @@ The model was created by Zhongliang Zhou and Wayland Yeung. The Phos-ST webtool
|
|
76 |
|
77 |
|
78 |
|
79 |
-
|
80 |
-
|
81 |
-
|
82 |
-
|
83 |
-
- `phos-ST_Example_Code.ipynb`: Jupyter File with example code to run Phosformer-ST
|
84 |
-
|
85 |
-
|
86 |
-
|
87 |
-
- `modeling_esm.py`: Python file that has the architecture of Phosformer-ST
|
88 |
-
|
89 |
-
|
90 |
-
|
91 |
-
- `configuration_esm.py`: Python file that has configuration/parameters of Phosformer-ST
|
92 |
-
|
93 |
|
94 |
|
95 |
-
- `
|
96 |
|
97 |
|
98 |
|
99 |
|
100 |
|
101 |
-
- `multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90.txt`: this txt file contains a link to a zenodo repository to download the proper folder
|
102 |
-
|
103 |
|
104 |
|
105 |
-
|
106 |
-
|
107 |
-
- See section below (Downloading this repository) to be shown how to download this folder and where to put it
|
108 |
-
|
109 |
-
|
110 |
-
- `multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90`: folder of the training weights for Phosformer-ST to run as advertised
|
111 |
|
|
|
112 |
|
113 |
|
114 |
-
- `phosST.yml`: This file is used to help create an environment for
|
115 |
|
116 |
|
117 |
|
118 |
-
- `README.md`:
|
119 |
|
120 |
|
121 |
|
122 |
-
- `LICENSE`: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
|
123 |
-
|
124 |
|
125 |
-
- `app.py`: Contains the code to get huggingface to work as a webtool (with gradio)
|
126 |
|
127 |
|
128 |
-
|
129 |
-
|
130 |
|
131 |
|
132 |
-
|
133 |
|
134 |
|
135 |
</br>
|
@@ -141,7 +113,6 @@ The model was created by Zhongliang Zhou and Wayland Yeung. The Phos-ST webtool
|
|
141 |
|
142 |
|
143 |
|
144 |
-
|
145 |
|
146 |
|
147 |
## Installing dependencies with version info
|
@@ -225,71 +196,7 @@ Follow along with its recommendation
|
|
225 |
Installing torch can be the most complex part
|
226 |
|
227 |
|
228 |
-
|
229 |
-
|
230 |
-
|
231 |
-
</br>
|
232 |
-
|
233 |
-
|
234 |
-
|
235 |
-
|
236 |
-
|
237 |
-
|
238 |
-
|
239 |
-
|
240 |
-
|
241 |
-
### The computer specs that we know that this model can run on (with gpu acceleration)
|
242 |
-
|
243 |
-
|
244 |
-
|
245 |
-
</br>
|
246 |
-
|
247 |
-
|
248 |
-
|
249 |
-
**Computer 1**
|
250 |
-
|
251 |
-
|
252 |
-
|
253 |
-
Ubuntu 22.04.2 LTS
|
254 |
-
|
255 |
-
|
256 |
-
|
257 |
-
Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (6 cores) x (1 thread per core)
|
258 |
-
|
259 |
-
|
260 |
-
|
261 |
-
64 GB ram
|
262 |
-
|
263 |
-
|
264 |
-
|
265 |
-
NVIDIA Quadro RTX 5000 (16 GB vRAM)(CUDA Version: 12.1)
|
266 |
-
|
267 |
-
|
268 |
-
|
269 |
-
</br>
|
270 |
-
|
271 |
-
|
272 |
-
|
273 |
-
**Computer 2**
|
274 |
-
|
275 |
-
|
276 |
-
|
277 |
-
Ubuntu 20.04.6 LTS
|
278 |
-
|
279 |
-
|
280 |
-
|
281 |
-
Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (6 cores) x (1 thread per core)
|
282 |
-
|
283 |
-
|
284 |
-
|
285 |
-
64 GB ram
|
286 |
-
|
287 |
|
288 |
-
|
289 |
-
NVIDIA RTX A4000 (16 GB vRAM)(CUDA Version: 12.2)
|
290 |
-
|
291 |
-
|
292 |
-
|
293 |
|
294 |
|
295 |
|
@@ -300,7 +207,6 @@ NVIDIA RTX A4000 (16 GB vRAM)(CUDA Version: 12.2)
|
|
300 |
|
301 |
</br>
|
302 |
|
303 |
-
|
304 |
|
305 |
|
306 |
|
@@ -360,7 +266,7 @@ The `Phosformer-ST_with_trainging_weights` folder should have the following file
|
|
360 |
|
361 |
|
362 |
|
363 |
-
- folder 1 `multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90`
|
364 |
|
365 |
|
366 |
|
@@ -393,7 +299,7 @@ Once you have a folder with the files/folder above in it you have done all the d
|
|
393 |
|
394 |
### PICK ONE of the options below
|
395 |
|
396 |
-
### Option
|
397 |
|
398 |
here is a step-by-step guide to set up the environment with the yml file
|
399 |
|
@@ -417,9 +323,9 @@ conda activate phosST
|
|
417 |
|
418 |
|
419 |
|
420 |
-
###
|
421 |
|
422 |
-
(This is if torch is
|
423 |
|
424 |
Just type these lines of code into the terminal after you download this repository (this assumes you have anaconda already installed)
|
425 |
|
@@ -477,11 +383,11 @@ pip3 install torch torchvision torchaudio
|
|
477 |
|
478 |
|
479 |
|
480 |
-
### the terminal line above might look different
|
481 |
|
482 |
|
483 |
|
484 |
-
We provided code to test
|
485 |
|
486 |
|
487 |
|
@@ -499,21 +405,19 @@ We provided code to test Phos-ST (see section below)
|
|
499 |
|
500 |
|
501 |
|
502 |
-
## Utilizing the Model with our example
|
503 |
|
504 |
All of the following code examples are run inside the `phos-ST_Example_Code.ipynb` file using Jupyter Lab
|
505 |
|
506 |
|
507 |
|
508 |
-
Once you have your environment resolved just use jupyter lab to access the example code by typing the
|
509 |
|
510 |
```
|
511 |
-
|
512 |
jupyter lab
|
513 |
-
|
514 |
```
|
515 |
|
516 |
-
Once you open the notebook on your browser, run each cell
|
517 |
|
518 |
|
519 |
|
@@ -521,28 +425,20 @@ Once you open the notebook on your browser, run each cell of notebook
|
|
521 |
|
522 |
|
523 |
|
524 |
-
### Testing
|
525 |
-
|
526 |
-
There should be a positive control and a negative control example at the bottom of the `phos-ST_Example_Code.ipynb` file. They serve as a sanity check that the model is working. Both controls run the same code on known examples for which Phos-ST should give a score close to 1 (positive control) or 0 (negative control).
|
527 |
|
|
|
528 |
|
529 |
|
530 |
**Positive Example**
|
531 |
|
532 |
```Python
|
533 |
-
|
534 |
# P17612 KAPCA_HUMAN
|
535 |
-
|
536 |
kinDomain="FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF"
|
537 |
-
|
538 |
# P53602_S96_LARKRRNSRDGDPLP
|
539 |
-
|
540 |
substrate="LARKRRNSRDGDPLP"
|
541 |
-
|
542 |
|
543 |
-
|
544 |
phosST(kinDomain,substrate).to_csv('PostiveExample.csv')
|
545 |
-
|
546 |
```
|
547 |
|
548 |
|
@@ -552,22 +448,15 @@ phosST(kinDomain,substrate).to_csv('PostiveExample.csv')
|
|
552 |
**Negative Example**
|
553 |
|
554 |
```Python
|
555 |
-
|
556 |
# P17612 KAPCA_HUMAN
|
557 |
-
|
558 |
kinDomain="FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF"
|
559 |
-
|
560 |
# Q01831_T169_PVEIEIETPEQAKTR
|
561 |
-
|
562 |
substrate="PVEIEIETPEQAKTR"
|
563 |
-
|
564 |
|
565 |
-
|
566 |
phosST(kinDomain,substrate).to_csv('NegitiveExample.csv')
|
567 |
-
|
568 |
```
|
569 |
|
570 |
-
Both scores should show up in a csv file in the
|
571 |
|
572 |
|
573 |
|
@@ -577,23 +466,21 @@ Both scores should show up in a csv file in the same folder of this code
|
|
577 |
|
578 |
### Inputting your own data for novel predictions
|
579 |
|
580 |
-
One can simply take the code from above and modify the string variables `kinDomain` and `substrate` to
|
581 |
|
582 |
|
583 |
|
584 |
-
**Formatting of the `kinDomain` and `substrate` for input for
|
585 |
|
586 |
|
587 |
|
588 |
-
- `kinDomain` should
|
589 |
-
|
590 |
|
591 |
-
|
592 |
-
- `substrate` should be a 15mer with the center residue/char being the Serine or Threonine being phosphorylated
|
593 |
|
594 |
|
595 |
|
596 |
-
Not following these rules
|
597 |
|
598 |
|
599 |
|
@@ -603,154 +490,107 @@ Not following these rules will still give you and output at time but does not gu
|
|
603 |
|
604 |
|
605 |
|
606 |
-
### How to
|
607 |
|
608 |
-
This model
|
609 |
|
610 |
-
|
611 |
|
612 |
-
|
613 |
|
614 |
-
|
615 |
|
616 |
-
|
617 |
|
618 |
|
619 |
-
|
620 |
-
Combining the predictions with spatial, temporal, or other biologically relevant filters may give more accurate results when modeling protein kinases.
|
621 |
-
|
622 |
|
623 |
|
624 |
</br>
|
625 |
|
626 |
|
627 |
-
|
628 |
|
629 |
|
630 |
-
|
631 |
|
632 |
|
633 |
|
634 |
-
|
635 |
-
|
636 |
-
Currently, the helper predicts only one kinase domain and one substrate at a time. To predict in batches, swap the `helper function to use Phos-ST` code block for the code block below. The input arguments then require a list of strings for both the kinase domains and the substrates. Make sure both lists are the same length and follow the format specified in the "Inputting your own data for novel predictions" section of the README.
|
637 |
-
|
638 |
-
```Python
|
639 |
-
|
640 |
-
# P17612 KAPCA_HUMAN listed twice
|
641 |
-
|
642 |
-
kinDomains=["FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF","FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF"]
|
643 |
|
644 |
|
645 |
|
646 |
-
|
647 |
-
|
648 |
-
substrates=["LARKRRNSRDGDPLP","PVEIEIETPEQAKTR"]
|
649 |
-
|
650 |
-
|
651 |
|
652 |
|
653 |
|
654 |
-
|
655 |
|
656 |
-
|
657 |
-
|
658 |
-
substrate15mers,
|
659 |
-
|
660 |
-
kinaseDomainSeqs,
|
661 |
-
|
662 |
-
model=model,
|
663 |
-
|
664 |
-
tokenizer=tokenizer,
|
665 |
-
|
666 |
-
device='cuda',
|
667 |
-
|
668 |
-
batch_size=10,
|
669 |
-
|
670 |
-
output_hidden_states=False,
|
671 |
-
|
672 |
-
output_attentions=False,
|
673 |
-
|
674 |
-
)
|
675 |
-
|
676 |
-
|
677 |
-
|
678 |
-
#total = dataset.shape[0]
|
679 |
-
|
680 |
-
results = {
|
681 |
|
682 |
-
|
683 |
|
684 |
-
'peptide' : [],
|
685 |
|
686 |
-
|
687 |
|
688 |
-
|
689 |
|
690 |
-
|
691 |
|
692 |
-
|
693 |
|
694 |
-
|
695 |
|
696 |
-
results['kinase' ] += [i['kinase']]
|
697 |
|
698 |
-
results['peptide'] += [i['peptide']]
|
699 |
|
700 |
-
|
701 |
|
702 |
-
|
703 |
|
704 |
-
|
705 |
|
706 |
|
707 |
|
708 |
-
|
709 |
|
710 |
|
711 |
|
712 |
-
|
713 |
|
714 |
-
|
715 |
|
716 |
-
phosST(kinDomains,substrates).to_csv('BatchExample.csv')
|
717 |
|
718 |
|
719 |
|
720 |
-
|
721 |
|
722 |
|
723 |
|
724 |
-
|
725 |
|
726 |
-
```
|
727 |
|
728 |
-
|
|
|
729 |
|
730 |
|
731 |
|
732 |
-
|
733 |
|
734 |
|
735 |
|
736 |
-
|
737 |
|
738 |
|
739 |
|
740 |
-
|
741 |
|
742 |
|
743 |
|
744 |
-
Using the CPU version of torch might increase your run time 10- to 1000-fold, so GPU acceleration is suggested for large prediction datasets
|
745 |
|
746 |
-
|
747 |
|
748 |
-
If you are just here to test whether Phos-ST works, the example code should not take long to run on the CPU version of torch
|
749 |
|
750 |
-
|
751 |
|
752 |
-
|
|
|
753 |
|
|
|
|
|
754 |
|
|
|
755 |
|
756 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
9 |
pinned: false
|
10 |
license: cc-by-nc-nd-4.0
|
11 |
---
|
12 |
+
<!-- This github was Made by Nathan Gravel -->
|
|
|
|
|
|
|
13 |
|
14 |
# Phosformer-ST <img src="https://github.com/gravelCompBio/Phosformer-ST/assets/75225868/f375e377-b639-4b8c-9792-6d8e5e9e6c39" width="60">
|
15 |
|
|
|
21 |
|
22 |
|
23 |
|
|
|
|
|
|
|
24 |
|
25 |
|
26 |
|
|
|
28 |
|
29 |
|
30 |
|
31 |
+
This repository contains the code to run Phosformer-ST locally, as described in the manuscript "Phosformer-ST: explainable machine learning uncovers the kinase-substrate interaction landscape". This README also provides instructions for installing all dependencies and packages required to run Phosformer-ST in a local environment.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
32 |
</br>
|
33 |
|
34 |
|
|
|
63 |
|
64 |
|
65 |
|
66 |
|
67 |
|
68 |
+
- `phos-ST_Example_Code.ipynb`: ipynb file with example code to run Phosformer-ST
|
69 |
|
70 |
|
71 |
|
72 |
+
- `modeling_esm.py`: Python file that has the architecture of Phosformer-ST
|
73 |
+
|
74 |
+
|
75 |
+
|
76 |
+
- `configuration_esm.py`: Python file that has configuration/parameters of Phosformer-ST
|
77 |
+
|
78 |
+
|
79 |
+
|
80 |
+
- `tokenization_esm.py`: Python file that contains code for the tokenizer
|
81 |
|
82 |
|
|
|
|
|
83 |
|
84 |
|
85 |
+
- `multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90.txt`: text file containing a link to the training weights, which are hosted in the Hugging Face or Zenodo repository
|
|
|
|
|
|
|
|
|
|
|
86 |
|
87 |
+
- See the section below (Downloading this repository) for how to download this folder and where to put it
|
88 |
|
89 |
|
90 |
+
- `phosST.yml`: This file is used to help create an environment for Phosformer-ST to work
|
91 |
|
92 |
|
93 |
|
94 |
+
- `README.md`:
|
95 |
|
96 |
|
97 |
|
98 |
+
- `LICENSE`: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
|
|
|
99 |
|
|
|
100 |
|
101 |
|
102 |
+
|
|
|
103 |
|
104 |
|
|
|
105 |
|
106 |
|
107 |
</br>
|
|
|
113 |
|
114 |
|
115 |
|
|
|
116 |
|
117 |
|
118 |
## Installing dependencies with version info
|
|
|
196 |
Installing torch can be the most complex part
|
197 |
|
198 |
|
199 |
|
|
|
|
|
|
|
|
|
|
|
200 |
|
201 |
|
202 |
|
|
|
207 |
|
208 |
</br>
|
209 |
|
|
|
210 |
|
211 |
|
212 |
|
|
|
266 |
|
267 |
|
268 |
|
269 |
+
- folder 1 `multitask_MHA_esm2_t30_150M_UR50D_neg_ratio_8+8_shift_30_mask_0.2_2023-03-25_90`
|
270 |
|
271 |
|
272 |
|
|
|
299 |
|
300 |
### PICK ONE of the options below
|
301 |
|
302 |
+
### Main Option) Utilizing the PhosformerST.yml file
|
303 |
|
304 |
here is a step-by-step guide to set up the environment with the yml file
|
305 |
|
|
|
323 |
|
324 |
|
325 |
|
326 |
+
### Alternative option) Creating this environment without yml file
|
327 |
|
328 |
+
(Use this option if torch does not work with your version of CUDA or you run into any other problem)
|
329 |
|
330 |
Just type these lines of code into the terminal after you download this repository (this assumes you have anaconda already installed)
|
331 |
|
|
|
383 |
|
384 |
|
385 |
|
386 |
+
### the terminal line above might look different for you
|
387 |
|
388 |
|
389 |
|
390 |
+
We provided code to test Phosformer-ST (see section below)
|
391 |
|
392 |
|
393 |
|
|
|
405 |
|
406 |
|
407 |
|
408 |
+
## Utilizing the Model with our example code
|
409 |
|
410 |
All of the following code examples are run inside the `phos-ST_Example_Code.ipynb` file using Jupyter Lab
|
411 |
|
412 |
|
413 |
|
414 |
+
Once your environment is set up, open the example code in Jupyter Lab by typing the command below in your terminal (from inside the `Phosformer-ST` folder)
|
415 |
|
416 |
```
|
|
|
417 |
jupyter lab
|
|
|
418 |
```
|
419 |
|
420 |
+
Once the notebook is open in your browser, run each cell in the notebook
|
421 |
|
422 |
|
423 |
|
|
|
425 |
|
426 |
|
427 |
|
428 |
+
### Testing Phosformer-ST with the example code
|
|
|
|
|
429 |
|
430 |
+
There should be a positive control and a negative control example at the bottom of the `phos-ST_Example_Code.ipynb` file; both can be used to test the model.
|
431 |
|
432 |
|
433 |
**Positive Example**
|
434 |
|
435 |
```Python
|
|
|
436 |
# P17612 KAPCA_HUMAN
|
|
|
437 |
kinDomain="FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF"
|
|
|
438 |
# P53602_S96_LARKRRNSRDGDPLP
|
|
|
439 |
substrate="LARKRRNSRDGDPLP"
|
|
|
440 |
|
|
|
441 |
phosST(kinDomain,substrate).to_csv('PostiveExample.csv')
|
|
|
442 |
```
|
443 |
|
444 |
|
|
|
448 |
**Negative Example**
|
449 |
|
450 |
```Python
|
|
|
451 |
# P17612 KAPCA_HUMAN
|
|
|
452 |
kinDomain="FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF"
|
|
|
453 |
# Q01831_T169_PVEIEIETPEQAKTR
|
|
|
454 |
substrate="PVEIEIETPEQAKTR"
|
|
|
455 |
|
|
|
456 |
phosST(kinDomain,substrate).to_csv('NegitiveExample.csv')
|
|
|
457 |
```
|
458 |
|
459 |
+
Each score should be written to its own CSV file (`PostiveExample.csv` and `NegitiveExample.csv`) in the current directory
|
460 |
|
461 |
|
462 |
|
|
|
466 |
|
467 |
### Inputting your own data for novel predictions
|
468 |
|
469 |
+
One can simply take the code from above and modify the string variables `kinDomain` and `substrate` to make a prediction for any given kinase-substrate pair
|
470 |
|
471 |
|
472 |
|
473 |
+
**The required input formats for `kinDomain` and `substrate` are as follows:**
|
474 |
|
475 |
|
476 |
|
477 |
+
- `kinDomain` should be a human Serine/Threonine kinase domain (not the full sequence).
|
|
|
478 |
|
479 |
+
- `substrate` should be a 15-mer whose center residue (the 8th character) is the target serine or threonine to be phosphorylated
|
|
|
480 |
|
481 |
|
482 |
|
483 |
+
Not following these rules may result in dubious predictions
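
The formatting rules above can be checked before calling the model. The sketch below is a hypothetical helper and is not part of the Phosformer-ST repository; the name `check_inputs` and the restriction to the 20 standard one-letter amino acid codes are assumptions made for illustration. It cannot verify that `kinDomain` is a human Serine/Threonine kinase domain; it only catches length, center-residue, and character problems.

```python
# Hypothetical pre-flight check for Phosformer-ST inputs (not from the repository).
# Assumption: sequences use only the 20 standard one-letter amino acid codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_inputs(kin_domain: str, substrate: str) -> list:
    """Return a list of formatting problems; an empty list means none were found."""
    problems = []
    if len(substrate) != 15:
        problems.append(f"substrate has length {len(substrate)}, expected 15")
    elif substrate[7] not in "ST":
        # Index 7 is the 8th character, i.e. the center of a 15-mer.
        problems.append(f"center residue is {substrate[7]!r}, expected S or T")
    for name, seq in (("kinDomain", kin_domain), ("substrate", substrate)):
        unexpected = sorted(set(seq) - VALID_RESIDUES)
        if unexpected:
            problems.append(f"{name} contains unexpected characters: {unexpected}")
    return problems


# The substrate from the positive example in this README passes the check.
print(check_inputs("FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDK", "LARKRRNSRDGDPLP"))
```

If your peptides use a padding or gap character near protein termini, the character check would need to be relaxed accordingly.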
|
484 |
|
485 |
|
486 |
|
|
|
490 |
|
491 |
|
492 |
|
493 |
+
### How to interpret Phosformer-ST's output
|
494 |
|
495 |
+
This model outputs a prediction score between 0 and 1.
|
496 |
|
|
|
497 |
|
498 |
+
The model was trained to use a cutoff of 0.5 to distinguish positive from negative predictions
|
499 |
|
|
|
500 |
|
501 |
+
A score of 0.5 or above indicates a positive prediction for peptide substrate phosphorylation by the given kinase
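
As a minimal sketch of the cutoff rule described above, the snippet below turns a score into a label. The function name `call_prediction` is hypothetical, and how you read the score out of the CSV written by the notebook is left to you, since the column layout of that file is not described here.

```python
# Hypothetical helper illustrating the 0.5 cutoff; not part of the repository.
CUTOFF = 0.5  # cutoff stated in this README


def call_prediction(score: float, cutoff: float = CUTOFF) -> str:
    """Label a Phosformer-ST score as 'positive' or 'negative'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the expected range [0, 1]")
    # A score equal to the cutoff counts as positive ("0.5 or above").
    return "positive" if score >= cutoff else "negative"


print(call_prediction(0.93))
print(call_prediction(0.07))
```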
|
502 |
|
503 |
|
|
|
|
|
|
|
504 |
|
505 |
|
506 |
</br>
|
507 |
|
508 |
|
|
|
509 |
|
510 |
|
511 |
+
## Troubleshooting
|
512 |
|
513 |
|
514 |
|
515 |
+
If torch is not installing correctly or you do not have a GPU to run Phosformer-ST on, the CPU version of torch is perfectly fine to use
|
516 |
|
517 |
|
518 |
|
519 |
+
Using the CPU version of torch might increase your run time so for large prediction datasets GPU acceleration is suggested
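
One way to choose between GPU and CPU at run time is sketched below. `torch.cuda.is_available()` is a standard PyTorch call; the wrapper `pick_device` is hypothetical, and whether the helper function in the notebook accepts a device string should be checked against the notebook itself.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda' if torch is installed and can see a GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this environment, so a GPU cannot be used.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed torch build most likely does not match the local CUDA version.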
|
|
|
|
|
|
|
|
|
520 |
|
521 |
|
522 |
|
523 |
+
If you are just here to test whether Phosformer-ST works, the example code should not take long to run on the CPU version of torch
|
524 |
|
525 |
+
|
526 |
|
527 |
+
Also depending on your GPU the `batch_size` argument might need to be adjusted
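
To illustrate what a batch size controls, the generic sketch below splits a list of inputs into chunks of at most `batch_size` items. It is not the repository's implementation, and the name `batches` is hypothetical. Smaller batches lower peak GPU memory use at the cost of more passes through the model, so reducing `batch_size` is a reasonable first step if you hit an out-of-memory error.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# The two substrates used in the examples of this README, plus one repeat.
substrates = ["LARKRRNSRDGDPLP", "PVEIEIETPEQAKTR", "LARKRRNSRDGDPLP"]
for chunk in batches(substrates, batch_size=2):
    print(len(chunk), chunk)
```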
|
528 |
|
|
|
529 |
|
530 |
+
### For troubleshooting purposes, the model has been tested on computers with the following specifications
|
531 |
|
532 |
+
|
533 |
|
534 |
+
</br>
|
535 |
|
536 |
+
|
537 |
|
538 |
+
**Computer 1**
|
539 |
|
|
|
540 |
|
|
|
541 |
|
542 |
+
NVIDIA Quadro RTX 5000 (16 GB vRAM)(CUDA Version: 12.1)
|
543 |
|
544 |
+
|
545 |
|
546 |
+
Ubuntu 22.04.2 LTS
|
547 |
|
548 |
|
549 |
|
550 |
+
Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (6 cores) x (1 thread per core)
|
551 |
|
552 |
|
553 |
|
554 |
+
64 GB ram
|
555 |
|
|
|
556 |
|
|
|
557 |
|
558 |
|
559 |
|
560 |
+
</br>
|
561 |
|
562 |
|
563 |
|
564 |
+
**Computer 2**
|
565 |
|
|
|
566 |
|
567 |
+
|
568 |
+
NVIDIA RTX A4000 (16 GB vRAM)(CUDA Version: 12.2)
|
569 |
|
570 |
|
571 |
|
572 |
+
Ubuntu 20.04.6 LTS
|
573 |
|
574 |
|
575 |
|
576 |
+
Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (6 cores) x (1 thread per core)
|
577 |
|
578 |
|
579 |
|
580 |
+
64 GB ram
|
581 |
|
582 |
|
583 |
|
|
|
584 |
|
|
|
585 |
|
|
|
586 |
|
|
|
587 |
|
588 |
+
</br>
|
589 |
+
|
590 |
|
591 |
+
## Other accessory tools and resources
|
592 |
+
A webtool for Phosformer-ST can be accessed from: https://phosformer.netlify.app/. A huggingface repository can be downloaded from: https://huggingface.co/gravelcompbio/Phosformer-ST_with_trainging_weights. A huggingface spaces app is available at: https://huggingface.co/spaces/gravelcompbio/Phosformer-ST
|
593 |
|
594 |
+
The GitHub repository can be found at https://github.com/gravelCompBio/Phosformer-ST/tree/main
|
595 |
|
596 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|