---
license: gpl-3.0
tags:
- chemistry
- biology
- medical
- gpt2
---

# DrugGPT

A generative drug design model based on GPT2.

<img src="https://img.shields.io/github/license/LIYUESEN/druggpt"><img src="https://img.shields.io/badge/python-3.7-blue"><img src="https://img.shields.io/github/stars/LIYUESEN/druggpt?style=social">

## Introduction

DrugGPT is a generative pharmaceutical strategy based on the GPT architecture. It aims to bring innovation to drug design by applying natural language processing techniques.

This project applies the GPT model to the exploration of chemical space in order to discover new molecules with the potential to bind specific proteins.

Trained on up to 1.8 million protein-ligand binding records, DrugGPT provides a fast and efficient way to generate drug candidate molecules.

## Deployment

1. Clone

```shell
git clone https://github.com/LIYUESEN/druggpt.git
cd druggpt
```

Alternatively, visit our [GitHub repo](https://github.com/LIYUESEN/druggpt) and click *Code>Download ZIP* to download the repository.

2. Create a virtual environment

```shell
conda create -n druggpt python=3.7
conda activate druggpt
```

3. Install Python dependencies

```shell
pip install datasets transformers scipy scikit-learn
# PyTorch wheels built for CUDA 11.7
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
conda install -c openbabel openbabel
```

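After installation, it can be useful to confirm that the packages are visible from the `druggpt` environment before running the generator. The snippet below is a minimal sketch and not part of the repository: the helper `missing_modules` is a hypothetical name, and the import names in `REQUIRED_MODULES` are assumed to correspond to the packages installed above (`sklearn` is the import name of scikit-learn).

```python
import importlib.util

# Assumed import names for the packages installed in step 3.
REQUIRED_MODULES = ["datasets", "transformers", "scipy", "sklearn", "torch", "openbabel"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located. It does not import them or verify that a CUDA device is usable.
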
## How to use

Run [drug_generator.py](https://github.com/LIYUESEN/druggpt/blob/main/drug_generator.py) with the parameters below.

Parameters:

- `-p` | `--pro_seq`: Input a protein amino acid sequence.
- `-f` | `--fasta`: Input a FASTA file.

> Only one of -p and -f should be specified.

- `-l` | `--ligand_prompt`: Input a ligand prompt.
- `-e` | `--empty_input`: Enable direct generation mode.
- `-n` | `--number`: Minimum number of molecules to generate.
- `-d` | `--device`: Hardware device to use. Default is 'cuda'.
- `-o` | `--output`: Output directory for generated molecules. Default is './ligand_output/'.
- `-b` | `--batch_size`: Number of molecules generated per batch. Try reducing this value if you have limited RAM. Default is 32.

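To make the relationship between these options concrete, here is a hypothetical `argparse` sketch that mirrors the list above. It is not the actual code of `drug_generator.py`: the function name `build_parser`, the argument types, and the choice to leave undocumented defaults unset are all assumptions made for illustration. Only the flag names, the documented defaults, and the rule that `-p` and `-f` exclude each other come from this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the script's real code)."""
    parser = argparse.ArgumentParser(description="Sketch of the drug_generator.py options")
    # -p and -f are documented as mutually exclusive.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-p", "--pro_seq", type=str, help="protein amino acid sequence")
    source.add_argument("-f", "--fasta", type=str, help="path to a FASTA file")
    parser.add_argument("-l", "--ligand_prompt", type=str, help="ligand prompt")
    parser.add_argument("-e", "--empty_input", action="store_true", help="direct generation mode")
    # Integer types for -n and -b are assumptions.
    parser.add_argument("-n", "--number", type=int, help="minimum number of molecules")
    parser.add_argument("-d", "--device", type=str, default="cuda")
    parser.add_argument("-o", "--output", type=str, default="./ligand_output/")
    parser.add_argument("-b", "--batch_size", type=int, default=32)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-f", "bcl2.fasta", "-n", "50"])
    print(args.fasta, args.number, args.device, args.output, args.batch_size)
```

With this sketch, passing both `-p` and `-f` makes `argparse` exit with an error, which matches the documented restriction.
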
## Example usage

- If you want to input a protein FASTA file

```shell
python drug_generator.py -f bcl2.fasta -n 50
```

- If you want to input the amino acid sequence of the protein

```shell
python drug_generator.py -p MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSPQGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPPCQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH -n 50
```

- If you want to provide a prompt for the ligand

```shell
python drug_generator.py -f bcl2.fasta -l COc1ccc(cc1)C(=O) -n 50
```

- Note: If you are running in a Linux environment, enclose the ligand prompt in single quotes (''). Otherwise the shell tries to interpret the parentheses in the prompt.

```shell
python drug_generator.py -f bcl2.fasta -l 'COc1ccc(cc1)C(=O)' -n 50
```

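When the generator is launched from another Python script, the quoting issue can be avoided by passing the arguments as a list, so no shell parses the ligand prompt. The following is a minimal sketch using only the flags documented above. `build_command` and `run_generator` are hypothetical helpers, and the sketch assumes it is run from the repository root with `bcl2.fasta` present.

```python
import shlex
import subprocess


def build_command(fasta_path, number, ligand_prompt=None):
    """Build an argument list for drug_generator.py from the documented flags."""
    command = ["python", "drug_generator.py", "-f", fasta_path, "-n", str(number)]
    if ligand_prompt is not None:
        command += ["-l", ligand_prompt]
    return command


def run_generator(command):
    """Run the generator without a shell, so parentheses need no quoting."""
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    command = build_command("bcl2.fasta", 50, ligand_prompt="COc1ccc(cc1)C(=O)")
    # Show the equivalent, safely quoted shell command line.
    print(" ".join(shlex.quote(part) for part in command))
    # Uncomment to start generation:
    # run_generator(command)
```

The printed line quotes the ligand prompt in single quotes, which is the same form as the Linux example above.
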
## How to reference this work

DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins

Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, Suxia Han

bioRxiv 2023.06.29.543848; doi: [https://doi.org/10.1101/2023.06.29.543848](https://doi.org/10.1101/2023.06.29.543848)

[![DOI](https://img.shields.io/badge/DOI-10.1101/2023.06.29.543848-blue)](https://doi.org/10.1101/2023.06.29.543848)

## License

[GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.html)