	---
	license: gpl-3.0
	tags:
	- chemistry
	- biology
	- medical
	- gpt2
	---
	# DrugGPT
	A generative drug design model based on GPT2.
	<img src="https://img.shields.io/github/license/LIYUESEN/druggpt"><img src="https://img.shields.io/badge/python-3.7-blue"><img src="https://img.shields.io/github/stars/LIYUESEN/druggpt?style=social">
	## 🚩 Introduction
	DrugGPT is a generative drug design strategy built on the GPT architecture. It aims to bring innovation to drug design by applying natural language processing techniques.

	This project applies the GPT model to the exploration of chemical space to discover new molecules with potential binding abilities for specific proteins.

	Trained on up to 1.8 million protein-ligand binding data points, DrugGPT provides a fast and efficient method for generating drug candidate molecules.
	## 📥 Deployment
	1. Clone the repository
	```shell
	git clone https://github.com/LIYUESEN/druggpt.git
	cd druggpt
	```
	Alternatively, visit our [GitHub repo](https://github.com/LIYUESEN/druggpt) and click Code > Download ZIP to download the repository.
	2. Create a virtual environment
	```shell
	conda create -n druggpt python=3.7
	conda activate druggpt
	```
	3. Install Python dependencies
	```shell
	pip install datasets transformers scipy scikit-learn
	pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
	conda install -c openbabel openbabel
	```
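After installation, a quick check can confirm that the packages are visible to the active environment. The following is a minimal sketch, not part of the DrugGPT repository; the import names in `REQUIRED_MODULES` are assumptions about how the packages from step 3 are exposed (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Assumed import names for the packages installed in step 3.
# Note: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["datasets", "transformers", "scipy", "sklearn", "torch", "openbabel"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated `druggpt` environment; any name it reports as missing points to the install command that needs to be repeated.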
	## 🗝 How to use
	Use [drug_generator.py](https://github.com/LIYUESEN/druggpt/blob/main/drug_generator.py)

	Required parameters:
	- `-p` \| `--pro_seq`: Input a protein amino acid sequence.
	- `-f` \| `--fasta`: Input a FASTA file.

	> Specify only one of `-p` and `-f`.
	- `-l` \| `--ligand_prompt`: Input a ligand prompt.
	- `-e` \| `--empty_input`: Enable direct generation mode.
	- `-n` \| `--number`: Minimum number of molecules to generate.
	- `-d` \| `--device`: Hardware device to use. Default is 'cuda'.
	- `-o` \| `--output`: Output directory for generated molecules. Default is './ligand_output/'.
	- `-b` \| `--batch_size`: Number of molecules generated per batch. Reduce this value if you are low on RAM. Default is 32.
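The options above could be expressed with `argparse` roughly as follows. This is an illustrative sketch only, not the actual source of `drug_generator.py`: the function name `build_parser`, the argument types, and the treatment of `-e` as a boolean flag are assumptions. Only the option names and the three documented defaults come from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the drug_generator.py options")

    # -p and -f are alternatives, so argparse rejects a command line that gives both.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("-p", "--pro_seq", type=str, help="Protein amino acid sequence")
    target.add_argument("-f", "--fasta", type=str, help="Path to a FASTA file")

    parser.add_argument("-l", "--ligand_prompt", type=str, help="Ligand prompt")
    # Assumption: -e is a simple on/off flag.
    parser.add_argument("-e", "--empty_input", action="store_true", help="Direct generation mode")
    # Assumption: no default is documented for -n, so none is set here.
    parser.add_argument("-n", "--number", type=int, help="Minimum number of molecules")
    parser.add_argument("-d", "--device", type=str, default="cuda")
    parser.add_argument("-o", "--output", type=str, default="./ligand_output/")
    parser.add_argument("-b", "--batch_size", type=int, default=32)
    return parser


args = build_parser().parse_args(["-f", "bcl2.fasta", "-n", "50"])
print(args.fasta, args.number, args.device, args.batch_size)  # bcl2.fasta 50 cuda 32
```

Putting `-p` and `-f` in a mutually exclusive group lets the parser enforce the "only one" rule itself instead of checking it by hand.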
	## 🔬 Example usage
	- To input a protein FASTA file:
	```shell
	python drug_generator.py -f bcl2.fasta -n 50
	```
	- To input the amino acid sequence of the protein directly:
	```shell
	python drug_generator.py -p MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSPQGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPPCQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH -n 50
	```
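The two input styles above carry the same information: a FASTA file is a header line starting with `>` followed by the sequence, which may be wrapped over several lines. The sketch below shows one way to pull the first sequence out of such a file, for example to pass it to `-p`. The helper `read_first_fasta_sequence` is hypothetical, and how `drug_generator.py` itself parses FASTA input is not described here.

```python
def read_first_fasta_sequence(path):
    """Return the first sequence in a FASTA file as a single string."""
    sequence_lines = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if seen_header or sequence_lines:
                    break  # a second record starts here; stop after the first
                seen_header = True
                continue
            sequence_lines.append(line)
    return "".join(sequence_lines)
```

Calling `read_first_fasta_sequence("bcl2.fasta")` would return the sequence with line breaks removed, assuming the file follows the layout described above.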

	- To provide a prompt for the ligand:
	```shell
	python drug_generator.py -f bcl2.fasta -l COc1ccc(cc1)C(=O) -n 50
	```

	- Note: In a Linux environment, enclose the ligand prompt in single quotes (''); otherwise the shell tries to interpret the parentheses in the prompt.
	```shell
	python drug_generator.py -f bcl2.fasta -l 'COc1ccc(cc1)C(=O)' -n 50
	```
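When the command is assembled from a script, the quoting can be left to Python's standard `shlex` module. This is a small sketch under the assumption that the command is built from Python; the variable names are illustrative.

```python
import shlex

ligand_prompt = "COc1ccc(cc1)C(=O)"

# Argument list for the documented command; no quoting is needed at this stage.
argv = ["python", "drug_generator.py", "-f", "bcl2.fasta", "-l", ligand_prompt, "-n", "50"]

# shlex.quote wraps any argument containing shell-special characters,
# such as the parentheses here, in single quotes.
command = " ".join(shlex.quote(arg) for arg in argv)
print(command)  # python drug_generator.py -f bcl2.fasta -l 'COc1ccc(cc1)C(=O)' -n 50
```

Passing `argv` directly to `subprocess.run` (without `shell=True`) avoids the shell entirely, so no quoting is needed in that case.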
	## 📝 How to reference this work
	DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins

	Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, Suxia Han

	bioRxiv 2023.06.29.543848; doi: [https://doi.org/10.1101/2023.06.29.543848](https://doi.org/10.1101/2023.06.29.543848)

	[![DOI](https://img.shields.io/badge/DOI-10.1101/2023.06.29.543848-blue)](https://doi.org/10.1101/2023.06.29.543848)
	## ⚖ License
	[GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.html)