# MolXPT
MolXPT is a GPT-style model pre-trained on SMILES strings (a sequence representation of molecules) wrapped with text. It is built on BioGPT with a redefined tokenizer.
## Example Usage
```python
from transformers import AutoTokenizer, BioGptForCausalLM

model = BioGptForCausalLM.from_pretrained("zequnl/molxpt")
molxpt_tokenizer = AutoTokenizer.from_pretrained("zequnl/molxpt", trust_remote_code=True)
model = model.cuda()
model.eval()

# Wrap the SMILES string (here: aspirin) with the special molecule tokens
input_ids = molxpt_tokenizer(
    '<start-of-mol>CC(=O)OC1=CC=CC=C1C(=O)O<end-of-mol> is ',
    return_tensors="pt",
).input_ids.cuda()

# Sample 4 completions describing the molecule
output = model.generate(
    input_ids,
    max_new_tokens=300,
    num_return_sequences=4,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)

for i in range(4):
    s = molxpt_tokenizer.decode(output[i])
    print(s)
```
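When generating in the text-to-molecule direction, any SMILES the model produces appears between the same `<start-of-mol>` and `<end-of-mol>` markers used in the prompt above. The helper below is a minimal sketch for pulling those spans out of decoded output; the marker strings come from the example above, while the function name `extract_smiles` is our own.

```python
import re

# <start-of-mol>/<end-of-mol> are the molecule-wrapping tokens shown in the
# prompt above; this regex captures every span wrapped by them.
MOL_PATTERN = re.compile(r"<start-of-mol>(.*?)<end-of-mol>", re.DOTALL)

def extract_smiles(decoded_text: str) -> list[str]:
    """Return all SMILES strings found between the molecule markers."""
    return [m.strip() for m in MOL_PATTERN.findall(decoded_text)]

# Example on a decoded string shaped like the aspirin prompt:
print(extract_smiles("<start-of-mol>CC(=O)OC1=CC=CC=C1C(=O)O<end-of-mol> is aspirin"))
# → ['CC(=O)OC1=CC=CC=C1C(=O)O']
```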
## References
For more information, please refer to our paper and GitHub repository.
- Paper: MolXPT: Wrapping Molecules with Text for Generative Pre-training
- Authors: Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu