
MolXPT

Our model is a variant of GPT pre-trained on SMILES strings (a sequence representation of molecules) wrapped by their surrounding text. It is based on BioGPT, with a redefined tokenizer that handles both text and SMILES tokens.
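The special tokens <start-of-mol> and <end-of-mol> delimit a SMILES span inside ordinary text, so a molecule mention such as "aspirin" can be replaced by its SMILES string within a sentence. As a rough sketch of that input format (assuming the custom tokenizer follows the standard Hugging Face tokenize API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zequnl/molxpt", trust_remote_code=True)

# "Aspirin" is replaced by its SMILES string, delimited by the special
# molecule markers, so text and molecule tokens share one sequence.
wrapped = "<start-of-mol>CC(=O)OC1=CC=CC=C1C(=O)O<end-of-mol> is a medication used to reduce pain."
print(tokenizer.tokenize(wrapped))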

Example Usage

from transformers import AutoTokenizer, BioGptForCausalLM

# Load the MolXPT checkpoint and its redefined tokenizer
# (trust_remote_code is required for the custom tokenizer).
model = BioGptForCausalLM.from_pretrained("zequnl/molxpt")
molxpt_tokenizer = AutoTokenizer.from_pretrained("zequnl/molxpt", trust_remote_code=True)

model = model.cuda()
model.eval()

# Prompt: aspirin's SMILES wrapped in the molecule markers, followed by
# text for the model to complete.
input_ids = molxpt_tokenizer('<start-of-mol>CC(=O)OC1=CC=CC=C1C(=O)O<end-of-mol> is ', return_tensors="pt").input_ids.cuda()

# Sample four continuations with nucleus sampling.
output = model.generate(
    input_ids,
    max_new_tokens=300,
    num_return_sequences=4,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)

# Decode and print each sampled sequence.
for i in range(output.shape[0]):
    s = molxpt_tokenizer.decode(output[i])
    print(s)
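To pull any generated SMILES back out of a decoded string, you can match on the molecule markers. A minimal sketch (assuming decode() keeps the special tokens in its output; pass skip_special_tokens=True instead if you only want plain text):

import re

def extract_smiles(decoded):
    # Everything between the molecule markers is a SMILES string.
    return re.findall(r"<start-of-mol>(.*?)<end-of-mol>", decoded)

print(extract_smiles(molxpt_tokenizer.decode(output[0])))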

References

For more information, please refer to our paper and GitHub repository.

Paper: MolXPT: Wrapping Molecules with Text for Generative Pre-training

Authors: Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu
