---
library_name: transformers
tags: []
---

# MolXPT

MolXPT is a GPT variant pre-trained on SMILES strings (a sequence representation of molecules) wrapped by their surrounding text. It is based on [BioGPT](https://huggingface.co/microsoft/biogpt) with a redefined tokenizer.

## Example Usage

```python
from transformers import AutoTokenizer, BioGptForCausalLM

# Load the pre-trained model and its custom tokenizer.
model = BioGptForCausalLM.from_pretrained("zequnl/molxpt")
molxpt_tokenizer = AutoTokenizer.from_pretrained("zequnl/molxpt", trust_remote_code=True)
model = model.cuda()  # requires a CUDA-capable GPU
model.eval()

# Prompt: the SMILES of aspirin followed by "is ", asking the model
# to continue with text.
input_ids = molxpt_tokenizer('CC(=O)OC1=CC=CC=C1C(=O)O is ', return_tensors="pt").input_ids.cuda()

# Sample four continuations with nucleus sampling.
output = model.generate(
    input_ids,
    max_new_tokens=300,
    num_return_sequences=4,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
for i in range(4):
    s = molxpt_tokenizer.decode(output[i])
    print(s)
```

## References

For more information, please refer to our paper and GitHub repository.

Paper: [MolXPT: Wrapping Molecules with Text for Generative Pre-training](https://aclanthology.org/2023.acl-short.138/)

Authors: *Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu*
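
## Notes

The example above prints each returned sequence in full, prompt included. To keep only the generated continuation, you can slice off the prompt tokens before decoding. The following is a minimal sketch, not part of the original example; it assumes the `input_ids` and `output` variables from the Example Usage section and uses only standard `transformers` decoding options:

```python
prompt_length = input_ids.shape[1]  # number of tokens in the prompt
for seq in output:
    # Decode only the newly generated tokens, dropping special tokens.
    continuation = molxpt_tokenizer.decode(seq[prompt_length:], skip_special_tokens=True)
    print(continuation)
```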