ICMA version of Galactica-125M for the text-based molecule generation task (Cap2Mol), from the paper "Large Language Models are In-Context Molecule Learners".

Notice: For best performance, the input should contain 4 context examples and the cutoff length should be set to 2048 tokens.
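The two requirements in the notice can be enforced when assembling the prompt. The sketch below uses a hypothetical helper, `build_icma_prompt`, which is not part of the model release. It assumes the prompt layout shown in the inference example on this card: each context example is a "Generate a molecule for the caption:" line followed by a "Molecule:" line, and the query uses the "Based on the above examples..." instruction. Token counting is passed in as a callable, so the 2048-token check is only applied when you supply one.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed layout, copied from the inference example on this card
CONTEXT_TEMPLATE = "Generate a molecule for the caption: {caption}\nMolecule: {smiles}\n\n"
QUERY_TEMPLATE = (
    "Based on the above examples, analyse the similarities and differences "
    "between the examples and finally generate a molecule for the caption: {caption}"
)


def build_icma_prompt(
    examples: Sequence[Tuple[str, str]],
    query_caption: str,
    count_tokens: Optional[Callable[[str], int]] = None,
    max_tokens: int = 2048,
    n_examples: int = 4,
) -> str:
    """Hypothetical helper: assemble a Cap2Mol prompt from (caption, SMILES) pairs."""
    # The card recommends exactly 4 context examples
    if len(examples) != n_examples:
        raise ValueError(f"expected {n_examples} context examples, got {len(examples)}")
    prompt = "".join(
        CONTEXT_TEMPLATE.format(caption=caption, smiles=smiles)
        for caption, smiles in examples
    )
    prompt += QUERY_TEMPLATE.format(caption=query_caption)
    # Optionally reject prompts longer than the cutoff length
    if count_tokens is not None and count_tokens(prompt) > max_tokens:
        raise ValueError(f"prompt exceeds the {max_tokens}-token cutoff length")
    return prompt
```

With the tokenizer `tk` from the inference example, passing `count_tokens=lambda s: len(tk(s)["input_ids"])` would measure the prompt in model tokens rather than characters.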

A simple inference example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load the fine-tuned ICMA model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("phenixace/ICMA-Galactica-125M-M2C")
tk = AutoTokenizer.from_pretrained("phenixace/ICMA-Galactica-125M-M2C")

# Prompt: 4 caption/molecule context examples followed by the query caption
text = """Generate a molecule for the caption: The molecule is a dicarboxylic acid monoester that is the 21-(hydrogen succinate) derivative of 11-deoxycorticosterone. It is a 3-oxo-Delta(4) steroid, a 20-oxo steroid, a dicarboxylic acid monoester, a steroid ester and a hemisuccinate. It derives from an 11-deoxycorticosterone and a succinic acid.
Molecule: C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2C(=O)COC(=O)CCC(=O)O)CCC4=CC(=O)CC[C@]34C

Generate a molecule for the caption: The molecule is a fluorinated steroid that is 9-fluoropregna-1,4-diene substituted by hydroxy groups at positions 11, 17 and 21, a methyl group at position 16 and oxo groups at positions 3 and 20. It is a synthetic member of the class of glucocorticoids. It has a role as an adrenergic agent, an antiemetic, an antineoplastic agent, an environmental contaminant, a xenobiotic, an immunosuppressive agent and an anti-inflammatory drug. It is a fluorinated steroid, a 3-oxo-Delta(1),Delta(4)-steroid, a glucocorticoid, a 20-oxo steroid, an 11beta-hydroxy steroid, a 17alpha-hydroxy steroid and a 21-hydroxy steroid. It derives from a hydride of a pregnane.
Molecule: C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(C(=O)CO)O)C)O)F)C

Generate a molecule for the caption: The molecule is a fluorinated steroid that is pregn-4-ene substituted by a fluoro group at position 2, a methyl group at position 2 and oxo groups at positions 3, 11 and 20. It is a 3-oxo-Delta(4) steroid, an 11-oxo steroid, a 20-oxo steroid and a fluorinated steroid. It derives from a progesterone. It derives from a hydride of a pregnane.
Molecule: C[C@@H]1C[C@]2(C(=CC1=O)CC[C@@H]3[C@@]2(C(=O)C[C@]4([C@H]3CC[C@@H]4C(=O)C)C)F)C

Generate a molecule for the caption: The molecule is a steroid ester that is pregn-4-en-21-yl acetate substituted by oxo group at positions 3 and 20, a methyl group at position 6 and hydroxy groups at positions 11 and 17 respectively. It is a 3-oxo-Delta(4) steroid, a steroid ester, an 11beta-hydroxy steroid, a 17alpha-hydroxy steroid, a 20-oxo steroid and a tertiary alpha-hydroxy ketone. It derives from a hydride of a pregnane.
Molecule: C[C@H]1C[C@H]2[C@@H]3CC[C@@]([C@]3(C[C@@H]([C@@H]2[C@@]4(C1=CC(=O)CC4)C)O)C)(C(=O)COC(=O)C)O

Based on the above examples, analyse the similarities and differences between the examples and finally generate a molecule for the caption: The molecule is a steroid ester that is methyl (17E)-pregna-4,17-dien-21-oate substituted by oxo groups at positions 3 and 11. It is a 3-oxo-Delta(4) steroid, an 11-oxo steroid, a steroid ester and a methyl ester. It derives from a hydride of a pregnane."""

# Sampling settings used for generation
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.85,
    top_k=40,
    num_beams=1,
    repetition_penalty=1.0,
    pad_token_id=0,
)
inputs = tk(text, return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(
    **inputs,
    return_dict_in_generate=True,
    output_scores=True,
    num_return_sequences=1,
    max_new_tokens=256,
    generation_config=generation_config,
)

# Decode; the sequence holds the prompt tokens followed by the generated tokens
decoded = tk.decode(outputs.sequences[0], skip_special_tokens=True)
print(decoded)
```
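Because the decoded sequence repeats the whole prompt, it can help to isolate the generated part and pull out the SMILES string. The sketch below uses a hypothetical helper, `extract_molecule`, and assumes the model answers with a line of the form `Molecule: <SMILES>`, mirroring the context examples; that output format is an assumption, not something this card documents, so the helper falls back to the first non-empty line.

```python
def extract_molecule(completion: str) -> str:
    """Hypothetical helper: pull a SMILES string out of the generated continuation.

    Assumes the continuation contains a line like 'Molecule: <SMILES>';
    otherwise the first token of the first non-empty line is returned.
    """
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    for line in lines:
        if line.startswith("Molecule:"):
            # SMILES contain no spaces, so keep only the first token after the label
            rest = line[len("Molecule:"):].split()
            return rest[0] if rest else ""
    return lines[0].split()[0] if lines else ""
```

With the example above, decode only the new tokens first, e.g. `completion = tk.decode(outputs.sequences[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`, then call `extract_molecule(completion)`.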

Paper Link: https://arxiv.org/abs/2403.04197

Model size: 125M params (Safetensors, tensor type F32)
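As a rough, weights-only estimate, F32 stores each parameter in 4 bytes, so the listed 125M parameters occupy about 500 MB (roughly 0.47 GiB) before activations and generation buffers are counted. A minimal sketch of that arithmetic, assuming the rounded 125M figure above and standard byte widths per dtype:

```python
# Bytes per parameter for common tensor types
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_memory_bytes(n_params: int, dtype: str = "F32") -> int:
    """Rough weights-only memory estimate; ignores activations and KV cache."""
    return n_params * BYTES_PER_PARAM[dtype]


n = weight_memory_bytes(125_000_000)
print(n, f"{n / 2**30:.2f} GiB")
```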