Cannot reproduce the reported values

#3
by yunhuijang - opened

Hi, I am trying to reproduce the results of retrosynthesis, property prediction, and molecule captioning using MolInstructions. Unfortunately, I cannot reproduce the results of Llama3 that you have reported in your GitHub. Isn't it right that setting 1) base_model = "meta-llama/Meta-Llama-3-8B-Instruct", 2) lora_weights= "zjunlp/llama3-instruct-molinst-molecule-8b" in generate.py, and 3) change LlamaTokenizer to AutoTokenizer enough to change the settings from Llama2 to Llama3? Is there any additional settings or details required?

The reported values seem to be very high but when I input the first test data of molecule captioning, it generates totally different caption from the ground truth. Even using the training data does not yield reasonable captions.

스크린샷 2024-11-21 오전 12.00.20.png

스크린샷 2024-11-20 오후 11.59.45.png

스크린샷 2024-11-20 오후 11.59.11.png

Thank you for your help.

ZJUNLP org

Hi, thank you for your interest.

We are using the training and generation code for LLaMA3 provided at https://github.com/hiyouga/LLaMA-Factory, which you may refer to. The code on our GitHub is specifically for LLaMA2.

Sign up or log in to comment