More Benchmarks

#4
by PSM272 - opened

Can you add more benchmarks like MATH, MMLU, HumanEval, etc.?

Thank you for your attention.

Our model has demonstrated preliminary feasibility on math-related tasks, primarily because data in this area is relatively easy to obtain and the reward mechanisms are straightforward to design. However, our future focus will shift toward non-math tasks, particularly those involving open-ended questions, so for now we have no plans to evaluate the model on additional math benchmarks. We will soon present more experimental results and analyses on other tasks. Stay tuned for updates.

In that case, why is MGSM, a math benchmark, the only reported benchmark?

AIDC-AI org

In that case, why is MGSM, a math benchmark, the only reported benchmark?

I apologize for any confusion caused by the incomplete response to the earlier question. The response has now been updated.

