More Benchmarks

#4
by PSM272 - opened

Can you add more benchmarks like MATH, MMLU, HumanEval, etc.?

Thank you for your attention.

Our model has demonstrated preliminary feasibility on math-related tasks, primarily because data in this area is relatively easy to obtain and the reward mechanisms are straightforward to design. However, our future focus will shift toward non-math tasks, particularly those involving open-ended questions, so for now we have no plans to evaluate the model on additional math benchmarks. We will soon present more experimental results and analyses on other tasks. Stay tuned for updates.

In that case, why is MGSM, a math benchmark, the only reported benchmark?

AIDC-AI org

In that case, why is MGSM, a math benchmark, the only reported benchmark?

I apologize for any confusion caused by the incomplete response to the earlier question. The response has now been updated.

