Qwen/Qwen2-Math-RM-72B


Zhenru committed on Sep 18

Commit ecd9123 • 1 Parent(s): 46df1cb

Update README.md

Files changed (1)

README.md +2 -0

@@ -27,6 +27,8 @@ Key Highlights:
   - Best of N: By combining response sampling with a Best-of-N strategy, we select the response that the reward model scores highest, yielding better results at the cost of more inference time. For example, Qwen2-Math-1.5B-Instruct obtains 79.9 on MATH in the RM@8 setting, surpassing even the 75.1 that Qwen2-Math-7B-Instruct achieves with greedy decoding.
   - Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
+![](http://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen2.5/qwen2.5-math-pipeline.jpeg)
 ## Model Details
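
The README context lines in this diff contrast two ways of picking one answer from N sampled responses: RM@N (keep the response the reward model scores highest) and Maj@N (keep the most frequent final answer). The difference can be sketched as below. This is a minimal illustration, not the Qwen evaluation code: `best_of_n`, `majority_vote`, `toy_reward`, and the sample data are hypothetical names and values, and the toy scores stand in for what a real reward model would return.

```python
from collections import Counter

# Each sample is a (response_text, final_answer) pair. The data is made up.
samples = [
    ("2x = 8, so x = 4", "4"),
    ("2x + 1 = 11, so x = 5", "5"),
    ("x = 8 / 2 = 4", "4"),
    ("x = 6", "6"),
]

# Hypothetical stand-in for reward model scores (higher is better).
toy_scores = {
    "2x = 8, so x = 4": 0.20,
    "2x + 1 = 11, so x = 5": 0.90,
    "x = 8 / 2 = 4": 0.30,
    "x = 6": 0.10,
}


def toy_reward(sample):
    """Look up the toy score for a sample's response text."""
    return toy_scores[sample[0]]


def best_of_n(candidates, score_fn):
    """RM@N: return the candidate with the highest score."""
    return max(candidates, key=score_fn)


def majority_vote(answers):
    """Maj@N: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


rm_choice = best_of_n(samples, toy_reward)
maj_choice = majority_vote([answer for _, answer in samples])
print("RM@4 picks answer:", rm_choice[1])
print("Maj@4 picks answer:", maj_choice)
```

In this toy data the two rules disagree: majority voting returns the answer that appears most often, while Best-of-N returns whichever single response the scorer prefers, so its quality depends entirely on the scorer.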