Update README.md
README.md
@@ -27,6 +27,8 @@ Key Highlights:
- Best of N: By combining response sampling with a Best-of-N strategy, we select the highest-scoring response as judged by the reward model, trading additional inference time for better results (see the sketch after this list). For example, Qwen2-Math-1.5B-Instruct obtains 79.9 on MATH in the RM@8 setting, surpassing the 75.1 that Qwen2-Math-7B-Instruct achieves with greedy decoding.
- Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
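
For illustration only, here is a minimal sketch of the two selection strategies. The `generate` and `reward` callables stand in for the policy model's sampling step and the reward model's scoring step; both names are placeholders for this sketch, not part of this repository's API.

```python
from collections import Counter

def best_of_n(prompt, generate, reward, n=8):
    """RM@N: sample n responses and keep the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]

def majority_vote(answers):
    """Maj@N: return the final answer that appears most often among the n samples."""
    return Counter(answers).most_common(1)[0][0]
```
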
![](http://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen2.5/qwen2.5-math-pipeline.jpeg)
## Model Details