thomas-yanxin committed c62d83e
Parent: 7fccd26
Update README.md

README.md CHANGED
@@ -10,4 +10,21 @@ datasets:
The main purpose of this model is to validate the effectiveness of [thomas-yanxin/MT-SFT-ShareGPT](https://huggingface.co/datasets/thomas-yanxin/MT-SFT-ShareGPT), i.e., that data quality is all you need. We found that when the data is carefully filtered through a better data-governance pipeline, model performance can improve substantially, even with supervised fine-tuning (SFT) alone.

Here are the results from our OpenCompass evaluation:

| Category | Benchmark | XinYuan-Qwen2-7B |
| :------------: | :--------: | :--------: |
| English | MMLU | 68.71 |
| | MMLU-Pro | 30.56 |
| | Theorem QA | 25.3 |
| | GPQA | 29.2 |
| | BBH | 60.3 |
| | IFEval (Prompt Strict-Acc.) | 39.2 |
| | ARC-C | 87.5 |
| Math | GSM8K | 75.4 |
| | MATH | 34.76 |
| Chinese | C-EVAL | 82.0 |
| | CMMLU | 77.9 |
| Code | MBPP | 50.6 |
| | HumanEval | 70.1 |