thomas-yanxin committed on
Commit c62d83e
1 Parent(s): 7fccd26

Update README.md

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -10,4 +10,21 @@ datasets:
 
 The main purpose of this model is to validate the usability of [thomas-yanxin/MT-SFT-ShareGPT](https://huggingface.co/datasets/thomas-yanxin/MT-SFT-ShareGPT), i.e., the quality of the data is all you need. We found that when we meticulously extract the data through a better data governance approach, the corresponding model results can be vastly improved, even if only through SFT.
 
-Here are the results from our OpenCompass evaluation:
+Here are the results from our OpenCompass evaluation:
+
+| Classification | Benchmarks | Models |
+| :------------: | :--------: | :--------: |
+| | Name | XinYuan-Qwen2-7B |
+| English | MMLU | 68.71 |
+| | MMLU-Pro | 30.56 |
+| | Theorem QA | 25.3 |
+| | GPQA | 29.2 |
+| | BBH | 60.3 |
+| | IFEval (Prompt Strict-Acc.) | 39.2 |
+| | ARC-C | 87.5 |
+| Math | GSM8K | 75.4 |
+| | MATH | 34.76 |
+| Chinese | C-EVAL | 82.0 |
+| | CMMLU | 77.9 |
+| Code | MBPP | 50.6 |
+| | HumanEval | 70.1 |