yutaozhu94 committed on
Commit
6067226
1 Parent(s): f885913

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -35,9 +35,9 @@ Due to the license limitation, for models based on LLaMA, we only provide the we
 
 ## Evaluation
 
-We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows. Some evaluations are not finished, and we will update their results as soon as possible.
+We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows.
 
-> 我们在中英文的一些基准测试上对YuLan-Chat进行了评价,其结果如下。有一些评估尚未完成,我们将尽快更新表格中的内容。
+> 我们在中英文的一些基准测试上对YuLan-Chat进行了评价,其结果如下。
 
 ### MMLU
 
@@ -47,8 +47,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model | STEM | Social Science | Humanities | Others | Avg. |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: |
-| YuLan-Chat-1-13B-v1 | | | | | |
-| YuLan-Chat-1-65B-v1 | | | | | |
+| YuLan-Chat-1-13B-v1 | 39.6 | 57.8 | 42.6 | 57.6 | 49.4 |
+| YuLan-Chat-1-65B-v1 | 49.2 | 71.7 | 57.7 | 66.7 | 61.3 |
 | YuLan-Chat-1-65B-v2 | 46.3 | 67.9 | 56.9 | 63.9 | 58.7 |
 | LLaMA-2-13B | 44.6 | 64.2 | 53.9 | 62.2 | 56.2 |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 44.4 | 63.2 | 51.6 | 60.6 | 55.0 |
@@ -63,8 +63,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model | STEM | Social Science | Humanities | Others | Avg. | Avg. (Hard) |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: | :---------: |
-| YuLan-Chat-1-13B-v1 | | | | | | |
-| YuLan-Chat-1-65B-v1 | 37.6 | 46.4 | 36.8 | 37.5 | 39.1 | 31.4 |
+| YuLan-Chat-1-13B-v1 | 30.2 | 37.4 | 31.9 | 30.7 | 32.0 | 25.7 |
+| YuLan-Chat-1-65B-v1 | 37.7 | 46.1 | 36.8 | 38.0 | 39.2 | 31.1 |
 | YuLan-Chat-1-65B-v2 | 39.9 | 55.9 | 47.7 | 43.7 | 45.4 | 31.4 |
 | LLaMA-2-13B | 36.9 | 43.2 | 37.6 | 36.6 | 38.2 | 32.0 |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 36.8 | 44.5 | 36.3 | 36.5 | 38.1 | 30.9 |
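
For readers who want to sanity-check numbers like the ones in the tables above, the sketch below shows the usual way MMLU/C-Eval-style multiple-choice accuracy is computed with a causal language model: score each option letter as the next token and take the most likely one. This is a minimal illustration under stated assumptions, not the YuLan-Chat evaluation script; the checkpoint id, the prompt template, and the `score_choice` helper are placeholders for demonstration.

```python
# Minimal sketch of multiple-choice scoring for MMLU/C-Eval-style benchmarks.
# Assumptions: the checkpoint id below is a placeholder (point it at the actual
# YuLan-Chat weights you recovered), and `score_choice` is an illustrative
# helper, not part of any official evaluation code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "yulan-team/YuLan-Chat-2-13b-fp16"  # placeholder model id (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def score_choice(prompt: str, letter: str) -> float:
    """Log-probability of `letter` being generated right after `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    letter_id = tokenizer(letter, add_special_tokens=False).input_ids[-1]
    return torch.log_softmax(next_token_logits, dim=-1)[letter_id].item()

question = (
    "The following is a multiple-choice question.\n"
    "Question: Which gas makes up most of Earth's atmosphere?\n"
    "A. Oxygen\nB. Nitrogen\nC. Carbon dioxide\nD. Argon\n"
    "Answer:"
)
# Pick the option letter the model assigns the highest likelihood to;
# benchmark accuracy is the fraction of questions answered this way correctly.
prediction = max("ABCD", key=lambda c: score_choice(question, " " + c))
print("Predicted answer:", prediction)
```

Note that the reported numbers typically come from few-shot prompts over the official benchmark splits, so a zero-shot run like this sketch will not reproduce the table values exactly.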