yutaozhu94 committed
Commit 6067226 · 1 Parent(s): f885913
Update README.md
README.md
CHANGED
@@ -35,9 +35,9 @@ Due to the license limitation, for models based on LLaMA, we only provide the we
 
 ## Evaluation
 
-We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows.
+We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows.
 
-> 我们在中英文的一些基准测试上对YuLan-Chat
+> 我们在中英文的一些基准测试上对YuLan-Chat进行了评价,其结果如下。
 
 ### MMLU
 
@@ -47,8 +47,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model | STEM | Social Science | Humanities | Others | Avg. |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: |
-| YuLan-Chat-1-13B-v1 | |
-| YuLan-Chat-1-65B-v1 | |
+| YuLan-Chat-1-13B-v1 | 39.6 | 57.8 | 42.6 | 57.6 | 49.4 |
+| YuLan-Chat-1-65B-v1 | 49.2 | 71.7 | 57.7 | 66.7 | 61.3 |
 | YuLan-Chat-1-65B-v2 | 46.3 | 67.9 | 56.9 | 63.9 | 58.7 |
 | LLaMA-2-13B | 44.6 | 64.2 | 53.9 | 62.2 | 56.2 |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 44.4 | 63.2 | 51.6 | 60.6 | 55.0 |
@@ -63,8 +63,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model | STEM | Social Science | Humanities | Others | Avg. | Avg. (Hard) |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: | :---------: |
-| YuLan-Chat-1-13B-v1 | |
-| YuLan-Chat-1-65B-v1 | 37.
+| YuLan-Chat-1-13B-v1 | 30.2 | 37.4 | 31.9 | 30.7 | 32.0 | 25.7 |
+| YuLan-Chat-1-65B-v1 | 37.7 | 46.1 | 36.8 | 38.0 | 39.2 | 31.1 |
 | YuLan-Chat-1-65B-v2 | 39.9 | 55.9 | 47.7 | 43.7 | 45.4 | 31.4 |
 | LLaMA-2-13B | 36.9 | 43.2 | 37.6 | 36.6 | 38.2 | 32.0 |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 36.8 | 44.5 | 36.3 | 36.5 | 38.1 | 30.9 |
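For context on how accuracies like those in the MMLU table are typically obtained, below is a minimal, illustrative sketch of zero-shot multiple-choice scoring via per-choice log-likelihoods. This is not the authors' evaluation pipeline: the checkpoint id, prompt template, and dataset slice are assumptions made purely for illustration.

```python
# Illustrative sketch only -- NOT the evaluation code behind the table above.
# Checkpoint id, prompt format, and dataset handling are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "yulan-team/YuLan-Chat-2-13b-fp16"  # hypothetical checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to `choice` given `prompt`.

    Simplified: assumes the prompt/continuation token boundary is clean.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    answer_len = full_ids.shape[1] - prompt_ids.shape[1]
    # Shift so position t predicts token t+1, then score only the answer tokens.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    answer_logprobs = logprobs[-answer_len:].gather(1, targets[-answer_len:].unsqueeze(1))
    return answer_logprobs.sum().item()

# Small slice of MMLU for a quick sanity check; the full test set is ~14k questions.
ds = load_dataset("cais/mmlu", "all", split="test").select(range(200))
correct = 0
for ex in ds:
    prompt = f"{ex['question']}\nAnswer: "
    scores = [choice_logprob(prompt, c) for c in ex["choices"]]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == ex["answer"])
print(f"Accuracy: {correct / len(ds):.3f}")
```

The same log-likelihood scoring pattern applies to C-Eval-style questions; only the dataset and (typically Chinese) prompt template change.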