Commit 9460c12 by pom
Parent(s): f34fb5c
update readme
README.md CHANGED
@@ -27,20 +27,22 @@ inference: false
 
 ## Evaluation Results
 
-To verify the model's capabilities, we selected several comprehensive benchmark suites spanning multiple academic disciplines, including [MMLU](https://arxiv.org/abs/2009.03300) (English), [C-Eval](https://cevalbenchmark.com/) (Chinese), [AGIEval](https://arxiv.org/abs/2304.06364) (Chinese and English), [GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench) (Chinese and English), and [GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao)
+To verify the model's capabilities, we selected several comprehensive benchmark suites spanning multiple academic disciplines, including [MMLU](https://arxiv.org/abs/2009.03300) (English), [C-Eval](https://cevalbenchmark.com/) (Chinese), [AGIEval](https://arxiv.org/abs/2304.06364) (Chinese and English), [GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench) (Chinese and English), and [GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao) (English). Results are shown below, with bold marking the highest score in each column:
 
 |       Model        |    Type    |       MMLU       |      C-Eval      | AGIEval<sup>1</sup> | GAOKAO-Bench<sup>1</sup> | GAOKAO-English<sup>1</sup> |
 | :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
 |    Baichuan-7B     | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> |  34.4<sup>2</sup>   |     36.3<sup>2</sup>     |            44.3            |
 | Baichuan2-7B-Base  | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> |  42.7<sup>2</sup>   |     47.5<sup>2</sup>     |            53.1            |
+| Baichuan2-7B-Chat  | fine-tuned |       53.2       |       52.2       |        41.3         |           49.7           |            66.6            |
 |    ChatGLM2-6B     | fine-tuned | 45.5<sup>2</sup> | 50.1<sup>2</sup> |        42.6         |           54.2           |            59.7            |
 |     Falcon-7B      | pretrained | 27.8<sup>2</sup> |       25.8       |        26.2         |           26.3           |            29.9            |
 |    InternLM-7B     | pretrained | 51.0<sup>2</sup> |       52.4       |        34.1         |           53.6           |            32.3            |
+|  InternLM-7B-Chat  | fine-tuned | 50.8<sup>2</sup> |       52.8       |        39.0         |         **67.4**         |            43.9            |
 |      Llama-7B      | pretrained | 35.1<sup>2</sup> |       27.0       |        27.4         |           26.0           |            30.1            |
 |     Llama-2-7B     | pretrained | 45.3<sup>2</sup> |       28.9       |        27.0         |           27.8           |            47.8            |
 |       MPT-7B       | pretrained | 29.6<sup>2</sup> |       27.8       |        24.2         |           25.3           |            28.1            |
 |   Vicuna-7B-v1.5   | fine-tuned | 49.8<sup>2</sup> |       22.9       |        26.7         |           24.4           |            61.1            |
-|   **XVERSE-7B**    | pretrained |
+|   **XVERSE-7B**    | pretrained |     **56.6**     |     **57.1**     |      **46.9**       |           61.7           |          **71.1**          |
 
 > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
 > <sup>2: Results as officially reported by each model.</sup>
@@ -55,14 +57,16 @@ In order to validate the various abilities of the model, we have chosen several
 | :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
 |    Baichuan-7B     | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> |  34.4<sup>2</sup>   |     36.3<sup>2</sup>     |            44.3            |
 | Baichuan2-7B-Base  | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> |  42.7<sup>2</sup>   |     47.5<sup>2</sup>     |            53.1            |
+| Baichuan2-7B-Chat  | fine-tuned |       53.2       |       52.2       |        41.3         |           49.7           |            66.6            |
 |    ChatGLM2-6B     | fine-tuned | 45.5<sup>2</sup> | 50.1<sup>2</sup> |        42.6         |           54.2           |            59.7            |
 |     Falcon-7B      | pretrained | 27.8<sup>2</sup> |       25.8       |        26.2         |           26.3           |            29.9            |
 |    InternLM-7B     | pretrained | 51.0<sup>2</sup> |       52.4       |        34.1         |           53.6           |            32.3            |
+|  InternLM-7B-Chat  | fine-tuned | 50.8<sup>2</sup> |       52.8       |        39.0         |         **67.4**         |            43.9            |
 |      Llama-7B      | pretrained | 35.1<sup>2</sup> |       27.0       |        27.4         |           26.0           |            30.1            |
 |     Llama-2-7B     | pretrained | 45.3<sup>2</sup> |       28.9       |        27.0         |           27.8           |            47.8            |
 |       MPT-7B       | pretrained | 29.6<sup>2</sup> |       27.8       |        24.2         |           25.3           |            28.1            |
 |   Vicuna-7B-v1.5   | fine-tuned | 49.8<sup>2</sup> |       22.9       |        26.7         |           24.4           |            61.1            |
-|   **XVERSE-7B**    | pretrained |
+|   **XVERSE-7B**    | pretrained |     **56.6**     |     **57.1**     |      **46.9**       |           61.7           |          **71.1**          |
 
 > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
 > <sup>2: Results as officially reported by each model.</sup>
@@ -76,6 +80,7 @@ MMLU Category Results
 |       Models       |    Type    | Average  |   STEM   | Social Science | Humanities |  Others  |
 | :----------------: | :--------: | :------: | :------: | :------------: | :--------: | :------: |
 |    Baichuan-7B     | pretrained |   42.3   |   35.6   |      48.9      |    38.4    |   48.1   |
+| Baichuan2-7B-Chat  | fine-tuned |   53.2   |   43.1   |      59.1      |    50.0    |   59.1   |
 |    ChatGLM2-6B     | fine-tuned |   45.5   |   40.1   |      51.6      |    41.2    |   51.2   |
 |    InternLM-7B     | pretrained |   51.0   | **58.7** |      43.5      |  **52.7**  |   53.2   |
 |      LLaMA-7B      | pretrained |   35.1   |   30.5   |      38.3      |    34.0    |   38.1   |
@@ -90,9 +95,11 @@ C-Eval Category Results
 | :----------------: | :--------: | :------: | :------: | :------------: | :--------: | :------: |
 |    Baichuan-7B     | pretrained |   42.8   |   38.2   |      52.0      |    46.2    |   39.3   |
 | Baichuan2-7B-Base  | pretrained |   54.9   |   47.9   |      67.3      |    58.4    |   52.8   |
+| Baichuan2-7B-Chat  | fine-tuned |   52.2   |   44.6   |      65.0      |    55.8    |   50.9   |
 |    ChatGLM2-6B     | fine-tuned |   50.1   |   46.4   |      60.4      |    50.6    |   46.9   |
 |     Falcon-7B      | pretrained |   25.8   |   25.8   |      26.0      |    25.8    |   25.7   |
 |    InternLM-7B     | pretrained |   52.4   |   47.0   |      64.9      |    55.6    |   47.6   |
+|  InternLM-7B-Chat  | fine-tuned |   52.8   |   48.4   |      65.6      |    57.0    |   45.0   |
 |      LLaMA-7B      | pretrained |   27.0   |   26.7   |      26.7      |    28.4    |   26.2   |
 |     LLaMA2-7B      | pretrained |   28.9   |   26.8   |      34.5      |    30.0    |   26.4   |
 |       MPT-7B       | pretrained |   27.8   |   27.4   |      29.8      |    26.9    |   27.7   |
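Footnote 1 pins down the scoring protocol behind these tables: only single-answer multiple-choice items are scored, and everything else (fill-in-the-blank, open-ended, and multiple-answer items) is dropped before accuracy is computed. As a rough illustration, here is a minimal sketch of that filter-then-score step; the `Item` structure and `predict_choice` callback are hypothetical stand-ins, not the evaluation harness actually used to produce these numbers.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Item:
    question: str
    choices: List[str]   # rendered options, e.g. ["A. 4", "B. 5", ...]
    answer: str          # gold label, e.g. "A"
    kind: str            # "single_choice", "multi_choice", "fill_in_blank", "open_ended"

def single_choice_accuracy(items: List[Item],
                           predict_choice: Callable[[Item], str]) -> float:
    """Accuracy over single-answer multiple-choice items only (footnote 1):
    fill-in-the-blank, open-ended, and multiple-answer items are excluded."""
    scored = [it for it in items if it.kind == "single_choice"]
    if not scored:
        return 0.0
    correct = sum(predict_choice(it) == it.answer for it in scored)
    return 100.0 * correct / len(scored)

# Tiny demo with a trivial predictor that always answers "A".
demo = [
    Item("2 + 2 = ?", ["A. 4", "B. 5"], "A", "single_choice"),
    Item("Name any prime number.", [], "", "open_ended"),  # filtered out
]
print(single_choice_accuracy(demo, lambda it: "A"))  # -> 100.0
```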