Upload 2 files
- README.md +21 -15
- README_en.md +20 -15
README.md
CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
 ---
 **Read this in other languages: [English](README_en.md), [中文](README.md).**
 
+* Update 2023.12.28: Released Qwen-7b-chat-yarn-32k. Note that, likely because of the smaller model size and the weaker base model, the 7b version is significantly weaker than Qwen-14b-chat-yarn-32k.
 * Update 2023.12.23: Released the passage_retrieval_en evaluation results on LongBench.
 * Update 2023.12.16: Released the [paper (Chinese)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/) and the [paper (English)](https://arxiv.org/abs/2312.11193).
 * Update 2023.12.14: Released the fine-tuned Qwen-14b-chat-yarn-32k. The fine-tuned model handles Chinese and English question answering at up to 32k context length (about 40,000 Chinese characters); compared with the earlier 32k model obtained through position interpolation, it almost completely resolves the low-recall ("lost in the middle") problem in multi-document QA.
@@ -22,23 +23,28 @@ pipeline_tag: text-generation
 
 # LongBench test results
 ### passage_retrieval_zh results on LongBench
-| Model | Score (acc)
-| --- | ---
-| **Qwen-14b-chat-yarn-32k** | **0.94**
-| gpt-3.5-turbo-16k | 0.81
-| chatglm3-32k | 0.725
-| Qwen-14b-chat | 0.525
-| Qwen-14b-chat-32k-lora | 0.34
-| LongAlpaca-7b-32k-chinese-v2 | 0.12
-| CausalLM-14b | 0.086
+| Model                        | Score (acc) |
+|------------------------------|-------------|
+| **Qwen-14b-chat-yarn-32k**   | **0.94**    |
+| gpt-3.5-turbo-16k            | 0.81        |
+| chatglm3-32k                 | 0.725       |
+| Qwen-14b-chat                | 0.525       |
+| Qwen-14b-chat-32k-lora       | 0.34        |
+| **Qwen-7b-chat-yarn-32k**    | **0.325**   |
+| Qwen-7b-chat                 | 0.26        |
+| LongAlpaca-7b-32k-chinese-v2 | 0.12        |
+| CausalLM-14b                 | 0.086       |
+
 
 ### passage_retrieval_en results on LongBench
-| Model | Score (acc)
-| --- | ---
-| **Qwen-14b-chat-yarn-32k** | **0.945**
-| chatglm3-32k | 0.815
-| gpt-3.5-turbo-16k | 0.88
-| Qwen-14b-chat | 0.24
+| Model                      | Score (acc) |
+|----------------------------|-------------|
+| **Qwen-14b-chat-yarn-32k** | **0.945**   |
+| chatglm3-32k               | 0.815       |
+| gpt-3.5-turbo-16k          | 0.88        |
+| **Qwen-7b-chat-yarn-32k**  | **0.47**    |
+| Qwen-14b-chat              | 0.24        |
+| Qwen-7b-chat               | 0.235       |
 
 After fine-tuning, Qwen-14b-chat-yarn-32k improves very significantly on multi-document question-answering (or retrieval) tasks and leads other models of the same scale by a large margin.
 
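The December 14 update above describes 32k-length (roughly 40,000 Chinese characters) Chinese/English QA. Below is a minimal usage sketch, not taken from the model card itself: it assumes the checkpoint keeps the standard Qwen chat interface exposed through `trust_remote_code` (i.e. `model.chat()`), and `MODEL_ID` is a placeholder for this model card's actual Hub repo id.

```python
# Minimal long-context QA sketch. Assumptions: the checkpoint exposes the usual
# Qwen chat interface via trust_remote_code (model.chat()); MODEL_ID is a
# placeholder, not the real Hub repo id of this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen-14b-chat-yarn-32k"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",      # shard the 14b weights across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
).eval()

# Build a multi-document prompt of up to ~32k tokens and ask which passage
# answers the question -- the setting where "lost in the middle" shows up.
documents = ["passage 1 text ...", "passage 2 text ...", "passage 3 text ..."]
context = "\n\n".join(f"Paragraph {i + 1}: {doc}" for i, doc in enumerate(documents))
question = "Which paragraph discusses position interpolation? Answer with the paragraph number."

response, _history = model.chat(tokenizer, f"{context}\n\n{question}", history=None)
print(response)
```
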
README_en.md
CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
 ---
 **Read this in other languages: [English](README_en.md), [中文](README.md).**
 
+* Updated on December 28, 2023: Released Qwen-7b-chat-yarn-32k. Note that the 7b version may be significantly weaker than Qwen-14b-chat-yarn-32k due to its smaller size and the weaker capabilities of the base model.
 * Updated on December 23, 2023: Released the passage_retrieval_en evaluation results on LongBench.
 * Updated on December 16, 2023: Released the [paper](https://arxiv.org/abs/2312.11193).
 * Updated on December 14, 2023: We have released the Qwen-14b-chat-yarn-32k model, fine-tuned to handle Chinese and English question-answering tasks at up to 32k context length (approximately 40,000 Chinese characters). This model addresses the low-recall issue in multi-document question answering (the "lost in the middle" phenomenon) present in the previous 32k model obtained through position interpolation. <br>
@@ -21,23 +22,27 @@ pipeline_tag: text-generation
 # Evaluation results on LongBench
 ### Evaluation results for passage_retrieval_zh on LongBench
 
-| Models | Accuracy
-| --- | ---
-| **Qwen-14b-chat-yarn-32k** | **0.94**
-| gpt-3.5-turbo-16k | 0.81
-| chatglm3-32k | 0.725
-| Qwen-14b-chat | 0.525
-| Qwen-14b-chat-32k-lora | 0.34
-| LongAlpaca-7b-32k-chinese-v2 | 0.12
-| CausalLM-14b | 0.086
+| Models                       | Accuracy  |
+|------------------------------|-----------|
+| **Qwen-14b-chat-yarn-32k**   | **0.94**  |
+| gpt-3.5-turbo-16k            | 0.81      |
+| chatglm3-32k                 | 0.725     |
+| Qwen-14b-chat                | 0.525     |
+| Qwen-14b-chat-32k-lora       | 0.34      |
+| **Qwen-7b-chat-yarn-32k**    | **0.325** |
+| Qwen-7b-chat                 | 0.26      |
+| LongAlpaca-7b-32k-chinese-v2 | 0.12      |
+| CausalLM-14b                 | 0.086     |
 
 ### Evaluation results for passage_retrieval_en on LongBench
-| Models | Accuracy
-| --- | ---
-| **Qwen-14b-chat-yarn-32k** | **0.945**
-| chatglm3-32k | 0.815
-| gpt-3.5-turbo-16k | 0.88
-| Qwen-14b-chat | 0.24
+| Models                     | Accuracy  |
+|----------------------------|-----------|
+| **Qwen-14b-chat-yarn-32k** | **0.945** |
+| chatglm3-32k               | 0.815     |
+| gpt-3.5-turbo-16k          | 0.88      |
+| **Qwen-7b-chat-yarn-32k**  | **0.47**  |
+| Qwen-14b-chat              | 0.24      |
+| Qwen-7b-chat               | 0.235     |
 
 
 Qwen-14b-chat-yarn-32k has shown significant improvement in multi-document question-answering (or retrieval) tasks after fine-tuning and outperforms other models of similar scale.
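For reference, the passage_retrieval accuracies in the tables above can be approximated with a short harness. The sketch below is not the official LongBench scorer: it assumes the THUDM/LongBench dataset layout (`context`, `input`, `answers` fields), concatenates those fields into a bare prompt instead of LongBench's task-specific template, and uses a simplified number-matching rule; `generate_answer` is a stand-in for whichever model is being evaluated.

```python
# Simplified LongBench passage_retrieval harness (sketch, not the official scorer).
# Assumptions: THUDM/LongBench layout with "context", "input", "answers" fields;
# gold answers name a paragraph (e.g. "Paragraph 7" / "段落7"); generate_answer is
# any callable mapping a prompt string to the model's reply.
import re
from datasets import load_dataset


def passage_retrieval_accuracy(generate_answer, subset="passage_retrieval_en"):
    data = load_dataset("THUDM/LongBench", subset, split="test")
    correct = 0
    for sample in data:
        # ~30 shuffled paragraphs plus an abstract; the model must say which
        # paragraph the abstract was drawn from.
        prompt = f"{sample['context']}\n\n{sample['input']}"
        prediction = generate_answer(prompt)
        gold_num = re.findall(r"\d+", sample["answers"][0])[0]
        pred_nums = re.findall(r"\d+", prediction)
        correct += int(bool(pred_nums) and pred_nums[0] == gold_num)
    return correct / len(data)
```

The official LongBench repository ships its own scoring functions for these subsets, which apply a stricter matching rule and should be preferred when reproducing reported numbers.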