JustinLin610
committed on
Commit
•
3484186
1
Parent(s):
d3f8e6a
Update README.md (#14)
- Update README.md (ba58a26fdc08c2f328d5c649531c95c25d6b06d0)
- Update README.md (5fb7205e085f1f400a9a05961f327e687d8c8e6b)
- Update README.md (295bfbcbbd86d57d28312c96e7fa92b381be8ffe)
- Update README.md (b78c48e1afb02ce5e1d6fe8fbfb4a119517d7f81)
README.md
CHANGED
@@ -16,7 +16,7 @@ inference: false
|
|
16 |
<br>
|
17 |
|
18 |
<p align="center">
|
19 |
-
🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/
|
20 |
<br>
|
21 |
<a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>   |    DingTalk (钉钉)    |   <a href="https://discord.gg/z3GAxXZ9Ce">Discord</a>  
|
22 |
</p>
|
@@ -32,7 +32,7 @@ inference: false
|
|
32 |
2. **强大的性能**:Qwen-7B在多个中英文下游评测任务上(涵盖常识推理、代码、数学、翻译等),效果显著超越现有的相近规模开源模型,甚至在部分指标上相比更大尺寸模型也有较强竞争力。具体评测结果请详见下文。
|
33 |
3. **覆盖更全面的词表**:相比目前以中英词表为主的开源模型,Qwen-7B使用了约15万大小的词表。该词表对多语言更加友好,方便用户在不扩展词表的情况下对部分语种进行能力增强和扩展。
|
34 |
|
35 |
-
如果您想了解更多关于通义千问7B开源模型的细节,我们建议您参阅[
|
36 |
|
37 |
**Qwen-7B** is the 7B-parameter version of Qwen (abbr. Tongyi Qianwen), the large language model series proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web texts, books, code, etc. Based on the pretrained Qwen-7B, we also release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. We have now updated both the pretrained and chat models for better performance. This repository hosts the Qwen-7B base language model.
|
38 |
|
@@ -42,7 +42,7 @@ The features of Qwen-7B include:
|
|
42 |
2. **Competitive performance**: It significantly surpasses existing open-source models of similar scale on multiple Chinese and English downstream evaluation tasks (including commonsense, reasoning, code, mathematics, etc.), and even surpasses some larger-scale models in several benchmarks. See below for specific evaluation results.
|
43 |
3. **More comprehensive vocabulary coverage**: Compared with other open-source models based on Chinese and English vocabularies, Qwen-7B uses a vocabulary of over 150K tokens. This vocabulary is friendlier to multiple languages, so users can directly enhance the model's capability for certain languages without expanding the vocabulary.
|
44 |
|
45 |
-
For more details about Qwen, please refer to the [
|
46 |
<br>
|
47 |
|
48 |
## 要求(Requirements)
|
@@ -65,15 +65,14 @@ To run Qwen-7B, please make sure you meet the above requirements, and then execu
|
|
65 |
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
|
66 |
```
|
67 |
|
68 |
-
另外,推荐安装`flash-attention
|
69 |
|
70 |
-
In addition, it is recommended to install the `flash-attention` library for higher efficiency and lower memory usage.
|
71 |
|
72 |
```bash
|
73 |
-
git clone
|
74 |
cd flash-attention && pip install .
|
75 |
# 下方安装可选,安装可能比较缓慢。
|
76 |
-
# Below are optional. Installing them might be slow.
|
77 |
# pip install csrc/layer_norm
|
78 |
# pip install csrc/rotary
|
79 |
```
|
@@ -101,8 +100,8 @@ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True
|
|
101 |
# use auto mode, automatically select precision based on the device.
|
102 |
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
|
103 |
|
104 |
-
# Specify hyperparameters for generation
|
105 |
-
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
|
106 |
|
107 |
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
|
108 |
inputs = inputs.to(model.device)
|
@@ -111,9 +110,9 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
|
|
111 |
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...
|
112 |
```
|
113 |
|
114 |
-
关于更多的使用说明,请参考我们的[
|
115 |
|
116 |
-
For more information, please refer to our [
|
117 |
<br>
|
118 |
|
119 |
## Tokenizer
|
@@ -171,20 +170,20 @@ The scale of pretraining corpus reaches over 2.4T tokens after deduplication and
|
|
171 |
|
172 |
We selected MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, and CMMLU, which are currently popular benchmarks, to test the model’s Chinese and English knowledge, translation, mathematical reasoning, coding, and other capabilities. The following comprehensive evaluation results show that the Qwen model outperforms similarly sized open-source models on all tasks.
|
173 |
|
174 |
-
| Model | MMLU | C-Eval | GSM8K |
|
175 |
-
|
176 |
-
| |
|
177 |
-
| LLaMA2-7B
|
178 |
-
| LLaMA2-13B
|
179 |
-
| LLaMA2-34B
|
180 |
-
| ChatGLM2-6B
|
181 |
-
| InternLM-7B
|
182 |
-
| InternLM-20B
|
183 |
-
| Baichuan2-7B
|
184 |
-
| Baichuan2-13B
|
185 |
-
| Qwen-7B (original) | 56.7 | 59.6 | 51.6 |
|
186 |
-
| **Qwen-7B** | 58.2 | 63.5 | 51.7 | 11.6 | 29.9 | 31.6
|
187 |
-
| **Qwen-14B** | **66.3** | **72.1** | **61.3** | **24.8** | **32.3** | **40.8**
|
188 |
|
189 |
### 长序列评测(Long-Context Evaluation)
|
190 |
|
@@ -243,6 +242,22 @@ We have provided evaluation scripts to reproduce the performance of our model, d
|
|
243 |
If you run into problems, please consult the [FAQ](https://github.com/QwenLM/Qwen/blob/main/FAQ.md) and the existing issues for a solution before opening a new issue.
|
244 |
<br>
|
245 |
|
|
246 |
## 使用协议(License Agreement)
|
247 |
|
248 |
我们的代码和模型权重对学术研究完全开放,并支持商用。请查看[LICENSE](https://github.com/QwenLM/Qwen/blob/main/LICENSE)了解具体的开源协议细节。如需商用,请填写[问卷](https://dashscope.console.aliyun.com/openModelApply/qianwen)申请。
|
|
|
16 |
<br>
|
17 |
|
18 |
<p align="center">
|
19 |
+
🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>   |    📑 <a href="https://arxiv.org/abs/2309.16609">Paper</a>   |   🖥️ <a href="https://modelscope.cn/studios/qwen/Qwen-7B-Chat-Demo/summary">Demo</a>
|
20 |
<br>
|
21 |
<a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>   |    DingTalk (钉钉)    |   <a href="https://discord.gg/z3GAxXZ9Ce">Discord</a>  
|
22 |
</p>
|
|
|
32 |
2. **强大的性能**:Qwen-7B在多个中英文下游评测任务上(涵盖常识推理、代码、数学、翻译等),效果显著超越现有的相近规模开源模型,甚至在部分指标上相比更大尺寸模型也有较强竞争力。具体评测结果请详见下文。
|
33 |
3. **覆盖更全面的词表**:相比目前以中英词表为主的开源模型,Qwen-7B使用了约15万大小的词表。该词表对多语言更加友好,方便用户在不扩展词表的情况下对部分语种进行能力增强和扩展。
|
34 |
|
35 |
+
如果您想了解更多关于通义千问7B开源模型的细节,我们建议您参阅[GitHub代码库](https://github.com/QwenLM/Qwen)。
|
36 |
|
37 |
**Qwen-7B** is the 7B-parameter version of Qwen (abbr. Tongyi Qianwen), the large language model series proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web texts, books, code, etc. Based on the pretrained Qwen-7B, we also release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. We have now updated both the pretrained and chat models for better performance. This repository hosts the Qwen-7B base language model.
|
38 |
|
|
|
42 |
2. **Competitive performance**: It significantly surpasses existing open-source models of similar scale on multiple Chinese and English downstream evaluation tasks (including commonsense, reasoning, code, mathematics, etc.), and even surpasses some larger-scale models in several benchmarks. See below for specific evaluation results.
|
43 |
3. **More comprehensive vocabulary coverage**: Compared with other open-source models based on Chinese and English vocabularies, Qwen-7B uses a vocabulary of over 150K tokens. This vocabulary is friendlier to multiple languages, so users can directly enhance the model's capability for certain languages without expanding the vocabulary.
|
44 |
|
45 |
+
For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.
|
46 |
<br>
|
47 |
|
48 |
## 要求(Requirements)
|
|
|
65 |
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
|
66 |
```
|
67 |
|
68 |
+
另外,推荐安装`flash-attention`库(**当前已支持flash attention 2**),以实现更高的效率和更低的显存占用。
|
69 |
|
70 |
+
In addition, it is recommended to install the `flash-attention` library (**flash attention 2 is now supported**) for higher efficiency and lower memory usage.
|
71 |
|
72 |
```bash
|
73 |
+
git clone https://github.com/Dao-AILab/flash-attention
|
74 |
cd flash-attention && pip install .
|
75 |
# 下方安装可选,安装可能比较缓慢。
|
|
|
76 |
# pip install csrc/layer_norm
|
77 |
# pip install csrc/rotary
|
78 |
```
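
Since `flash-attention` is an optional install, it can help to confirm whether it is present before loading the model. The following is a minimal sketch, not part of the README: the helper name `flash_attention_available` is hypothetical, and it assumes the library is importable under the module name `flash_attn`.

```python
import importlib.util


def flash_attention_available(module_name: str = "flash_attn") -> bool:
    """Return True if a top-level module with this name can be found.

    `flash_attn` is assumed to be the import name of the flash-attention
    package; the lookup does not import or initialise the module.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if flash_attention_available():
        print("flash-attention is installed")
    else:
        print("flash-attention is not installed; the model can still run without it")
```

The check only looks the module up on the import path, so it is cheap and does not require a GPU.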
|
|
|
100 |
# use auto mode, automatically select precision based on the device.
|
101 |
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
|
102 |
|
103 |
+
# Specify hyperparameters for generation. With transformers>=4.32.0, this step is not needed.
|
104 |
+
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
|
105 |
|
106 |
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
|
107 |
inputs = inputs.to(model.device)
|
|
|
110 |
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...
|
111 |
```
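
The comment in the snippet above says the explicit `GenerationConfig` assignment is only needed with transformers older than 4.32.0. A rough sketch of that version check follows; the helper names `parse_version` and `needs_explicit_generation_config` are hypothetical, and the parsing is a simplification that only reads the leading digits of the first three version components.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '4.32.0' or '4.33.0.dev0' into (major, minor, patch)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '4.32' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_explicit_generation_config(transformers_version: str) -> bool:
    """True when the installed version predates 4.32.0, the threshold named in the README comment."""
    return parse_version(transformers_version) < (4, 32, 0)
```

In practice the argument would be `transformers.__version__`; when the helper returns `True`, the commented-out `model.generation_config = ...` line would be re-enabled.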
|
112 |
|
113 |
+
关于更多的使用说明,请参考我们的[GitHub repo](https://github.com/QwenLM/Qwen)获取更多信息。
|
114 |
|
115 |
+
For more information, please refer to our [GitHub repo](https://github.com/QwenLM/Qwen).
|
116 |
<br>
|
117 |
|
118 |
## Tokenizer
|
|
|
170 |
|
171 |
We selected MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, and CMMLU, which are currently popular benchmarks, to test the model’s Chinese and English knowledge, translation, mathematical reasoning, coding, and other capabilities. The following comprehensive evaluation results show that the Qwen model outperforms similarly sized open-source models on all tasks.
|
172 |
|
173 |
+
| Model | MMLU | C-Eval | GSM8K | MATH | HumanEval | MBPP | BBH | CMMLU |
|
174 |
+
|:-------------------|:--------:|:--------:|:--------:|:--------:|:---------:|:--------:|:--------:|:--------:|
|
175 |
+
| | 5-shot | 5-shot | 8-shot | 4-shot | 0-shot | 3-shot | 3-shot | 5-shot |
|
176 |
+
| LLaMA2-7B | 46.8 | 32.5 | 16.7 | 3.3 | 12.8 | 20.8 | 38.2 | 31.8 |
|
177 |
+
| LLaMA2-13B | 55.0 | 41.4 | 29.6 | 5.0 | 18.9 | 30.3 | 45.6 | 38.4 |
|
178 |
+
| LLaMA2-34B | 62.6 | - | 42.2 | 6.2 | 22.6 | 33.0 | 44.1 | - |
|
179 |
+
| ChatGLM2-6B | 47.9 | 51.7 | 32.4 | 6.5 | - | - | 33.7 | - |
|
180 |
+
| InternLM-7B | 51.0 | 53.4 | 31.2 | 6.3 | 10.4 | 14.0 | 37.0 | 51.8 |
|
181 |
+
| InternLM-20B | 62.1 | 58.8 | 52.6 | 7.9 | 25.6 | 35.6 | 52.5 | 59.0 |
|
182 |
+
| Baichuan2-7B | 54.7 | 56.3 | 24.6 | 5.6 | 18.3 | 24.2 | 41.6 | 57.1 |
|
183 |
+
| Baichuan2-13B | 59.5 | 59.0 | 52.8 | 10.1 | 17.1 | 30.2 | 49.0 | 62.0 |
|
184 |
+
| Qwen-7B (original) | 56.7 | 59.6 | 51.6 | - | 24.4 | 31.2 | 40.6 | 58.8 |
|
185 |
+
| **Qwen-7B** | 58.2 | 63.5 | 51.7 | 11.6 | 29.9 | 31.6 | 45.0 | 62.2 |
|
186 |
+
| **Qwen-14B** | **66.3** | **72.1** | **61.3** | **24.8** | **32.3** | **40.8** | **53.4** | **71.0** |
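
The comparison with similarly sized models can be checked directly against the rows of the table above. The sketch below copies the scores for the 6B/7B models; treating exactly these four baselines as "similarly sized" is an assumption made here, and entries shown as `-` in the table are recorded as `None` and skipped.

```python
BENCHMARKS = ["MMLU", "C-Eval", "GSM8K", "MATH", "HumanEval", "MBPP", "BBH", "CMMLU"]

# Scores copied from the table above; None marks a result not reported ("-").
SCORES = {
    "LLaMA2-7B":    [46.8, 32.5, 16.7, 3.3, 12.8, 20.8, 38.2, 31.8],
    "ChatGLM2-6B":  [47.9, 51.7, 32.4, 6.5, None, None, 33.7, None],
    "InternLM-7B":  [51.0, 53.4, 31.2, 6.3, 10.4, 14.0, 37.0, 51.8],
    "Baichuan2-7B": [54.7, 56.3, 24.6, 5.6, 18.3, 24.2, 41.6, 57.1],
    "Qwen-7B":      [58.2, 63.5, 51.7, 11.6, 29.9, 31.6, 45.0, 62.2],
}


def best_per_benchmark(scores: dict) -> dict:
    """Map each benchmark to the model with the highest reported score."""
    best = {}
    for i, name in enumerate(BENCHMARKS):
        reported = {model: row[i] for model, row in scores.items() if row[i] is not None}
        best[name] = max(reported, key=reported.get)
    return best


if __name__ == "__main__":
    for benchmark, model in best_per_benchmark(SCORES).items():
        print(f"{benchmark}: {model}")
```

Under that grouping, Qwen-7B has the highest reported score on each of the eight benchmarks.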
|
187 |
|
188 |
### 长序列评测(Long-Context Evaluation)
|
189 |
|
|
|
242 |
If you run into problems, please consult the [FAQ](https://github.com/QwenLM/Qwen/blob/main/FAQ.md) and the existing issues for a solution before opening a new issue.
|
243 |
<br>
|
244 |
|
245 |
+
## 引用 (Citation)
|
246 |
+
|
247 |
+
如果你觉得我们的工作对你有帮助,欢迎引用!
|
248 |
+
|
249 |
+
If you find our work helpful, feel free to cite us.
|
250 |
+
|
251 |
+
```
|
252 |
+
@article{qwen,
|
253 |
+
title={Qwen Technical Report},
|
254 |
+
author={Jinze Bai and Shuai Bai and Yunfei Chu and Zeyu Cui and Kai Dang and Xiaodong Deng and Yang Fan and Wenbin Ge and Yu Han and Fei Huang and Binyuan Hui and Luo Ji and Mei Li and Junyang Lin and Runji Lin and Dayiheng Liu and Gao Liu and Chengqiang Lu and Keming Lu and Jianxin Ma and Rui Men and Xingzhang Ren and Xuancheng Ren and Chuanqi Tan and Sinan Tan and Jianhong Tu and Peng Wang and Shijie Wang and Wei Wang and Shengguang Wu and Benfeng Xu and Jin Xu and An Yang and Hao Yang and Jian Yang and Shusheng Yang and Yang Yao and Bowen Yu and Hongyi Yuan and Zheng Yuan and Jianwei Zhang and Xingxuan Zhang and Yichang Zhang and Zhenru Zhang and Chang Zhou and Jingren Zhou and Xiaohuan Zhou and Tianhang Zhu},
|
255 |
+
journal={arXiv preprint arXiv:2309.16609},
|
256 |
+
year={2023}
|
257 |
+
}
|
258 |
+
```
|
259 |
+
<br>
|
260 |
+
|
261 |
## 使用协议(License Agreement)
|
262 |
|
263 |
我们的代码和模型权重对学术研究完全开放,并支持商用。请查看[LICENSE](https://github.com/QwenLM/Qwen/blob/main/LICENSE)了解具体的开源协议细节。如需商用,请填写[问卷](https://dashscope.console.aliyun.com/openModelApply/qianwen)申请。
|