Update README.md

Commit f077bf1 by JustinLin610 (parent: df4768d)

README.md CHANGED
@@ -195,7 +195,7 @@ In detail, the setting of profiling is generating 8192 new tokens with 1 context
We also profile peak GPU memory usage under BF16 and Int4 quantization in two settings: encoding 2048 tokens of context (and generating a single token), and generating 8192 tokens (from a single-token context). The results are shown below.

| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
-
| BF16 | 18.99GB | 24.40GB |
| Int4 | 10.20GB | 15.61GB |

@@ -211,12 +211,12 @@ The above speed and memory profiling are conducted using [this script](https://q
The architecture details of Qwen-7B-Chat are listed below:

| Hyperparameter | Value |
-
-| n_layers |
-| n_heads |
-| d_model |
| vocab size | 151851 |
-| sequence length |

For position encoding, the FFN activation function, and normalization, we also adopt today's most popular choices,
namely RoPE relative position encoding, the SwiGLU activation function, and RMSNorm (flash-attention can optionally be installed for acceleration).

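The paragraph above names RoPE as the relative position encoding. Below is a minimal pure-Python sketch of the idea. It is not Qwen's implementation. It rotates adjacent dimension pairs `(2i, 2i+1)` by `position * base**(-2i/d)`, with `base = 10000` as an assumed default. Many Hugging Face implementations pair dimension `i` with `i + d/2` instead, which gives the same rotation up to a permutation of dimensions. The final lines check the property that makes RoPE *relative*: the query-key score depends only on the offset between the two positions.

```python
import math


def rope_rotate(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs of x by position-dependent angles (RoPE sketch)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        # Pair i rotates at frequency base**(-2i/d); pair 0 rotates fastest.
        theta = position * base ** (-2 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos_t - b * sin_t
        out[2 * i + 1] = a * sin_t + b * cos_t
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The same offset (2) at different absolute positions yields the same score.
s1 = dot(rope_rotate(q, 5), rope_rotate(k, 3))
s2 = dot(rope_rotate(q, 105), rope_rotate(k, 103))
print(round(s1, 6), round(s2, 6))
```

Each pair is rotated rather than shifted, so vector norms are preserved and the position enters the attention score only through the angle difference.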
@@ -251,7 +251,7 @@ Note: Due to rounding errors caused by hardware and framework, differences in re
We report the 0-shot and 5-shot accuracy of Qwen-7B-Chat on the C-Eval validation set.

| Model | Avg. Acc. |
-
| LLaMA2-7B-Chat | 31.9 |
| LLaMA2-13B-Chat | 36.2 |
| LLaMA2-70B-Chat | 44.3 |

@@ -293,7 +293,7 @@ The 0-shot & 5-shot accuracy of Qwen-7B-Chat on MMLU is provided below.
Qwen-7B-Chat still ranks at the top among human-aligned models of comparable size.

| Model | Avg. Acc. |
-
| ChatGLM2-6B-Chat | 46.0 |
| LLaMA2-7B-Chat | 46.2 |
| InternLM-7B-Chat | 51.1 |

@@ -313,18 +313,18 @@ Qwen-7B-Chat on [HumanEval](https://github.com/openai/human-eval), zero-shot Pas

The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/human-eval) is shown below:

-| Model | Pass@1
-
-| ChatGLM2-6B-Chat | 11.0
-| LLaMA2-7B-Chat | 12.2
-| Baichuan2-7B-Chat | 13.4
-| InternLM-7B-Chat | 14.6
-| Baichuan2-13B-Chat | 17.7
-| LLaMA2-13B-Chat | 18.9
-| LLaMA2-70B-Chat | 32.3
-| Qwen-7B-Chat (original) | 24.4
-| **Qwen-7B-Chat** | 37.2
-| **Qwen-14B-Chat** | **43.9**

### Mathematics Evaluation

@@ -332,20 +332,20 @@ The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/hu

The accuracy of Qwen-7B-Chat on GSM8K is shown below:

-| Model | Acc.
-
-| LLaMA2-7B-Chat | 26.3
-| ChatGLM2-6B-Chat | 28.8
-| Baichuan2-7B-Chat | 32.8
-| InternLM-7B-Chat | 33.0
-| LLaMA2-13B-Chat | 37.1
-| Baichuan2-13B-Chat | 55.3
-| LLaMA2-70B-Chat | 59.3
-| **Qwen-7B-Chat (original) (0-shot)** | 41.1
-| **Qwen-7B-Chat (0-shot)** | 50.3
-| **Qwen-7B-Chat (8-shot)** | 54.1
-| **Qwen-14B-Chat (0-shot)** | **60.1**
-| **Qwen-14B-Chat (8-shot)** | 59.3

### Long-Context Understanding

@@ -358,7 +358,7 @@ We introduce NTK-aware interpolation, LogN attention scaling to extend the conte
**(To use these techniques, set `use_dynamic_ntk` and `use_long_attn` to `true` in `config.json`.)**

| Model | VCSUM (zh) |
-
| GPT-3.5-Turbo-16k | 16.0 |
| LLaMA2-7B-Chat | 0.2 |
| InternLM-7B-Chat | 13.0 |

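The hunk header above mentions NTK-aware interpolation and LogN attention scaling but is cut off before any details. The commonly cited formulations are summarized below for orientation. They are an assumption about what `use_dynamic_ntk` and `use_long_attn` enable, not a transcription of the model code.

Let $b$ be the RoPE base, $d$ the head dimension, and $\alpha$ a context-extension factor. NTK-aware interpolation enlarges the base rather than compressing positions:

$$
\theta_i = b^{-2i/d}, \qquad b' = b\,\alpha^{d/(d-2)}, \qquad \theta_i' = (b')^{-2i/d} = \theta_i\,\alpha^{-2i/(d-2)}, \quad i = 0,\dots,\tfrac{d}{2}-1.
$$

Under this choice the highest frequency ($i = 0$) is unchanged and the lowest ($i = d/2 - 1$) is slowed by exactly $\alpha$. The "dynamic" variant presumably recomputes $\alpha$ from the current sequence length rather than fixing it in advance.

LogN scaling multiplies the attention logits by a length-dependent factor once the sequence length $n$ exceeds the training length $L$:

$$
\operatorname{softmax}\!\left(\frac{\kappa_n\, q^\top k}{\sqrt{d}}\right), \qquad \kappa_n = \max\!\left(1, \log_L n\right).
$$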

We also profile peak GPU memory usage under BF16 and Int4 quantization in two settings: encoding 2048 tokens of context (and generating a single token), and generating 8192 tokens (from a single-token context). The results are shown below.

| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
+|--------------------|:-----------------------------------:|:-------------------------------------:|
| BF16 | 18.99GB | 24.40GB |
| Int4 | 10.20GB | 15.61GB |

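To put these numbers in context, the key/value cache can be estimated from the architecture table in this diff (`n_layers` = 32, `d_model` = 4096).

This sketch rests on several assumptions:

- It assumes standard multi-head attention, where every layer caches one key vector and one value vector of width `d_model` per token.
- It assumes the cache is stored in BF16 (2 bytes per value) even when the weights are Int4-quantized.
- It ignores activations, allocator overhead, and however the profiling script lays out its cache.

The helper name `kv_cache_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_tokens: int, n_layers: int = 32, d_model: int = 4096,
                   bytes_per_value: int = 2) -> int:
    """Estimated KV-cache size for standard multi-head attention.

    Factor 2: one key and one value vector per token per layer.
    bytes_per_value=2 corresponds to BF16 (an assumption for both rows of the table).
    """
    return 2 * n_layers * d_model * n_tokens * bytes_per_value


for tokens in (2048, 8192):
    size = kv_cache_bytes(tokens)
    print(f"{tokens:>5} tokens: {size / GIB:.2f} GiB = {size / 1e9:.2f} GB")
```

Under these assumptions, the cache for 8192 tokens is 4 GiB (about 4.29 GB).

The table shows the same increase of 5.41 GB from the encoding setting to the generation setting at both quantization levels. That fits the assumption that weight quantization leaves the cache untouched.

The table does not state whether "GB" means 10^9 or 2^30 bytes.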

The architecture details of Qwen-7B-Chat are listed below:

| Hyperparameter | Value |
+|:----------------|:------:|
+| n_layers | 32 |
+| n_heads | 32 |
+| d_model | 4096 |
| vocab size | 151851 |
+| sequence length | 8192 |

For position encoding, the FFN activation function, and normalization, we also adopt today's most popular choices,
namely RoPE relative position encoding, the SwiGLU activation function, and RMSNorm (flash-attention can optionally be installed for acceleration).

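The paragraph above also names RMSNorm and SwiGLU. Both are sketched below in pure Python for a single token vector. This is an illustration, not Qwen's code.

The following details are assumptions:

- the `eps` default;
- bias-free projections;
- the weight names `w_gate`, `w_up`, and `w_down`.

The FFN hidden width is not given in this excerpt.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide by the root-mean-square of x; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(z: float) -> float:
    """SiLU (Swish) activation z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a rows-by-cols matrix (list of rows) by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: w_down @ (silu(w_gate @ x) * (w_up @ x)), product taken elementwise."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
identity = [[1.0, 0.0], [0.0, 1.0]]
print(normed, swiglu_ffn([1.0, 2.0], identity, identity, identity))
```

Compared with LayerNorm, RMSNorm drops the mean subtraction and bias. SwiGLU replaces a single activated projection with a gated product of two projections.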

We report the 0-shot and 5-shot accuracy of Qwen-7B-Chat on the C-Eval validation set.

| Model | Avg. Acc. |
+|:--------------------------------:|:---------:|
| LLaMA2-7B-Chat | 31.9 |
| LLaMA2-13B-Chat | 36.2 |
| LLaMA2-70B-Chat | 44.3 |

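The difference between 0-shot and 5-shot evaluation is how many solved examples precede each test question in the prompt. The sketch below illustrates this for multiple-choice items.

The prompt template, the `MCQuestion` fields, and the helper names are hypothetical. This excerpt does not show the actual C-Eval or MMLU prompt format.

```python
from dataclasses import dataclass


@dataclass
class MCQuestion:
    """A multiple-choice item (hypothetical structure)."""
    question: str
    choices: dict[str, str]  # letter -> option text
    answer: str = ""         # gold letter; needed only for exemplars


def format_item(item: MCQuestion, with_answer: bool) -> str:
    lines = [item.question]
    lines += [f"{letter}. {text}" for letter, text in sorted(item.choices.items())]
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_k_shot_prompt(exemplars: list[MCQuestion], target: MCQuestion, k: int) -> str:
    """Prepend k solved exemplars to the target question; k=0 gives a zero-shot prompt."""
    if not 0 <= k <= len(exemplars):
        raise ValueError("k must be between 0 and the number of exemplars")
    shots = [format_item(e, with_answer=True) for e in exemplars[:k]]
    return "\n\n".join(shots + [format_item(target, with_answer=False)])


def accuracy(predictions: list[str], golds: list[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equally many, non-empty predictions and golds")
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)
```

A 5-shot run would call `build_k_shot_prompt(dev_items, item, k=5)` for each test item. Here `dev_items` stands for a hypothetical list of solved development-set questions.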

Qwen-7B-Chat still ranks at the top among human-aligned models of comparable size.

| Model | Avg. Acc. |
+|:--------------------------------:|:---------:|
| ChatGLM2-6B-Chat | 46.0 |
| LLaMA2-7B-Chat | 46.2 |
| InternLM-7B-Chat | 51.1 |


The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/human-eval) is shown below:

+| Model | Pass@1 |
+|:-----------------------:|:--------:|
+| ChatGLM2-6B-Chat | 11.0 |
+| LLaMA2-7B-Chat | 12.2 |
+| Baichuan2-7B-Chat | 13.4 |
+| InternLM-7B-Chat | 14.6 |
+| Baichuan2-13B-Chat | 17.7 |
+| LLaMA2-13B-Chat | 18.9 |
+| LLaMA2-70B-Chat | 32.3 |
+| Qwen-7B-Chat (original) | 24.4 |
+| **Qwen-7B-Chat** | 37.2 |
+| **Qwen-14B-Chat** | **43.9** |

### Mathematics Evaluation

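HumanEval scores functional correctness: a completion counts only if it passes the problem's unit tests.

The sketch below implements the unbiased pass@k estimator published with the HumanEval benchmark, $1 - \binom{n-c}{k} / \binom{n}{k}$. Here $n$ is the number of samples and $c$ the number that pass. The code uses the numerically stable product form of the formula.

This excerpt does not state how many samples were drawn per problem. With one greedy completion per problem ($n = 1$), pass@1 reduces to the fraction of problems solved.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests.

    Equals 1 - C(n - c, k) / C(n, k), computed as a running product for stability.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

The product form avoids computing large binomial coefficients when many samples are drawn per problem.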

The accuracy of Qwen-7B-Chat on GSM8K is shown below:

+| Model | Acc. |
+|:------------------------------------:|:--------:|
+| LLaMA2-7B-Chat | 26.3 |
+| ChatGLM2-6B-Chat | 28.8 |
+| Baichuan2-7B-Chat | 32.8 |
+| InternLM-7B-Chat | 33.0 |
+| LLaMA2-13B-Chat | 37.1 |
+| Baichuan2-13B-Chat | 55.3 |
+| LLaMA2-70B-Chat | 59.3 |
+| **Qwen-7B-Chat (original) (0-shot)** | 41.1 |
+| **Qwen-7B-Chat (0-shot)** | 50.3 |
+| **Qwen-7B-Chat (8-shot)** | 54.1 |
+| **Qwen-14B-Chat (0-shot)** | **60.1** |
+| **Qwen-14B-Chat (8-shot)** | 59.3 |

### Long-Context Understanding

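GSM8K reference solutions end with a line of the form `#### <answer>`. Accuracy is therefore usually computed by comparing a number extracted from the model's output with that final answer.

The extraction rule below takes the last number in the completion. It is a common heuristic and an assumption here: the README does not describe how answers were parsed for the 0-shot and 8-shot numbers.

```python
import re
from typing import Optional

# A comma-grouped number such as 1,234.5, or a plain one such as -42 or 3.75.
_NUMBER = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def gold_answer(reference: str) -> str:
    """Final answer from a GSM8K-style reference ending in '#### <answer>'."""
    return reference.split("####")[-1].strip().replace(",", "")


def predicted_answer(completion: str) -> Optional[str]:
    """Heuristic: the last number appearing in the model's completion."""
    matches = _NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def is_correct(completion: str, reference: str) -> bool:
    pred = predicted_answer(completion)
    if pred is None:
        return False
    try:
        return float(pred) == float(gold_answer(reference))
    except ValueError:  # reference answer was not numeric
        return False


def accuracy(completions: list[str], references: list[str]) -> float:
    if len(completions) != len(references) or not references:
        raise ValueError("need equally many, non-empty completions and references")
    return sum(is_correct(c, r) for c, r in zip(completions, references)) / len(references)
```

Comparing the extracted values as floats means "72" and "72.0" count as the same answer.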

**(To use these techniques, set `use_dynamic_ntk` and `use_long_attn` to `true` in `config.json`.)**

| Model | VCSUM (zh) |
+|:------------------|:----------:|
| GPT-3.5-Turbo-16k | 16.0 |
| LLaMA2-7B-Chat | 0.2 |
| InternLM-7B-Chat | 13.0 |

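The bolded note above asks for two flags in `config.json`. The sketch below flips them programmatically on a local copy of the model files, using Python's standard `json` module. Editing the file by hand works just as well.

The helper name `enable_long_context` and the demo file contents are hypothetical. Rewriting the file this way normalizes its indentation.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def enable_long_context(config_path: Union[str, Path]) -> dict:
    """Set use_dynamic_ntk and use_long_attn to true in config.json; keep all other keys."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["use_dynamic_ntk"] = True
    config["use_long_attn"] = True
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo on a throwaway file rather than a real checkpoint directory.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "config.json"
        demo.write_text(json.dumps({"other_key": 1, "use_dynamic_ntk": False}))
        print(enable_long_context(demo))
```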