
JustinLin610 committed
Commit f077bf1
1 Parent(s): df4768d

Update README.md

Files changed (1):
  1. README.md +35 -35
README.md CHANGED
@@ -195,7 +195,7 @@ In detail, the setting of profiling is generating 8192 new tokens with 1 context
  We also profile the peak GPU memory usage for encoding 2048 tokens as context (and generating a single token) and for generating 8192 tokens (with a single token as context) under the BF16 and Int4 quantization levels, respectively. The results are shown below.
 
  | Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
- | ------------------ | :---------------------------------: | :-----------------------------------: |
+ |--------------------|:-----------------------------------:|:-------------------------------------:|
  | BF16 | 18.99GB | 24.40GB |
  | Int4 | 10.20GB | 15.61GB |
 
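Numbers like those in the table above are straightforward to reproduce. The following is a minimal sketch, not the profiling script referenced below; the checkpoint id and the roughly 2048-token context are assumptions for illustration. It resets the CUDA peak-memory counter, runs one generation call, and reads back the high-water mark.

```python
# Minimal sketch of peak-memory measurement; not the official profiling script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
).eval()

torch.cuda.reset_peak_memory_stats()
context = "hello " * 2048  # roughly 2048 tokens; exact count depends on the tokenizer
inputs = tokenizer(context, return_tensors="pt").to("cuda")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=1)  # the "encoding 2048 tokens" case
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

Swapping in a single-token context with `max_new_tokens=8192` gives the second column of the table.
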
@@ -211,12 +211,12 @@ The above speed and memory profiling are conducted using [this script](https://q
  The details of the model architecture of Qwen-7B-Chat are listed as follows:
 
  | Hyperparameter | Value |
- | :------------- | :----: |
- | n_layers | 32 |
- | n_heads | 32 |
- | d_model | 4096 |
+ |:----------------|:------:|
+ | n_layers | 32 |
+ | n_heads | 32 |
+ | d_model | 4096 |
  | vocab size | 151851 |
- | sequence length | 8192 |
+ | sequence length | 8192 |
 
  For positional encoding, the FFN activation function, and normalization, we likewise adopt the most popular current practices:
  RoPE relative positional encoding, the SwiGLU activation function, and RMSNorm (optionally install flash-attention for acceleration).
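
As a quick sanity check, these hyperparameters can be read back from the checkpoint's config. A minimal sketch; the attribute names below are assumptions about Qwen's remote config class, not something this README confirms, hence the defensive `getattr`.

```python
# Sketch: compare the architecture table against the checkpoint's config.
# Attribute names are assumptions about Qwen's remote config class.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
for name, expected in [
    ("num_hidden_layers", 32),     # n_layers
    ("num_attention_heads", 32),   # n_heads
    ("hidden_size", 4096),         # d_model
    ("vocab_size", 151851),        # vocab size (per the table)
    ("seq_length", 8192),          # sequence length
]:
    print(name, getattr(cfg, name, "<missing>"), "expected:", expected)
```
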
@@ -251,7 +251,7 @@ Note: Due to rounding errors caused by hardware and framework, differences in re
  We demonstrate the 0-shot & 5-shot accuracy of Qwen-7B-Chat on the C-Eval validation set.
 
  | Model | Avg. Acc. |
- |:--------------------------------:| :-------: |
+ |:--------------------------------:|:---------:|
  | LLaMA2-7B-Chat | 31.9 |
  | LLaMA2-13B-Chat | 36.2 |
  | LLaMA2-70B-Chat | 44.3 |
@@ -293,7 +293,7 @@ The 0-shot & 5-shot accuracy of Qwen-7B-Chat on MMLU is provided below.
  The performance of Qwen-7B-Chat remains at the top among human-aligned models of comparable size.
 
  | Model | Avg. Acc. |
- |:--------------------------------:| :-------: |
+ |:--------------------------------:|:---------:|
  | ChatGLM2-6B-Chat | 46.0 |
  | LLaMA2-7B-Chat | 46.2 |
  | InternLM-7B-Chat | 51.1 |
@@ -313,18 +313,18 @@ Qwen-7B-Chat在[HumanEval](https://github.com/openai/human-eval)的zero-shot Pas
 
  The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/human-eval) is demonstrated below.
 
- | Model | Pass@1 |
- |:-----------------------:| :-------: |
- | ChatGLM2-6B-Chat | 11.0 |
- | LLaMA2-7B-Chat | 12.2 |
- | Baichuan2-7B-Chat | 13.4 |
- | InternLM-7B-Chat | 14.6 |
- | Baichuan2-13B-Chat | 17.7 |
- | LLaMA2-13B-Chat | 18.9 |
- | LLaMA2-70B-Chat | 32.3 |
- | Qwen-7B-Chat (original) | 24.4 |
- | **Qwen-7B-Chat** | 37.2 |
- | **Qwen-14B-Chat** | **43.9** |
+ | Model | Pass@1 |
+ |:-----------------------:|:--------:|
+ | ChatGLM2-6B-Chat | 11.0 |
+ | LLaMA2-7B-Chat | 12.2 |
+ | Baichuan2-7B-Chat | 13.4 |
+ | InternLM-7B-Chat | 14.6 |
+ | Baichuan2-13B-Chat | 17.7 |
+ | LLaMA2-13B-Chat | 18.9 |
+ | LLaMA2-70B-Chat | 32.3 |
+ | Qwen-7B-Chat (original) | 24.4 |
+ | **Qwen-7B-Chat** | 37.2 |
+ | **Qwen-14B-Chat** | **43.9** |
 
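For context on the metric: zero-shot Pass@1 means one completion per problem, scored for functional correctness by the human-eval harness. A minimal sketch of producing a `samples.jsonl` for that harness; `generate_one` is a hypothetical wrapper around the chat model, and scoring happens afterwards via the harness's CLI (`evaluate_functional_correctness samples.jsonl`).

```python
# Sketch of the zero-shot Pass@1 protocol with the openai/human-eval harness:
# one completion per problem, written to samples.jsonl for later scoring.
from human_eval.data import read_problems, write_jsonl

def generate_one(prompt: str) -> str:
    """Hypothetical wrapper: ask the chat model to complete the function body."""
    raise NotImplementedError

problems = read_problems()  # task_id -> {"prompt": ..., "test": ..., ...}
samples = [
    {"task_id": task_id, "completion": generate_one(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)  # one sample per task => Pass@1
```
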
  ### 数学评测(Mathematics Evaluation)
 
@@ -332,20 +332,20 @@ The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/hu
 
  The accuracy of Qwen-7B-Chat on GSM8K is shown below.
 
- | Model | Acc. |
- |:------------------------------------:| :-------: |
- | LLaMA2-7B-Chat | 26.3 |
- | ChatGLM2-6B-Chat | 28.8 |
- | Baichuan2-7B-Chat | 32.8 |
- | InternLM-7B-Chat | 33.0 |
- | LLaMA2-13B-Chat | 37.1 |
- | Baichuan2-13B-Chat | 55.3 |
- | LLaMA2-70B-Chat | 59.3 |
- | **Qwen-7B-Chat (original) (0-shot)** | 41.1 |
- | **Qwen-7B-Chat (0-shot)** | 50.3 |
- | **Qwen-7B-Chat (8-shot)** | 54.1 |
- | **Qwen-14B-Chat (0-shot)** | **60.1** |
- | **Qwen-14B-Chat (8-shot)** | 59.3 |
+ | Model | Acc. |
+ |:------------------------------------:|:--------:|
+ | LLaMA2-7B-Chat | 26.3 |
+ | ChatGLM2-6B-Chat | 28.8 |
+ | Baichuan2-7B-Chat | 32.8 |
+ | InternLM-7B-Chat | 33.0 |
+ | LLaMA2-13B-Chat | 37.1 |
+ | Baichuan2-13B-Chat | 55.3 |
+ | LLaMA2-70B-Chat | 59.3 |
+ | **Qwen-7B-Chat (original) (0-shot)** | 41.1 |
+ | **Qwen-7B-Chat (0-shot)** | 50.3 |
+ | **Qwen-7B-Chat (8-shot)** | 54.1 |
+ | **Qwen-14B-Chat (0-shot)** | **60.1** |
+ | **Qwen-14B-Chat (8-shot)** | 59.3 |
 
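The 0-shot and 8-shot rows above differ only in how the prompt is built: k worked examples are prepended before the test question. A minimal sketch with placeholder exemplars from the GSM8K training split; the exact prompt template behind the reported numbers is not given in this README.

```python
# Sketch: 0-shot vs. few-shot GSM8K prompting differs only in prepended shots.
EXEMPLARS = [
    ("Natalia sold clips to 48 of her friends in April, and then she sold "
     "half as many clips in May. How many clips did Natalia sell altogether?",
     "In May she sold 48 / 2 = 24 clips, so 48 + 24 = 72 in total. The answer is 72."),
    # ... seven more (question, worked solution) pairs for the 8-shot setting
]

def build_prompt(question: str, k: int = 8) -> str:
    """k = 0 reproduces the zero-shot setting; k = 8 the few-shot one."""
    shots = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in EXEMPLARS[:k])
    return f"{shots}Question: {question}\nAnswer:"

print(build_prompt("A robe takes 2 bolts of blue fiber and half that much "
                   "white fiber. How many bolts in total does it take?", k=0))
```
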
  ### 长序列评测(Long-Context Understanding)
 
@@ -358,7 +358,7 @@ We introduce NTK-aware interpolation, LogN attention scaling to extend the conte
  **(To use these tricks, please set `use_dynamic_ntk` and `use_long_attn` to true in config.json.)**
 
  | Model | VCSUM (zh) |
- | :---------------- | :--------: |
+ |:------------------|:----------:|
  | GPT-3.5-Turbo-16k | 16.0 |
  | LLaMA2-7B-Chat | 0.2 |
  | InternLM-7B-Chat | 13.0 |
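
The README asks you to set `use_dynamic_ntk` and `use_long_attn` to true in config.json; the following is a sketch of the programmatic equivalent. It assumes the two flags are plain attributes on the remote config class, which this README does not confirm; editing config.json directly is what the note itself prescribes.

```python
# Sketch: flip the two long-context flags before loading, instead of editing
# config.json by hand. Flag names are exactly those quoted in the note above.
from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
cfg.use_dynamic_ntk = True  # NTK-aware interpolation
cfg.use_long_attn = True    # LogN attention scaling
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", config=cfg, trust_remote_code=True
)
```
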
 