ThomasBaruzier committed on
Commit c7e725e · 1 Parent(s): 8b8c96e

Update README.md

Files changed (1): README.md (+10, -10)
README.md CHANGED
@@ -1,23 +1,23 @@
 ---
 license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE
+license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-14B
+base_model: Qwen/Qwen2.5-32B
 tags:
 - chat
 ---

 <hr>

-# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-14B-Instruct
+# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-32B-Instruct

 <img src="https://cdn-uploads.huggingface.co/production/uploads/646410e04bf9122922289dc7/gDUbZOu1ND0j-th4Q6tep.jpeg" alt="qwen" width="60%"/>

 Using llama.cpp commit [eca0fab](https://github.com/ggerganov/llama.cpp/commit/eca0fab) for quantization.

-Original model: [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
+Original model: [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)

 All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).

@@ -27,7 +27,7 @@ All quants were made using the imatrix option and Bartowski's [calibration file]

 <hr>

-# Qwen2.5-14B-Instruct
+# Qwen2.5-32B-Instruct

 ## Introduction

@@ -38,13 +38,13 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
 - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

-**This repo contains the instruction-tuned 14B Qwen2.5 model**, which has the following features:
+**This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- Number of Parameters: 14.7B
-- Number of Parameters (Non-Embedding): 13.1B
-- Number of Layers: 48
+- Number of Parameters: 32.5B
+- Number of Parameters (Non-Embedding): 31.0B
+- Number of Layers: 64
 - Number of Attention Heads (GQA): 40 for Q and 8 for KV
 - Context Length: Full 131,072 tokens and generation 8192 tokens
 - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
@@ -67,7 +67,7 @@ Here provides a code snippet with `apply_chat_template` to show you how to load
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

-model_name = "Qwen/Qwen2.5-14B-Instruct"
+model_name = "Qwen/Qwen2.5-32B-Instruct"

 model = AutoModelForCausalLM.from_pretrained(
     model_name,
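
For context on the quantization step referenced in the README ("imatrix option" plus Bartowski's calibration file), below is a minimal sketch of how such a quant is typically produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names, the Q4_K_M quant type, and the use of Python's `subprocess` are illustrative assumptions, not the uploader's actual commands.

```python
# Sketch only: assumed filenames and quant type; not taken from this repo's build process.
import subprocess

model_f16 = "Qwen2.5-32B-Instruct-F16.gguf"   # full-precision GGUF conversion of the model (assumed name)
calibration = "calibration_datav3.txt"        # local copy of the linked calibration file (assumed name)
imatrix_file = "imatrix.dat"

# 1. Compute the importance matrix over the calibration text.
subprocess.run(["llama-imatrix", "-m", model_f16, "-f", calibration, "-o", imatrix_file], check=True)

# 2. Quantize with the importance matrix applied (Q4_K_M chosen as an example type).
subprocess.run(
    ["llama-quantize", "--imatrix", imatrix_file, model_f16,
     "Qwen2.5-32B-Instruct-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```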
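The final hunk truncates the upstream `apply_chat_template` snippet at the `from_pretrained(` call. For reference, here is a sketch of how that snippet usually continues in Qwen's model cards; the prompt text is an arbitrary example, not quoted from the original README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Render the chat template, generate, and strip the prompt tokens from the output.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```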