ThomasBaruzier committed "Update README.md" in commit c7e725e (parent: 8b8c96e).

README.md CHANGED
@@ -1,23 +1,23 @@
 ---
 license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen2.5-
+license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-
+base_model: Qwen/Qwen2.5-32B
 tags:
 - chat
 ---
 
 <hr>
 
-# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-
+# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-32B-Instruct
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/646410e04bf9122922289dc7/gDUbZOu1ND0j-th4Q6tep.jpeg" alt="qwen" width="60%"/>
 
 Using llama.cpp commit [eca0fab](https://github.com/ggerganov/llama.cpp/commit/eca0fab) for quantization.
 
-Original model: [Qwen/Qwen2.5-
+Original model: [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
 
 All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
 
@@ -27,7 +27,7 @@ All quants were made using the imatrix option and Bartowski's [calibration file]
 
 <hr>
 
-# Qwen2.5-
+# Qwen2.5-32B-Instruct
 
 ## Introduction
 
@@ -38,13 +38,13 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
 - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 
-**This repo contains the instruction-tuned
+**This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- Number of Parameters:
-- Number of Paramaters (Non-Embedding):
-- Number of Layers:
+- Number of Parameters: 32.5B
+- Number of Paramaters (Non-Embedding): 31.0B
+- Number of Layers: 64
 - Number of Attention Heads (GQA): 40 for Q and 8 for KV
 - Context Length: Full 131,072 tokens and generation 8192 tokens
 - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
@@ -67,7 +67,7 @@ Here provides a code snippet with `apply_chat_template` to show you how to load
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "Qwen/Qwen2.5-
+model_name = "Qwen/Qwen2.5-32B-Instruct"
 
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
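The Quickstart hunk above ends mid-call because of the seven-line diff window. For context, the surrounding snippet in the upstream Qwen2.5-32B-Instruct model card follows the standard `transformers` chat flow; the sketch below reconstructs that flow (the prompt text and `max_new_tokens` are illustrative choices, not taken from this commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"

# Load the full-precision model and tokenizer; weights are placed automatically across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt using the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```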
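Since this repository distributes GGUF imatrix quants rather than the original safetensors weights, the quants themselves are typically run with llama.cpp or its bindings instead of `transformers`. A minimal sketch with `llama-cpp-python` is shown below; the file name, context size, and GPU offload setting are assumptions for illustration and do not come from this commit.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Hypothetical file name: substitute whichever quant from this repo you downloaded.
llm = Llama(
    model_path="Qwen2.5-32B-Instruct-Q4_K_M.gguf",
    n_ctx=8192,        # context window to allocate (the model supports up to 131,072 tokens)
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# GGUF files converted by llama.cpp embed the chat template, so chat completion works directly.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a short introduction to large language models."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The same file can also be served with the `llama-server` binary from llama.cpp, which exposes an OpenAI-compatible endpoint.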