ThomasBaruzier committed on
Commit c7e725e · 1 Parent(s): 8b8c96e

Update README.md

Files changed (1): README.md (+10, -10)
README.md CHANGED
@@ -1,23 +1,23 @@
 ---
 license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE
+license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-14B
+base_model: Qwen/Qwen2.5-32B
 tags:
 - chat
 ---

 <hr>

-# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-14B-Instruct
+# Llama.cpp imatrix quantizations of Qwen/Qwen2.5-32B-Instruct

 <img src="https://cdn-uploads.huggingface.co/production/uploads/646410e04bf9122922289dc7/gDUbZOu1ND0j-th4Q6tep.jpeg" alt="qwen" width="60%"/>

 Using llama.cpp commit [eca0fab](https://github.com/ggerganov/llama.cpp/commit/eca0fab) for quantization.

-Original model: [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
+Original model: [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)

 All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).

@@ -27,7 +27,7 @@ All quants were made using the imatrix option and Bartowski's [calibration file]

 <hr>

-# Qwen2.5-14B-Instruct
+# Qwen2.5-32B-Instruct

 ## Introduction

@@ -38,13 +38,13 @@ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we rele
 - **Long-context Support** up to 128K tokens and can generate up to 8K tokens.
 - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

-**This repo contains the instruction-tuned 14B Qwen2.5 model**, which has the following features:
+**This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- Number of Parameters: 14.7B
-- Number of Parameters (Non-Embedding): 13.1B
-- Number of Layers: 48
+- Number of Parameters: 32.5B
+- Number of Parameters (Non-Embedding): 31.0B
+- Number of Layers: 64
 - Number of Attention Heads (GQA): 40 for Q and 8 for KV
 - Context Length: Full 131,072 tokens and generation 8192 tokens
 - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
@@ -67,7 +67,7 @@ Here provides a code snippet with `apply_chat_template` to show you how to load
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

-model_name = "Qwen/Qwen2.5-14B-Instruct"
+model_name = "Qwen/Qwen2.5-32B-Instruct"

 model = AutoModelForCausalLM.from_pretrained(
     model_name,
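
For context on the quantization step referenced in the README ("imatrix option" plus Bartowski's calibration file), below is a minimal sketch of how such a quant is typically produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names, the Q4_K_M quant type, and the use of Python's `subprocess` are illustrative assumptions, not the uploader's actual commands.

```python
# Sketch only: assumed filenames and quant type; not taken from this repo's build process.
import subprocess

model_f16 = "Qwen2.5-32B-Instruct-F16.gguf"   # full-precision GGUF conversion of the model (assumed name)
calibration = "calibration_datav3.txt"        # local copy of the linked calibration file (assumed name)
imatrix_file = "imatrix.dat"

# 1. Compute the importance matrix over the calibration text.
subprocess.run(["llama-imatrix", "-m", model_f16, "-f", calibration, "-o", imatrix_file], check=True)

# 2. Quantize with the importance matrix applied (Q4_K_M chosen as an example type).
subprocess.run(
    ["llama-quantize", "--imatrix", imatrix_file, model_f16,
     "Qwen2.5-32B-Instruct-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```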
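The final hunk truncates the upstream `apply_chat_template` snippet at the `from_pretrained(` call. For reference, here is a sketch of how that snippet usually continues in Qwen's model cards; the prompt text is an arbitrary example, not quoted from the original README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Render the chat template, generate, and strip the prompt tokens from the output.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```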