feihu.hf committed
Commit afb2829 · Parent(s): 2d46b72
update README & LICENSE

README.md CHANGED
@@ -1,11 +1,12 @@
 ---
+license: apache-2.0
+license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
 base_model: Qwen/Qwen2.5-32B
 tags:
 - chat
-license: apache-2.0
 ---
 
 # Qwen2.5-32B-Instruct
@@ -59,7 +60,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 prompt = "Give me a short introduction to large language model."
 messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
     {"role": "user", "content": prompt}
 ]
 text = tokenizer.apply_chat_template(
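This hunk edits the README's transformers quickstart. As a reading aid, here is a minimal runnable sketch of that snippet as it stands after the commit, reconstructed from the hunk's context lines; the loading and generation code outside the hunk (e.g. `torch_dtype="auto"`, `max_new_tokens=512`) follows the usual Qwen model-card pattern and is an assumption, not part of this diff.

```python
# Sketch of the quickstart after this commit. Everything outside the hunk
# (model loading, generation settings) is assumed from the standard Qwen
# model-card pattern, not taken from this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    # The system prompt introduced by this commit:
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Keep only the newly generated tokens before decoding.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```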
@@ -82,11 +83,25 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 ### Processing Long Texts
 
-
-
-
+The current `config.json` is set for context length up to 32,768 tokens.
+To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
+
+For supported frameworks, you could add the following to `config.json` to enable YaRN:
+```json
+{
+  ...,
+  "rope_scaling": {
+    "factor": 4.0,
+    "original_max_position_embeddings": 32768,
+    "type": "yarn"
+  }
+}
+```
 
-
+For deployment, we recommend using vLLM.
+Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
+Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
+We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
 ## Evaluation & Performance
 
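The `rope_scaling` block added by this hunk is meant to be merged into the checkpoint's existing `config.json` (the `...` stands for the keys already present and should not be copied literally). A minimal sketch of applying it to a locally downloaded copy of the repo; the local path is hypothetical:

```python
# Minimal sketch: merge the README's YaRN rope_scaling block into a local
# config.json. The path below is hypothetical; point it at your download.
import json
from pathlib import Path

config_path = Path("Qwen2.5-32B-Instruct/config.json")  # hypothetical local path
config = json.loads(config_path.read_text())

# Values copied verbatim from the diff: 4x YaRN scaling over the native
# 32,768-token window, i.e. roughly a 131,072-token effective context.
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

config_path.write_text(json.dumps(config, indent=2))
```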
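The new text also recommends vLLM for deployment. A minimal offline-inference sketch under stated assumptions: vLLM is installed, any `rope_scaling` entry you need has already been added to `config.json` as above, and the sampling values are illustrative rather than taken from this commit.

```python
# Minimal vLLM sketch. Assumes vLLM is installed and enough GPU memory is
# available for a 32B model; a rope_scaling entry in config.json is picked
# up automatically when the model is loaded.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2.5-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Illustrative sampling settings, not taken from this commit.
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512))
print(outputs[0].outputs[0].text)
```

Recent vLLM releases can also expose the model behind an OpenAI-compatible server (e.g. `vllm serve Qwen/Qwen2.5-32B-Instruct`); the linked documentation covers both paths.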