Commit 19f87d0 by ChengjieLi (1 parent: 4663c85): Upload folder using huggingface_hub
Files changed:
- README.md (+52)
- configuration.json (+11)
- qwen.tiktoken
- qwen7b-ggml-q4_0.bin (+3)
README.md
ADDED
@@ -0,0 +1,52 @@
---
license: apache-2.0
---

## qwen-chat-7B-ggml

This repo contains GGML format model files for qwen-chat-7B.

### Example code

#### Install packages

```bash
pip install "xinference[ggml]>=0.4.3"
pip install qwen-cpp
```

If you want to run with GPU acceleration, refer to [installation](https://github.com/xorbitsai/inference#installation).

#### Start a local instance of Xinference

```bash
xinference -p 9997
```

#### Launch and inference

```python
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="ggmlv3",
    model_size_in_billions=7,
    quantization="q4_0",
)
model = client.get_model(model_uid)

chat_history = []
prompt = "What is the largest animal?"
model.chat(
    prompt,
    chat_history,
    generate_config={"max_tokens": 1024},
)
```
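The `chat` call above returns an OpenAI-style chat-completion payload. As a minimal sketch, assuming the standard `choices[0]["message"]["content"]` layout, the reply text can be pulled out like this (the `sample` dict below is illustrative, not real model output):

```python
def extract_reply(response: dict) -> str:
    """Return the assistant's text from an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]

# Illustrative payload in the shape the chat endpoint mirrors.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "The blue whale."}}
    ]
}
print(extract_reply(sample))  # -> The blue whale.
```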

### More information

[Xinference](https://github.com/xorbitsai/inference) lets you replace OpenAI GPT with another LLM in your app
by changing a single line of code, giving you the freedom to use any LLM you need.
With Xinference, you can run inference with any open-source language, speech recognition,
or multimodal model, whether in the cloud, on-premises, or even on your laptop.

<i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
configuration.json
ADDED
@@ -0,0 +1,11 @@
{
    "framework": "xinference",
    "task": "code",
    "model": {
        "type": "qwen-chat"
    },
    "allow_remote": true,
    "pipeline": {
        "type": "text-generation-chat-pipeline"
    }
}
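The configuration above is plain JSON, so downstream scripts can read it with the standard library. A minimal sketch (the inline string mirrors the file's contents so the snippet stands alone; in practice you would `open("configuration.json")` instead):

```python
import json

# Mirrors configuration.json from this repo.
config_text = """
{
    "framework": "xinference",
    "task": "code",
    "model": {"type": "qwen-chat"},
    "allow_remote": true,
    "pipeline": {"type": "text-generation-chat-pipeline"}
}
"""
config = json.loads(config_text)
print(config["model"]["type"])  # -> qwen-chat
```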
qwen.tiktoken
ADDED
The diff for this file is too large to render.
See raw diff
qwen7b-ggml-q4_0.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec3a608c9e60c1ee49d3b5a7102857d92edb87829bd4f1a31158c6e085682227
size 4345527328