ChengjieLi committed
Commit
19f87d0
1 Parent(s): 4663c85

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +52 -0
  2. configuration.json +11 -0
  3. qwen.tiktoken +0 -0
  4. qwen7b-ggml-q4_0.bin +3 -0
README.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ license: apache-2.0
+ ---
+
+ ## qwen-chat-7B-ggml
+
+ This repo contains GGML format model files for qwen-chat-7B.
+
+ ### Example code
+
+ #### Install packages
+ ```bash
+ pip install "xinference[ggml]>=0.4.3"
+ pip install qwen-cpp
+ ```
+ If you want to run with GPU acceleration, refer to [installation](https://github.com/xorbitsai/inference#installation).
+
+ #### Start a local instance of Xinference
+ ```bash
+ xinference -p 9997
+ ```
+
+ #### Launch and inference
+ ```python
+ from xinference.client import Client
+
+ client = Client("http://localhost:9997")
+ model_uid = client.launch_model(
+     model_name="qwen-chat",
+     model_format="ggmlv3",
+     model_size_in_billions=7,
+     quantization="q4_0",
+ )
+ model = client.get_model(model_uid)
+
+ chat_history = []
+ prompt = "最大的动物是什么?"  # "What is the largest animal?"
+ model.chat(
+     prompt,
+     chat_history,
+     generate_config={"max_tokens": 1024},
+ )
+ ```
+
+ ### More information
+
+ With [Xinference](https://github.com/xorbitsai/inference), you can replace OpenAI GPT with another LLM in your app
+ by changing a single line of code. Xinference gives you the freedom to use any LLM you need.
+ With Xinference, you are empowered to run inference with any open-source language models,
+ speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
+
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
configuration.json ADDED
@@ -0,0 +1,11 @@
+ {
+     "framework": "xinference",
+     "task": "code",
+     "model": {
+         "type": "qwen-chat"
+     },
+     "allow_remote": true,
+     "pipeline": {
+         "type": "text-generation-chat-pipeline"
+     }
+ }
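A quick sketch of reading `configuration.json` and checking the fields it declares (field names are taken from the file above; the JSON is embedded so the snippet is self-contained):

```python
import json

# The configuration.json contents shown above, embedded verbatim.
raw = """
{
    "framework": "xinference",
    "task": "code",
    "model": {"type": "qwen-chat"},
    "allow_remote": true,
    "pipeline": {"type": "text-generation-chat-pipeline"}
}
"""

config = json.loads(raw)
assert config["framework"] == "xinference"
assert config["model"]["type"] == "qwen-chat"
print(config["pipeline"]["type"])  # → text-generation-chat-pipeline
```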
qwen.tiktoken ADDED
The diff for this file is too large to render.
 
qwen7b-ggml-q4_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec3a608c9e60c1ee49d3b5a7102857d92edb87829bd4f1a31158c6e085682227
+ size 4345527328
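The file above is a git-LFS pointer, not the model weights themselves. A small sketch of parsing such a pointer to recover the expected SHA-256 and file size (the pointer text is copied from the diff above; comparing the hash against a downloaded `qwen7b-ggml-q4_0.bin` is left to the reader):

```python
# Parse a git-LFS pointer file into a dict of its "key value" lines.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ec3a608c9e60c1ee49d3b5a7102857d92edb87829bd4f1a31158c6e085682227
size 4345527328
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = parse_lfs_pointer(pointer_text)
expected_sha256 = pointer["oid"].removeprefix("sha256:")  # hash to verify against
expected_size = int(pointer["size"])                      # bytes (~4.3 GB)
print(expected_sha256[:8], expected_size)  # → ec3a608c 4345527328
```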