jinxuewen committed on
Commit
64a6578
1 Parent(s): f447393

Update README.md

  1. README.md +2 -13
README.md CHANGED
@@ -15,27 +15,26 @@ pip3 install fschat
15
 
16
  <a href="https://chat.lmsys.org"><img src="assets/screenshot_cli.png" width="70%"></a>
17
 
 
 
18
  #### Single GPU
19
  The command below requires around 28GB of GPU memory for Vicuna-13B and 14GB of GPU memory for Vicuna-7B.
20
  See the "No Enough Memory" section below if you do not have enough memory.
21
  ```
22
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights
23
  ```
24
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
25
 
26
  #### Multiple GPUs
27
  You can use model parallelism to aggregate GPU memory from multiple GPUs on the same machine.
28
  ```
29
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --num-gpus 2
30
  ```
31
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
32
 
33
  #### CPU Only
34
  This runs on the CPU only and does not require GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
35
  ```
36
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu
37
  ```
38
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
39
 
40
  #### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
41
  Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
@@ -43,11 +42,9 @@ Use `--load-8bit` to turn on 8-bit compression.
43
  ```
44
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device mps --load-8bit
45
  ```
46
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
47
 
48
  Vicuna-7B can run on a 32GB M1 Macbook with 1 - 2 words / second.
49
 
50
-
51
  #### No Enough Memory or Other Platforms
52
  If you do not have enough memory, you can enable 8-bit compression by adding `--load-8bit` to commands above.
53
  This can reduce memory usage by around half with slightly degraded model quality.
@@ -57,8 +54,6 @@ Vicuna-13B with 8-bit compression can run on a single NVIDIA 3090/4080/V100(16GB
57
  ```
58
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit
59
  ```
60
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
61
-
62
  Besides, we are actively exploring more methods to make the model easier to run on more platforms.
63
  Contributions and pull requests are welcome.
64
 
@@ -72,32 +67,26 @@ To serve using the web UI, you need three main components: web servers that inte
72
  ```bash
73
  python3 -m fastchat.serve.controller
74
  ```
75
-
76
  This controller manages the distributed workers.
77
 
78
  #### Launch the model worker
79
  ```bash
80
  python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
81
  ```
82
- When use huggingface, the `/path/to/vicuna/weights` is `jinxuewen/vicuna-13b`
83
-
84
  Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
85
 
86
  To ensure that your model worker is connected to your controller properly, send a test message using the following command:
87
  ```bash
88
  python3 -m fastchat.serve.test_message --model-name vicuna-13b
89
  ```
90
-
91
  #### Launch the Gradio web server
92
  ```bash
93
  python3 -m fastchat.serve.gradio_web_server
94
  ```
95
-
96
  This is the user interface that users will interact with.
97
 
98
  By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
99
 
100
-
101
  ## API
102
 
103
  ### Huggingface Generation APIs
 
15
 
16
  <a href="https://chat.lmsys.org"><img src="assets/screenshot_cli.png" width="70%"></a>
17
 
18
+ When use huggingface, the </path/to/vicuna/weights> is "jinxuewen/vicuna-13b"
19
+
20
  #### Single GPU
21
  The command below requires around 28GB of GPU memory for Vicuna-13B and 14GB of GPU memory for Vicuna-7B.
22
  See the "No Enough Memory" section below if you do not have enough memory.
23
  ```
24
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights
25
  ```
 
26
 
27
  #### Multiple GPUs
28
  You can use model parallelism to aggregate GPU memory from multiple GPUs on the same machine.
29
  ```
30
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --num-gpus 2
31
  ```
 
32
 
33
  #### CPU Only
34
  This runs on the CPU only and does not require GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
35
  ```
36
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu
37
  ```
 
38
 
39
  #### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
40
  Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
 
42
  ```
43
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device mps --load-8bit
44
  ```
 
45
 
46
  Vicuna-7B can run on a 32GB M1 Macbook with 1 - 2 words / second.
47
 
 
48
  #### No Enough Memory or Other Platforms
49
  If you do not have enough memory, you can enable 8-bit compression by adding `--load-8bit` to commands above.
50
  This can reduce memory usage by around half with slightly degraded model quality.
 
54
  ```
55
  python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit
56
  ```
 
 
57
  Besides, we are actively exploring more methods to make the model easier to run on more platforms.
58
  Contributions and pull requests are welcome.
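
The flags shown in the commands above (`--num-gpus`, `--device`, `--load-8bit`) can be combined with either a local weights directory or the Hugging Face repo ID named in this commit. The sketch below is a dry-run helper that only prints the resulting command. The function name `build_cli_cmd` and its argument order are hypothetical and not part of FastChat; it assumes the flags behave as the README describes.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical, not part of FastChat): prints the CLI command
# for a given setup instead of running it.
# Usage: build_cli_cmd MODEL_PATH [DEVICE] [NUM_GPUS] [LOAD_8BIT]
#   DEVICE:    gpu (default), cpu, or mps
#   NUM_GPUS:  number of GPUs (default 1); only used when DEVICE is gpu
#   LOAD_8BIT: yes or no (default no)
build_cli_cmd() {
  local model_path="$1"
  local device="${2:-gpu}"
  local num_gpus="${3:-1}"
  local load_8bit="${4:-no}"
  local cmd="python3 -m fastchat.serve.cli --model-path ${model_path}"

  case "$device" in
    gpu)
      # Model parallelism across GPUs on the same machine.
      if [ "$num_gpus" -gt 1 ]; then
        cmd="${cmd} --num-gpus ${num_gpus}"
      fi
      ;;
    cpu|mps)
      cmd="${cmd} --device ${device}"
      ;;
    *)
      echo "unknown device: ${device}" >&2
      return 1
      ;;
  esac

  # 8-bit compression roughly halves memory use per the README.
  if [ "$load_8bit" = "yes" ]; then
    cmd="${cmd} --load-8bit"
  fi

  printf '%s\n' "$cmd"
}

# Example: Hugging Face repo ID as the model path, Metal backend, 8-bit.
build_cli_cmd "jinxuewen/vicuna-13b" mps 1 yes
```

Printing the command rather than executing it makes it easy to check the flag combination before committing tens of gigabytes of memory to a model load.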
59
 
 
67
  ```bash
68
  python3 -m fastchat.serve.controller
69
  ```
 
70
  This controller manages the distributed workers.
71
 
72
  #### Launch the model worker
73
  ```bash
74
  python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
75
  ```
 
 
76
  Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
77
 
78
  To ensure that your model worker is connected to your controller properly, send a test message using the following command:
79
  ```bash
80
  python3 -m fastchat.serve.test_message --model-name vicuna-13b
81
  ```
 
82
  #### Launch the Gradio web server
83
  ```bash
84
  python3 -m fastchat.serve.gradio_web_server
85
  ```
 
86
  This is the user interface that users will interact with.
87
 
88
  By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
89
 
 
90
  ## API
91
 
92
  ### Huggingface Generation APIs