catid committed
Commit 6821a88
2 Parent(s): 77e13fd 9748863

Merge branch 'main' of hf.co:catid/cat-llama-3-70b-awq-q128-w4-gemm
Files changed (1): README.md (+5, -3)

README.md CHANGED
@@ -10,7 +10,9 @@ git clone https://huggingface.co/catid/cat-llama-3-70b-awq-q128-w4-gemm
 
 conda create -n vllm70 python=3.10 -y && conda activate vllm70
 
-pip install git+https://github.com/vllm-project/vllm.git
+pip install -U git+https://github.com/vllm-project/vllm.git
 
-python -m vllm.entrypoints.openai.api_server --model cat-llama-3-70b-awq-q128-w4-gemm --tensor-parallel-size 2 --gpu-memory-utilization 0.95
-```
+python -m vllm.entrypoints.openai.api_server --model cat-llama-3-70b-awq-q128-w4-gemm --tensor-parallel-size 2 --gpu-memory-utilization 0.935
+```
+
+Sadly this *barely* doesn't fit by ~300MB or so.
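
For context on the "doesn't fit by ~300MB" note, here is a back-of-envelope sketch of the memory budget. The GPU model and overhead figures are not stated in the commit; the two 24 GiB GPUs below are a hypothetical assumption, and the only values taken from the diff are the 4-bit AWQ quantization, `--tensor-parallel-size 2`, and `--gpu-memory-utilization 0.935`:

```python
# Rough VRAM budget for a 70B-parameter model with 4-bit (AWQ) weights.
# Hardware figures are illustrative assumptions, not from the commit.

params = 70e9                 # ~70B parameters
bytes_per_param = 0.5         # 4-bit quantized weights ~= 0.5 bytes each

weight_gib = params * bytes_per_param / 1024**3   # weight footprint in GiB

# Hypothetical setup: two 24 GiB GPUs with --tensor-parallel-size 2.
# vLLM caps its allocation at --gpu-memory-utilization of each GPU's memory.
budget_gib = 2 * 24 * 0.935

# Whatever remains must hold the KV cache, activations, and CUDA overhead;
# per the commit note, that remainder comes up ~300 MB short in practice.
headroom_gib = budget_gib - weight_gib

print(f"weights ~= {weight_gib:.1f} GiB, budget ~= {budget_gib:.1f} GiB, "
      f"headroom ~= {headroom_gib:.1f} GiB")
```

Raising `--gpu-memory-utilization` closer to 1.0 buys back a little of that shortfall, at the cost of less slack for allocator fragmentation and other processes on the GPU.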