Unable to run with vLLM

#1
by yaronr - opened

Hi,
I am getting the following error when running with the latest vLLM in Docker.
Here are my runtime params:

    "command": "--port=8000 
                                --model=fsaudm/Meta-Llama-3.1-70B-Instruct-INT8
                                --tensor-parallel-size=4
                                --pipeline-parallel-size=1 
                                --disable-log-requests
                                --enable-chunked-prefill
                                --num-scheduler-steps=10
                                --enable-prefix-caching
                                --max-num-batched-tokens=16192
                                --max-model-len=16192
                                --max-seq-len-to-capture=16192
                                --gpu-memory-utilization=0.95"

Here's the error:

(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     self.model_runner.load_model()
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/multi_step_model_runner.py", line 645, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     return self._base_model_runner.load_model()
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1058, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     self.model = get_model(model_config=self.model_config,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     return loader.load_model(model_config=model_config,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 402, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     model.load_weights(self._get_all_weights(model_config, model))
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 582, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     loader.load_weights(
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 203, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     autoloaded_weights = list(self._load_module("", self.module, weights))
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 182, in _load_module
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     yield from self._load_module(prefix,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 169, in _load_module
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     module_load_weights(weights)
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 414, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     param = params_dict[name]
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]             ~~~~~~~~~~~^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] KeyError: 'layers.0.mlp.down_proj.SCB'
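
The `KeyError` comes from `params_dict[name]`: the checkpoint contains a tensor named `layers.0.mlp.down_proj.SCB` that the loaded Llama module has no parameter for. A small sketch of how one might list such names is below. It is plain Python and does not call vLLM. The helper names and the example data are hypothetical, and the `weight_format` entry assumes the checkpoint was saved with bitsandbytes 8-bit quantization, which the `.SCB` suffix suggests but which is not confirmed here.

```python
from collections import Counter


def unexpected_weight_names(checkpoint_names, param_names):
    """Return checkpoint tensor names that have no matching model parameter."""
    known = set(param_names)
    return [name for name in checkpoint_names if name not in known]


def count_by_suffix(names):
    """Count names by their last dotted component, e.g. 'SCB' or 'weight'."""
    return Counter(name.rsplit(".", 1)[-1] for name in names)


# Hypothetical example data, modeled on the key from the traceback above.
checkpoint_names = [
    "layers.0.mlp.down_proj.weight",
    "layers.0.mlp.down_proj.SCB",
    "layers.0.mlp.down_proj.weight_format",  # assumed bitsandbytes extra key
]
param_names = ["layers.0.mlp.down_proj.weight"]

extra = unexpected_weight_names(checkpoint_names, param_names)
print(extra)
print(count_by_suffix(extra))
```

If every unexpected name shares a quantization-specific suffix like `SCB`, that would point to the loader not treating the checkpoint as quantized, rather than to a corrupted download.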
