Unable to run with vLLM
#1, opened by yaronr
Hi,
I am getting the following error when running the latest vLLM in Docker.
Here are my runtime params:
"command": "--port=8000
--model=fsaudm/Meta-Llama-3.1-70B-Instruct-INT8
--tensor-parallel-size=4
--pipeline-parallel-size=1
--disable-log-requests
--enable-chunked-prefill
--num-scheduler-steps=10
--enable-prefix-caching
--max-num-batched-tokens=16192
--max-model-len=16192
--max-seq-len-to-capture=16192
--gpu-memory-utilization=0.95"
Here's the error:
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] self.model_runner.load_model()
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/multi_step_model_runner.py", line 645, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] return self._base_model_runner.load_model()
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1058, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] self.model = get_model(model_config=self.model_config,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] return loader.load_model(model_config=model_config,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 402, in load_model
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] model.load_weights(self._get_all_weights(model_config, model))
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 582, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] loader.load_weights(
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 203, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] autoloaded_weights = list(self._load_module("", self.module, weights))
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 182, in _load_module
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] yield from self._load_module(prefix,
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 169, in _load_module
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] module_load_weights(weights)
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 414, in load_weights
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] param = params_dict[name]
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] ~~~~~~~~~~~^^^^^^
(VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] KeyError: 'layers.0.mlp.down_proj.SCB'
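The traceback ends in `params_dict[name]`, which suggests the checkpoint contains a tensor name (`layers.0.mlp.down_proj.SCB`) that the loaded model has no parameter for. The `.SCB` suffix looks like the scale tensor that bitsandbytes 8-bit checkpoints store, but that is an assumption and is not confirmed by the log. A minimal sketch of the mismatch, using a hypothetical helper `find_unexpected_keys` and made-up name lists rather than vLLM's actual loader:

```python
def find_unexpected_keys(checkpoint_names, param_names):
    """Return checkpoint tensor names that have no matching model parameter."""
    known = set(param_names)
    return [name for name in checkpoint_names if name not in known]


# Hypothetical names for illustration only; the real lists would come from
# the checkpoint files and from the model's named parameters.
checkpoint_names = [
    "layers.0.mlp.down_proj.weight",
    "layers.0.mlp.down_proj.SCB",
]
param_names = ["layers.0.mlp.down_proj.weight"]

unexpected = find_unexpected_keys(checkpoint_names, param_names)
print(unexpected)
```

Any name this returns would raise the same kind of `KeyError` in a loader that indexes a parameter dictionary directly.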