Running the model with the vLLM Docker image
Has anybody successfully run this model with the vLLM Docker image? I get the following error on startup:
(VllmWorkerProcess pid=356) ERROR 02-13 00:15:14 multiproc_worker_utils.py:229] raise ValueError("No available memory for the cache blocks. "
(VllmWorkerProcess pid=356) ERROR 02-13 00:15:14 multiproc_worker_utils.py:229] ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization
when initializing the engine.
ERROR 02-13 00:15:14 engine.py:366] No available memory for the cache blocks. Try increasing gpu_memory_utilization
when initializing the engine.
ERROR 02-13 00:15:14 engine.py:366] Traceback (most recent call last):
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 02-13 00:15:14 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-13 00:15:14 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 02-13 00:15:14 engine.py:366] return cls(ipc_path=ipc_path,
ERROR 02-13 00:15:14 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 02-13 00:15:14 engine.py:366] self.engine = LLMEngine(*args, **kwargs)
ERROR 02-13 00:15:14 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 350, in __init__
ERROR 02-13 00:15:14 engine.py:366] self._initialize_kv_caches()
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 500, in _initialize_kv_caches
ERROR 02-13 00:15:14 engine.py:366] self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 67, in initialize_cache
ERROR 02-13 00:15:14 engine.py:366] self._run_workers("initialize_cache",
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
ERROR 02-13 00:15:14 engine.py:366] driver_worker_output = driver_worker_method(*args, **kwargs)
ERROR 02-13 00:15:14 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 268, in initialize_cache
ERROR 02-13 00:15:14 engine.py:366] raise_if_cache_size_invalid(num_gpu_blocks,
ERROR 02-13 00:15:14 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 493, in raise_if_cache_size_invalid
ERROR 02-13 00:15:14 engine.py:366] raise ValueError("No available memory for the cache blocks. "
ERROR 02-13 00:15:14 engine.py:366] ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization
when initializing the engine.
(VllmWorkerProcess pid=354) ERROR 02-13 00:15:14 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method initialize_cache.
(VllmWorkerProcess pid=354) ERROR 02-13 00:15:14 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=354) ERROR 02-13 00:15:14 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
Increasing gpu_memory_utilization did not help.
Model args:
args:
- "--model"
- "mistralai/Pixtral-Large-Instruct-2411"
- "--tokenizer-mode"
- "mistral"
- "--max-model-len"
- "16384"
- "--tensor_parallel_size"
- "8"
- "--port"
- "9105"
- "--enforce-eager"
- "--gpu_memory_utilization"
- "0.9"
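For anyone hitting the same error: vLLM raises it when, after loading the weights and profiling a forward pass, nothing is left inside the gpu_memory_utilization budget for KV cache blocks. Below is a rough back-of-the-envelope check, not vLLM's actual accounting. It assumes about 124B parameters for Pixtral Large (the figure Mistral publishes), 2 bytes per parameter (bf16), and weights split evenly across the tensor-parallel GPUs. The function names, the `other_gib` allowance for activations and other processes, and the 40/80 GiB GPU sizes are all hypothetical placeholders, since the post does not say which GPUs are used.

```python
GIB = 2**30


def weights_per_gpu_gib(params_billion: float, bytes_per_param: int, tp_size: int) -> float:
    """Approximate weight memory per GPU, assuming an even tensor-parallel split."""
    return params_billion * 1e9 * bytes_per_param / tp_size / GIB


def kv_cache_headroom_gib(
    total_gib: float,
    utilization: float,
    params_billion: float,
    bytes_per_param: int,
    tp_size: int,
    other_gib: float,
) -> float:
    """Rough memory left for KV cache blocks on one GPU.

    other_gib is a guess covering activation peaks, CUDA context, and any
    other process already using the GPU. A result at or below zero is the
    situation in which the 'No available memory for the cache blocks'
    error would be expected.
    """
    budget = total_gib * utilization
    weights = weights_per_gpu_gib(params_billion, bytes_per_param, tp_size)
    return budget - weights - other_gib


if __name__ == "__main__":
    # Values from the args above: tensor parallel size 8, utilization 0.9.
    # GPU sizes and the 5 GiB allowance are assumptions for illustration.
    for total in (40, 80):
        left = kv_cache_headroom_gib(total, 0.9, 124, 2, 8, other_gib=5)
        print(f"{total} GiB GPU: ~{left:.1f} GiB left for KV cache")
```

If the estimate comes out near or below zero for your hardware, raising gpu_memory_utilization alone may not be enough. It is also worth checking with nvidia-smi whether another process is already holding memory on the same GPUs.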