Error while trying to run with latest vllm

#2
by DrRos - opened

With a nightly build of vLLM (vllm==0.10.1.dev466+ge5ebeeba5), I get this error while trying to serve the model:

MoeWNA16Method.get_weight_loader.<locals>.moe_wna16_weight_loader() got an unexpected keyword argument 'return_success'

I am serving with:

vllm serve /mnt/nfs-share/LLM/ --host 0.0.0.0 --port 30000 --tensor-parallel-size 2 --enable-expert-parallel --served-model-name Qwen3-30B --enable-auto-tool-choice --tool-call-parser qwen3_coder --max-model-len 131072 --gpu-memory-utilization 0.8 --dtype float16
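For anyone trying to make sense of the message: this kind of TypeError is what Python raises when a caller passes a keyword that a nested function's signature does not declare. The sketch below is not vLLM's code. The loader body, its parameter list, and the `accepts_keyword` helper are hypothetical stand-ins; only the function name and the `return_success` keyword come from the error above.

```python
import inspect


def make_weight_loader():
    # Hypothetical stand-in: a nested loader whose signature has no
    # `return_success` parameter and no **kwargs catch-all.
    def moe_wna16_weight_loader(param, loaded_weight, weight_name,
                                shard_id, expert_id):
        param[weight_name] = loaded_weight

    return moe_wna16_weight_loader


def accepts_keyword(func, name):
    # Hypothetical helper: True if `func` can take `name` as a keyword,
    # either as a named parameter or through **kwargs.
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in params.values())


loader = make_weight_loader()

# A caller that assumes the newer signature triggers the TypeError.
try:
    loader({}, [1.0], "w", "w1", 0, return_success=True)
except TypeError as exc:
    error_message = str(exc)

# A caller that inspects the signature first can skip the keyword.
extra = {"return_success": True} if accepts_keyword(loader, "return_success") else {}
store = {}
loader(store, [1.0], "w", "w1", 0, **extra)
```

If that is what is happening here, the caller and the loader are out of sync about the signature, but I have not confirmed that against the vLLM source.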
