What should --tool-call-parser be set to when serving this model with vLLM?
I tested it, and the model's output does not match any of the common tool-call parsing formats.
Hi, you can try setting --tool-call-parser qwen3_xml. This should match the tool-call format used by this model.
Example:
vllm serve /path/to/your/model \
--port 8080 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--served-model-name InCoder-32B \
--disable-log-requests \
--max-model-len 131072 \
--gpu-memory-utilization 0.9 \
--trust-remote-code \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml
Tested; this solution works.
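For anyone wanting to verify the tool-call parsing end to end, here is a minimal sketch of a request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint, matching the serve command above (served model name InCoder-32B, port 8080). The "get_weather" tool is a made-up example for illustration, not part of this thread.

```python
import json

# Hypothetical tool definition used only to exercise the parser.
payload = {
    "model": "InCoder-32B",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

body = json.dumps(payload)
# POST this body to http://localhost:8080/v1/chat/completions.
# With --enable-auto-tool-choice and --tool-call-parser qwen3_xml,
# tool invocations should come back in the structured "tool_calls"
# field of the response rather than as raw text.
print(body[:40])
```

If the parser is set correctly, the assistant message in the response carries a populated tool_calls array instead of XML-like text in content.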
When calling the vLLM-deployed model via OpenCode, I found that the model often outputs only line breaks between <think> and </think>, with no text content.
This mostly occurs during the thinking step before tool calls, and also frequently before text output.
It happens intermittently, and each additional round of thinking adds one more line break inside the block: the first round of thinking outputs one line break, the second outputs two, and so on.
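Until the root cause is fixed, the whitespace-only reasoning blocks described above can be stripped on the client side. This is a hypothetical cleanup sketch (not part of vLLM or OpenCode): it removes <think> blocks whose content is only whitespace while leaving non-empty reasoning intact.

```python
import re

# Matches a <think>...</think> block whose body is only whitespace,
# e.g. "<think>\n</think>" (round 1) or "<think>\n\n</think>" (round 2),
# plus any trailing whitespace after the closing tag.
EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")

def strip_empty_think(text: str) -> str:
    """Drop whitespace-only reasoning blocks from model output."""
    return EMPTY_THINK.sub("", text)

print(strip_empty_think("<think>\n\n</think>Here is the answer."))
```

Blocks that contain actual reasoning text are left untouched, since `\s*` only matches whitespace between the tags.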