What should --tool-call-parser be set to when serving this model with vLLM?
I tested it, and the model's output does not match any of the common tool-call parsing formats.
Hi, you can try setting --tool-call-parser qwen3_xml. This should match the tool-call format used by this model.
Example:
vllm serve /path/to/your/model \
--port 8080 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--served-model-name InCoder-32B \
--disable-log-requests \
--max-model-len 131072 \
--gpu-memory-utilization 0.9 \
--trust-remote-code \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml
Tested; this solution works.
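For anyone wanting to verify the tool-call parsing end to end, here is a minimal sketch of a request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint, matching the serve command above (served model name InCoder-32B, port 8080). The "get_weather" tool is a made-up example for illustration, not part of this thread.

```python
import json

# Hypothetical tool definition used only to exercise the parser.
payload = {
    "model": "InCoder-32B",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

body = json.dumps(payload)
# POST this body to http://localhost:8080/v1/chat/completions.
# With --enable-auto-tool-choice and --tool-call-parser qwen3_xml,
# tool invocations should come back in the structured "tool_calls"
# field of the response rather than as raw text.
print(body[:40])
```

If the parser is set correctly, the assistant message in the response carries a populated tool_calls array instead of XML-like text in content.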
When calling the vLLM-deployed model via OpenCode, I found that the model often outputs only line breaks between <think> and </think>, with no text content.
This mostly occurs during the thinking step before tool calls, and also frequently before text output.
It happens intermittently, and each additional round of thinking adds one more line break inside the block: the first round of thinking outputs one line break, the second outputs two, and so on.
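Until the root cause is fixed, the whitespace-only reasoning blocks described above can be stripped on the client side. This is a hypothetical cleanup sketch (not part of vLLM or OpenCode): it removes <think> blocks whose content is only whitespace while leaving non-empty reasoning intact.

```python
import re

# Matches a <think>...</think> block whose body is only whitespace,
# e.g. "<think>\n</think>" (round 1) or "<think>\n\n</think>" (round 2),
# plus any trailing whitespace after the closing tag.
EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")

def strip_empty_think(text: str) -> str:
    """Drop whitespace-only reasoning blocks from model output."""
    return EMPTY_THINK.sub("", text)

print(strip_empty_think("<think>\n\n</think>Here is the answer."))
```

Blocks that contain actual reasoning text are left untouched, since `\s*` only matches whitespace between the tags.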