
Tool Calling

To enable tool calling, you may need to set tool-call parser options when starting the service; see deploy_guidance for details. In Kimi-K2, one tool-calling round consists of the following steps:

  • Passing function descriptions to Kimi-K2
  • Kimi-K2 decides to make a function call and returns the necessary information for the function call to the user
  • The user performs the function call, collects the call results, and passes the function call results to Kimi-K2
  • Kimi-K2 continues to generate content based on the function call results until the model believes it has obtained sufficient information to respond to the user
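The steps above map onto a message history that grows turn by turn. As a rough sketch of what that history might look like after one round trip, assuming the OpenAI-style chat format used later in this guide (the question, ID, and weather values are illustrative only):

```python
import json

# Step 1: the user's question is sent together with the tool descriptions (omitted here).
history = [{"role": "user", "content": "What's the weather like in Beijing today?"}]

# Step 2: the model replies with a tool call instead of final text (illustrative values).
history.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "id": "functions.get_weather:0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": json.dumps({"city": "Beijing"})},
    }],
})

# Step 3: the caller runs the function and reports the result with role="tool".
history.append({
    "role": "tool",
    "tool_call_id": "functions.get_weather:0",
    "name": "get_weather",
    "content": json.dumps({"weather": "Sunny"}),
})

# Step 4: the whole history is sent back; the model either calls another tool or answers.
print([m["role"] for m in history])
```

The tool message points back at the assistant's call through tool_call_id, which is why the two IDs must match exactly.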

Preparing Tools

Suppose we have a function get_weather that can query the weather conditions in real-time. This function accepts a city name as a parameter and returns the weather conditions. We need to prepare a structured description for it so that Kimi-K2 can understand its functionality.

def get_weather(city):
    return {"weather": "Sunny"}

# Collect the tool descriptions in tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information. Call this tool when the user needs to get weather information",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                }
            }
        }
    }
}]

# Tool name->object mapping for easy calling later
tool_map = {
    "get_weather": get_weather
}
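The model may occasionally name a tool that is not in tool_map or produce arguments that are not valid JSON. One possible way to keep the loop running in those cases is to wrap the lookup in a small dispatcher that reports the error as the tool result. The helper name run_tool and the error payload shape below are assumptions made for illustration, not part of Kimi-K2 or the OpenAI client:

```python
import json

def get_weather(city):
    # Stub standing in for a real weather lookup, as in the example above.
    return {"weather": "Sunny"}

tool_map = {"get_weather": get_weather}

def run_tool(name, arguments_json, registry):
    """Hypothetical helper: run one tool call and always return a JSON string."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})

print(run_tool("get_weather", '{"city": "Beijing"}', tool_map))
```

Returning the error as content lets the model see what went wrong and possibly retry with corrected arguments.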

Chat with tools

We use openai.OpenAI to send messages to Kimi-K2 together with the tool descriptions. Kimi-K2 decides on its own whether and how to use the provided tools. If it decides a tool call is needed, it returns a result with finish_reason='tool_calls' that carries the tool call information. After running the tools with that information, we append the results to the chat history and call Kimi-K2 again. Kimi-K2 may call tools several times before it judges that the results are enough to answer the user's question, so we keep looping until finish_reason is no longer tool_calls.

The results obtained by the user after calling the tools should be added to messages with role='tool'.

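import json
from openai import OpenAI

model_name = 'moonshotai/Kimi-K2-Instruct'
# Placeholder value: replace with the base URL of your own OpenAI-compatible server
endpoint = 'http://localhost:8000/v1'
client = OpenAI(base_url=endpoint, api_key='xxx')

messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]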
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools, 
        tool_choice="auto",
    )
    choice = completion.choices[0]
    finish_reason = choice.finish_reason
    # Note: The finish_reason when tool calls end may vary across different engines, so this condition check needs to be adjusted accordingly
    if finish_reason == "tool_calls": 
        messages.append(choice.message)
        for tool_call in choice.message.tool_calls: 
            tool_call_name = tool_call.function.name
            tool_call_arguments = json.loads(tool_call.function.arguments) 
            tool_function = tool_map[tool_call_name] 
            # Unpack the parsed arguments as keyword arguments, e.g. get_weather(city="Beijing")
            tool_result = tool_function(**tool_call_arguments)
            print("tool_result", tool_result)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call_name,
                "content": json.dumps(tool_result), 
            })
print('-' * 100)
print(choice.message.content)

Tool Calling in Streaming Mode

Tool calling can also be used in streaming mode. In this case, we need to collect the tool call information returned in the stream until we have a complete tool call. Please refer to the code below:

messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
finish_reason = None
msg = ''
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools,
        tool_choice="auto",
        stream=True 
    )
    tool_calls = []
    for chunk in completion:
        delta = chunk.choices[0].delta
        if delta.content:
            msg += delta.content
        if delta.tool_calls:
            for tool_call_chunk in delta.tool_calls:
                if tool_call_chunk.index is not None:
                    # Extend the tool_calls list
                    while len(tool_calls) <= tool_call_chunk.index:
                        tool_calls.append({
                            "id": "",
                            "type": "function",
                            "function": {
                                "name": "",
                                "arguments": ""
                            }
                        })

                    tc = tool_calls[tool_call_chunk.index]

                    if tool_call_chunk.id:
                        tc["id"] += tool_call_chunk.id
                    if tool_call_chunk.function.name:
                        tc["function"]["name"] += tool_call_chunk.function.name
                    if tool_call_chunk.function.arguments:
                        tc["function"]["arguments"] += tool_call_chunk.function.arguments

        finish_reason = chunk.choices[0].finish_reason
    # Note: The finish_reason when tool calls end may vary across different engines, so this condition check needs to be adjusted accordingly
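    if finish_reason == "tool_calls":
        # Record the assistant turn that requested the tools before adding their results
        messages.append({"role": "assistant", "content": msg, "tool_calls": tool_calls})
        for tool_call in tool_calls:
            tool_call_name = tool_call['function']['name']
            tool_call_arguments = json.loads(tool_call['function']['arguments'])
            tool_function = tool_map[tool_call_name]
            # Unpack the parsed arguments as keyword arguments, e.g. get_weather(city="Beijing")
            tool_result = tool_function(**tool_call_arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call['id'],
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
        # The text generated by the tool call is not the final version, reset msg
        msg = ''

# Print the final answer once the loop has finished
print(msg)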

Manually Parsing Tool Calls

The tool call requests generated by Kimi-K2 can also be parsed manually, which is especially useful when the service you are using does not provide a tool-call parser. The tool call requests generated by Kimi-K2 are wrapped by <|tool_calls_section_begin|> and <|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is functions.{func_name}:{idx}, from which we can parse the function name.
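To make the layout concrete, here is a small worked example that takes one hand-written tool call apart using plain string operations. The raw string is illustrative; real model output may contain extra whitespace or text before the section marker:

```python
# Illustrative raw output with a single tool call.
raw = (
    "<|tool_calls_section_begin|>"
    "<|tool_call_begin|>functions.get_weather:0"
    '<|tool_call_argument_begin|>{"city": "Beijing"}'
    "<|tool_call_end|>"
    "<|tool_calls_section_end|>"
)

# Take the text between the call markers, then split the ID from the arguments.
body = raw.split("<|tool_call_begin|>")[1].split("<|tool_call_end|>")[0]
tool_id, arguments = body.split("<|tool_call_argument_begin|>")

# functions.{func_name}:{idx} -> drop the fixed prefix, then peel off the index.
prefix, rest = tool_id.split(".", 1)
func_name, idx = rest.rsplit(":", 1)

print(tool_id, func_name, int(idx), arguments)
```

Splitting on the first dot and the last colon keeps the function name intact even if it happens to contain a dot itself.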

Based on the rules above, we can post a request directly to the completions interface and parse the tool calls manually.

import requests
from transformers import AutoTokenizer
messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
msg = ''
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
while True:
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        tools=tools,
        add_generation_prompt=True,
    )
    payload = {
        "model": model_name,
        "prompt": text,
        "max_tokens": 512
    }
    response = requests.post(
        f"{endpoint}/completions",
        headers={"Content-Type": "application/json"},
        json=payload,
        stream=False,
    )
    raw_out = response.json()

    raw_output = raw_out["choices"][0]["text"]
    tool_calls = extract_tool_call_info(raw_output)
    if len(tool_calls) == 0:
        # No tool calls
        msg = raw_output
        break
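    else:
        # Keep the assistant turn (text before the tool-call section plus the parsed calls)
        # so that the chat template can render it on the next request
        messages.append({
            "role": "assistant",
            "content": raw_output.split('<|tool_calls_section_begin|>')[0],
            "tool_calls": tool_calls,
        })
        for tool_call in tool_calls:
            tool_call_name = tool_call['function']['name']
            tool_call_arguments = json.loads(tool_call['function']['arguments'])
            tool_function = tool_map[tool_call_name]
            # Unpack the parsed arguments as keyword arguments, e.g. get_weather(city="Beijing")
            tool_result = tool_function(**tool_call_arguments)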

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call['id'],
                "name": tool_call_name,
                "content": json.dumps(tool_result), 
            })
print('-' * 100)          
print(msg)

Here, extract_tool_call_info parses the model output and returns the tool call information. A simple implementation would be:

def extract_tool_call_info(tool_call_rsp: str):
    import re
    if '<|tool_calls_section_begin|>' not in tool_call_rsp:
        # No tool calls
        return []
    pattern = r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>"

    tool_calls_sections = re.findall(pattern, tool_call_rsp, re.DOTALL)
    if not tool_calls_sections:
        # The section was opened but never closed (e.g. output cut off by max_tokens)
        return []

    # Extract multiple tool calls
    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
    tool_calls = []
    for match in re.findall(func_call_pattern, tool_calls_sections[0], re.DOTALL):
        function_id, function_args = match
        # function_id: functions.get_weather:0
        # Drop the fixed "functions." prefix and the trailing ":idx"; this also keeps dotted function names intact
        function_name = function_id.split('.', 1)[1].rsplit(':', 1)[0]
        tool_calls.append(
            {
                "id": function_id,
                "type": "function",
                "function": {
                    "name": function_name,
                    "arguments": function_args
                }
            }
        )  
    return tool_calls

FAQ

Q1: I received special tokens like '<|tool_call_begin|>' in the 'content' field instead of a normal tool_call.

This indicates a tool-call crash, which most often occurs in multi-turn tool-calling scenarios because of an incorrect tool-call ID. K2 expects the ID to follow the format functions.func_name:idx, where functions is a fixed string, func_name is the actual function name (for example get_weather), and idx is a global counter that starts at 0 and increments with each function invocation. Please check all tool-call IDs in the message list.
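One way to run that check is a small script over the message list. The sketch below assumes the messages are plain dicts in the OpenAI chat format and that function names contain only word characters, dots, or hyphens; find_bad_tool_call_ids is a hypothetical helper, not part of any library:

```python
import re

# Assumption: function names consist of word characters, dots, or hyphens.
ID_PATTERN = re.compile(r"^functions\.(?P<name>[\w.\-]+):(?P<idx>\d+)$")

def find_bad_tool_call_ids(messages):
    """Hypothetical checker: return (message_index, id, reason) for each suspicious ID."""
    problems = []
    expected_idx = 0  # global counter across the whole conversation
    for i, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        for call in message.get("tool_calls") or []:
            match = ID_PATTERN.match(call["id"])
            if match is None:
                problems.append((i, call["id"], "bad format"))
            elif match["name"] != call["function"]["name"]:
                problems.append((i, call["id"], "name mismatch"))
            elif int(match["idx"]) != expected_idx:
                problems.append((i, call["id"], f"expected index {expected_idx}"))
            expected_idx += 1
    return problems

# Illustrative history in which a serving framework produced a random ID.
sample = [
    {"role": "user", "content": "Weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_abc123", "name": "get_weather", "content": "{}"},
]
print(find_bad_tool_call_ids(sample))
```

An empty result means every assistant tool call in the history follows the expected format and counter order.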

Q2: My tool-call ID is incorrect—how can I fix it?

First, make sure your code and chat template are up to date with the latest version from the Hugging Face repo. If you're using vLLM or SGLang and they are generating random tool-call IDs, upgrade them to the latest release. For other frameworks, you must either parse the tool-call ID from the model output and set it correctly in the server-side response, or rewrite every tool-call ID according to the rules above on the client side before sending the messages to Kimi K2.
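For the client-side option, a minimal sketch of such a rewrite is shown below. It assumes the messages are plain dicts in the OpenAI chat format; rewrite_tool_call_ids is a hypothetical helper written for this guide. It renumbers every assistant tool call with a global counter and updates the matching tool messages so that results stay linked to their calls:

```python
import copy

def rewrite_tool_call_ids(messages):
    """Hypothetical client-side fix: return a copy of messages with normalized IDs."""
    fixed = copy.deepcopy(messages)
    id_map = {}   # old ID -> new ID, so tool results stay linked to their calls
    counter = 0   # global counter: starts at 0 and grows with every call
    for message in fixed:
        if message.get("role") == "assistant":
            for call in message.get("tool_calls") or []:
                new_id = f"functions.{call['function']['name']}:{counter}"
                id_map[call["id"]] = new_id
                call["id"] = new_id
                counter += 1
        elif message.get("role") == "tool":
            old_id = message.get("tool_call_id")
            message["tool_call_id"] = id_map.get(old_id, old_id)
    return fixed

# Illustrative history with a randomly generated ID.
history = [
    {"role": "user", "content": "Weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_abc123", "name": "get_weather",
     "content": '{"weather": "Sunny"}'},
]
fixed_history = rewrite_tool_call_ids(history)
print(fixed_history[1]["tool_calls"][0]["id"], fixed_history[2]["tool_call_id"])
```

The function works on a deep copy, so the original history is left untouched and can still be compared against the rewritten one.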

Q3: My tool-call ID is correct, but multi-turn tool calling still crashes.

Please describe your situation in the discussion section.