tool calling not working as expected?
hi @all ,
I tried to use gpt-oss-20b with llama.cpp (converted to GGUF) using tool calls.
When I offer it a few functions to call, the model always ends up with a reasoning_content that mentions which function to use in free-flowing text, but it never executes the function itself or at least returns a tool_call structure naming the function (and parameters if needed), as e.g. command-r does.
'choices': [{'finish_reason': 'stop',
             'index': 0,
             'message': {'content': '',
                         'reasoning_content': 'User asks: "wie ist das wetter in der pariser region cergy". '
                                              'Need to use get_weather function. Provide city and region. '
                                              'City: Cergy. Region: Paris. Probably "Pariser Region Cergy" -> city: Cergy, region: Paris. '
                                              'Use function.<|start|>assistant<|channel|>commentary '
                                              'to=functions.get_weather json<|message|>{"city":"Cergy","region":"Paris"}',
                         'role': 'assistant'}}]
This is awkward to parse and not reliable, as it's not always the same structure. Did I forget anything needed to get a usable result?
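For comparison, what I'd expect back is roughly the OpenAI-style tool_calls structure that e.g. command-r returns (the values here are only illustrative):

'choices': [{'finish_reason': 'tool_calls',
             'index': 0,
             'message': {'content': None,
                         'role': 'assistant',
                         'tool_calls': [{'id': 'call_0',
                                         'type': 'function',
                                         'function': {'name': 'get_weather',
                                                      'arguments': '{"city": "Cergy", "region": "Paris"}'}}]}}]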
Thanks in advance for any help! :-)
Try ollama once.
@xbruce22: Sorry, it's not my intention to be impolite, but if something doesn't work on my Linux machines I also don't switch to Windows, because I'm interested in solutions, in learning, and in making things better in the future, not in throwing things away or replacing them at every tiny mistake. Btw: ollama is based on llama.cpp.
It just got released, and I see constant fixes in llama.cpp and ollama; you rebuild and it starts working. That's why I asked you to try ollama. If you are interested in a solution, raising an issue or starting to look into llama.cpp's code base are ways to get started.
Much appreciated, really. But couldn't it just be that reasoning models have to be handled differently (in a way I may not know about and/or didn't know what to search for) than non-reasoning ones?
Before reaching for the sledgehammer (opening issues and digging into the code base), I want to find out whether I'm doing everything correctly on my side... :-)
The result in my initial comment above came from a simple request (role: user, content: asking about the temperature in Cergy in the Paris region) and a default test collection of functions I'm offering the model to choose from (a sketch of the request itself follows after the list):
[{'function': {'description': 'gibt das aktuelle datum aus',
               'name': 'get_date',
               'parameters': {}},
  'type': 'function'},
 {'function': {'description': 'gibt die aktuelle uhrzeit aus',
               'name': 'get_time',
               'parameters': {}},
  'type': 'function'},
 {'function': {'description': 'gibt das aktuelle wetter für eine stadt aus',
               'name': 'get_weather',
               'parameters': {'properties': {'city': {'description': 'die stadt für die die temperatur wiedergegeben werden soll',
                                                      'type': 'string'},
                                             'region': {'description': 'die region für die die temperatur wiedergegeben werden soll',
                                                        'type': 'string'}},
                              'required': ['city', 'region'],
                              'type': 'object'}},
  'type': 'function'}]
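For completeness, this is roughly how I send the request (a sketch of my test script; I'm talking to the OpenAI-compatible chat completions endpoint of the llama.cpp server, and the base URL, API key and model name are placeholders for my local setup):

from pprint import pprint

from openai import OpenAI

# llama.cpp's llama-server exposes an OpenAI-compatible API; URL and key are placeholders
client = OpenAI(base_url='http://localhost:8080/v1', api_key='not-needed')

tools = [
    {'type': 'function',
     'function': {'name': 'get_weather',
                  'description': 'gibt das aktuelle wetter für eine stadt aus',
                  'parameters': {'type': 'object',
                                 'properties': {'city': {'type': 'string'},
                                                'region': {'type': 'string'}},
                                 'required': ['city', 'region']}}},
    # ... plus get_date and get_time as listed above ...
]

response = client.chat.completions.create(
    model='gpt-oss-20b',
    messages=[{'role': 'user',
               'content': 'wie ist das wetter in der pariser region cergy'}],
    tools=tools,
    tool_choice='auto',
)

message = response.choices[0].message
print(message.tool_calls)                  # stays empty for me
pprint(response.model_dump()['choices'])   # yields the structure from my first post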
(Btw: are there any options to format code in the comments? The left alignment is really horrible to read... ^^)
The gpt-oss models do not properly call tools for me either. I've tried both 20b and 120b with a variety of quants. Qwen3-4b-instruct-2507 is more useful with tools than the gpt-oss-120b model.
Example:
<think>
We should call the curl function with appropriate URL and request method.
We'll provide arguments array: ["GET", "https://api.coingecko.com/v3/simple_price?ids=goog"].
We need to consider that maybe token is not recognized. But we can attempt.
Let's proceed.
<think/>
Below is the curl command:
{
  args: [
    "GET",
    "https://api.coingecko.com/v3/simple_price?ids=goog"
  ]
}
The result returned from the curl function. This should contain the current price of GOOG in USD, as a JSON response like:
{
  priceUsd: 0.12345 // Example value
}
Just plain text; it does not actually invoke the tool. It also thought it should check a random crypto site for the Google stock price. Overall these models aren't very useful in my testing. Unusable as an agent.
Hello, has anybody found a solution for this problem? I have the same issue.
Managed to get it working using the chat completions API with ollama. I can say that gpt-oss:20b is a very good agentic model and does an excellent job at instruction following.
Things I've learnt:
- Ensure tools are explicitly listed in the system prompt (I also include the schema, as it reduces calls with invalid parameters, but that's up to you)
- Ensure tool_choice is set to 'required' or 'auto'
- Do NOT require a response format of JSON - this will cause gpt-oss to fail to respond with any content
I also stripped out reasoning tokens and put them inside <think> tags, as I'm not sure Ollama handles them internally, given they're not in the OpenAI interface definition yet.
Key snippets from working code...
// imports needed to make these snippets self-contained (exact paths may differ slightly between openai SDK versions)
import dotenv from 'dotenv';
import OpenAI from 'openai';
import type {
  ChatCompletionCreateParamsBase,
  ChatCompletionMessage,
  ChatCompletionMessageParam,
} from 'openai/resources/chat/completions';
import type { FunctionDefinition } from 'openai/resources/shared';

// define TypeScript types to keep linter happy
export type LLMConfig = Partial<ChatCompletionCreateParamsBase> & {options?: Record<string, any>}; // patch-in ollama field to type
export type ChatCompletionMessageReasoning = ChatCompletionMessage & {reasoning: string}; // patch-in missing reasoning field to type
export interface AgentTool { // just a simple type to keep definition and execution together - nothing special (uses Zod to infer and enforce arguments between the definition and execution)
definition: FunctionDefinition;
execute: (args: Record<string, any>) => Promise<string>;
}
// setup client config from environment vars
dotenv.config()
// plain env reads, so unset vars stay undefined instead of becoming the string "undefined"
const openAiCfg = {
  apiKey: process.env['OPENAI_API_KEY'] ?? '',
  baseURL: process.env['OPENAI_BASE_URL'] || 'https://api.openai.com/v1',
  project: process.env['OPENAI_PROJECT'],
  organization: process.env['OPENAI_ORGANIZATION'],
}
if (!openAiCfg.apiKey) {
throw new Error('OPENAI_API_KEY is not set in the environment variables.');
}
const client = new OpenAI(openAiCfg);
...
// just to show option-setting works - may cause you to run out of VRAM so be careful with num_ctx
const getModelOptions = (model: string): LLMConfig => {
switch (model) {
case "gpt-5":
case "gpt-5-mini":
case "gpt-5-nano":
return {options: {num_ctx: 400000}, max_tokens: 128000 };
case 'gpt-oss:20b':
case 'gpt-oss:120b':
case 'gpt-oss:latest':
return {options: {num_ctx: 131072}, max_tokens: 131072 };
default:
return {options: {num_ctx: 8192}, max_tokens: 4096 }; // Default to a reasonable value
}
}
...
const tools: AgentTool[] = []; // stubbed for this snippet, would normally contain AgentTool instances
const history: ChatCompletionMessageParam[] = [];
const config: LLMConfig = {
  model: 'gpt-oss:20b' // don't have an H100 to try 120b :(
}
...
const modelOptions = getModelOptions(config.model);
const response = await client.chat.completions.create({
max_tokens: modelOptions.max_tokens,
options: modelOptions.options,
temperature: 0.7, // haven't tried playing with this to see if it impacts tool calling
...config, // dynamic per-completion configuration tailoring (can override anything higher in list)
response_format: { // enforce text response
type: 'text',
},
stream: false, // force off to simplify testing
messages: history,
tools: tools.map(tool => agentTooltoFunction(tool.definition)), // this mapping just wraps each tool definition with {type: "function".... etc}
tool_choice: tools.length ? 'required' : undefined, // 'required' or 'auto' both work (required forces at least one tool call)
});
let message = response.choices[0].message as ChatCompletionMessageReasoning;
// patch reasoning between <think></think> tags TODO: Figure out if this is needed
message.content = message.reasoning
  ? `<think>${message.reasoning}</think>${message.content}`
  : message.content;
history.push(message);
Is it working with Kilo Code or Roo Code?