# gpt-oss-20b-GGUF
Read our guide on using gpt-oss to learn how to adjust its responses.
## Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs (see the sketch after this list).
- Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, letting gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and gpt-oss-20b run within 16GB of memory.
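For instance, Structured Outputs can be enforced with a JSON schema grammar. A minimal sketch using node-llama-cpp's `createGrammarForJsonSchema`; the schema and prompt here are illustrative, not part of the model card:

```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile("hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// constrain generation to a JSON schema (illustrative schema)
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        answer: {type: "string"},
        confidenceFromOneToTen: {
            enum: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }
} as const);

const res = await session.prompt("What is the capital of France?", {grammar});
const parsed = grammar.parse(res); // a typed object matching the schema
console.log(parsed);
```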
Refer to the original model card for more details on the model.
## Quants
| Link | URI | Size |
|---|---|---|
| GGUF | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` | 12.1GB |
| GGUF | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.F16.gguf` | 13.8GB |
Download a quant using node-llama-cpp (more info):
```bash
npx -y node-llama-cpp pull <URI>
```
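For example, `npx -y node-llama-cpp pull hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` downloads the MXFP4 quant listed above.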
## Usage
### Use with node-llama-cpp (recommended)
#### CLI
Chat with the model:
```bash
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```
Ensure that you have Node.js installed first:
```bash
brew install nodejs
```
#### Code
Use it in your Node.js project:
```bash
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();

// downloads the model file on first run, then loads it
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
Read the getting started guide to quickly scaffold a new node-llama-cpp project.
#### Customize inference options
Set Harmony options using `HarmonyChatWrapper`:
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",

        // adjust to "low", "medium", or "high" based on your use case and latency needs
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

// the model can call getCurrentWeather on its own while answering
const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
### Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):
```bash
brew install llama.cpp
```
#### CLI
```bash
llama-cli --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -p "The meaning to life and the universe is"
```
#### Server
```bash
llama-server --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -c 2048
```
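Once running, llama-server exposes an OpenAI-compatible HTTP API (on `http://localhost:8080` by default). A minimal TypeScript sketch of querying it; the prompt is illustrative, and the port assumes the server's default:

```typescript
// send a chat completion request to the local llama-server
const response = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "Hi there, how are you?"}
        ]
    })
});
const result = await response.json();
console.log(result.choices[0].message.content);
```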
Base model: openai/gpt-oss-20b