|
---
tags:
- gguf
- quantized
- gpt-oss
- multilingual
- text-generation
- llama-cpp
- ollama
language:
- en
- es
- fr
- de
- it
- pt
license: apache-2.0
model_type: gpt-oss
pipeline_tag: text-generation
base_model: openai/gpt-oss-20b
---
|
|
|
# GPT-OSS-20B Function Calling GGUF |
|
|
|
This repository contains the GPT-OSS-20B model fine-tuned on function-calling data and converted to GGUF format for efficient inference with llama.cpp and Ollama.
|
|
|
## Model Details |
|
|
|
- **Base Model:** openai/gpt-oss-20b |
|
- **Fine-tuning Dataset:** a 2,000-sample subset of Salesforce/xlam-function-calling-60k
|
- **Fine-tuning Method:** LoRA (r=8, alpha=16) |
|
- **Context Length:** 131,072 tokens |
|
- **Model Size:** 20B parameters |
|
|
|
## Files |
|
|
|
- `gpt-oss-20b-function-calling-f16.gguf`: F16 precision model (best quality) |
|
- `gpt-oss-20b-function-calling.Q4_K_M.gguf`: Q4_K_M quantized model (recommended for inference) |
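
Individual files can also be fetched with the Hugging Face CLI; a minimal sketch, assuming `huggingface_hub` is installed:

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download cuijian0819/gpt-oss-20b-function-calling-gguf \
  gpt-oss-20b-function-calling.Q4_K_M.gguf --local-dir .
```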
|
|
|
## Usage |
|
|
|
### With Ollama (Recommended) |
|
|
|
```bash
# Run directly from Hugging Face
ollama run hf.co/cuijian0819/gpt-oss-20b-function-calling-gguf:Q4_K_M

# Or create a local model from the example Modelfile below
ollama create my-gpt-oss -f Modelfile
ollama run my-gpt-oss
```
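
Once the local model is created, function calling can also be exercised over Ollama's HTTP API. The following is an illustrative sketch, not a verified recipe: it assumes an Ollama server on the default port (11434), the `my-gpt-oss` model name from the step above, and a made-up `get_weather` tool schema that you would replace with your own.

```bash
# Sketch: assumes Ollama is serving on localhost:11434 and the local model
# name "my-gpt-oss" from above. "get_weather" is a hypothetical example tool.
curl http://localhost:11434/api/chat -d '{
  "model": "my-gpt-oss",
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```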
|
|
|
### With llama.cpp |
|
|
|
```bash
# Download the quantized model
wget https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf/resolve/main/gpt-oss-20b-function-calling.Q4_K_M.gguf

# Run inference
./llama-cli -m gpt-oss-20b-function-calling.Q4_K_M.gguf -p "Your prompt here"
```
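
For programmatic access, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint; a minimal sketch, assuming the default port (8080):

```bash
# Start an OpenAI-compatible server on the default port (8080)
./llama-server -m gpt-oss-20b-function-calling.Q4_K_M.gguf -c 4096

# From another shell, send a chat-completion request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Name three uses of GGUF."}]}'
```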
|
|
|
### Example Modelfile for Ollama |
|
|
|
```dockerfile
FROM ./gpt-oss-20b-function-calling.Q4_K_M.gguf

TEMPLATE """<|start|>user<|message|>{{ .Prompt }}<|end|>
<|start|>assistant<|channel|>final<|message|>"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM """You are a helpful AI assistant that can call functions to help users."""
```
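
The exact prompt layout the fine-tune expects depends on how the training examples were templated. As an illustrative sketch only, in the style of the xlam dataset (tool schemas inlined in the prompt, with the model expected to reply with a JSON call such as `[{"name": ..., "arguments": ...}]`):

```bash
# Illustrative only: the precise template depends on how the fine-tuning
# data was formatted. The tool schema here is inlined as JSON in the prompt.
ollama run my-gpt-oss 'You have access to the following tool:
{"name": "get_weather", "description": "Get the current weather for a city", "parameters": {"city": {"type": "string"}}}
Use it to answer: What is the weather in Tokyo?'
```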
|
|
|
## PyTorch Version |
|
|
|
For training and fine-tuning with PyTorch/Transformers, check out the PyTorch version: [cuijian0819/gpt-oss-20b-function-calling](https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling) |
|
|
|
## Performance |
|
|
|
The Q4_K_M quantization offers a good balance of quality and resource use:

- **Size Reduction:** ~62% smaller than the F16 file
- **Memory Requirements:** ~16 GB of VRAM recommended
- **Quality:** minimal degradation from quantization
|
|
|
## License |
|
|
|
This model inherits the Apache 2.0 license from the base openai/gpt-oss-20b model.
|
|
|
## Citation |
|
|
|
```bibtex
@misc{gpt-oss-20b-function-calling-gguf,
  title={GPT-OSS-20B Function Calling GGUF},
  author={cuijian0819},
  year={2025},
  url={https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf}
}
```
|
|