---
tags:
- gguf
- quantized
- gpt-oss
- multilingual
- text-generation
- llama-cpp
- ollama
language:
- en
- es
- fr
- de
- it
- pt
license: apache-2.0
model_type: gpt-oss
pipeline_tag: text-generation
base_model: openai/gpt-oss-20b
---
# GPT-OSS-20B Function Calling GGUF
This repository contains the GPT-OSS-20B model fine-tuned on function calling data, converted to GGUF format for efficient inference with llama.cpp and Ollama.
## Model Details
- **Base Model:** openai/gpt-oss-20b
- **Fine-tuning Dataset:** Salesforce/xlam-function-calling-60k (2000 samples)
- **Fine-tuning Method:** LoRA (r=8, alpha=16)
- **Context Length:** 131,072 tokens
- **Model Size:** 20B parameters
## Files
- `gpt-oss-20b-function-calling-f16.gguf`: F16 precision model (best quality)
- `gpt-oss-20b-function-calling.Q4_K_M.gguf`: Q4_K_M quantized model (recommended for inference)
## Usage
### With Ollama (Recommended)
```bash
# Direct from Hugging Face
ollama run hf.co/cuijian0819/gpt-oss-20b-function-calling-gguf:Q4_K_M
# Or create local model
ollama create my-gpt-oss -f Modelfile
ollama run my-gpt-oss
```
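Once a local model exists, it can also be queried over Ollama's REST API. A minimal Python sketch (assumptions: an Ollama server is running on the default port 11434, and the model was created as `my-gpt-oss` per the steps above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "my-gpt-oss") -> str:
    """POST the prompt to a local Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```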
### With llama.cpp
```bash
# Download model
wget https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf/resolve/main/gpt-oss-20b-function-calling.Q4_K_M.gguf
# Run inference
./llama-cli -m gpt-oss-20b-function-calling.Q4_K_M.gguf -p "Your prompt here"
```
### Example Modelfile for Ollama
```dockerfile
FROM ./gpt-oss-20b-function-calling.Q4_K_M.gguf
TEMPLATE """<|start|>user<|message|>{{ .Prompt }}<|end|>
<|start|>assistant<|channel|>final<|message|>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM """You are a helpful AI assistant that can call functions to help users."""
```
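Because the model was fine-tuned on xlam-function-calling-60k, it is trained to emit tool calls as JSON. A minimal sketch of extracting such calls from raw output (an assumption for illustration: the model returns a JSON array of objects with `name` and `arguments` keys; the exact shape depends on your prompt template):

```python
import json

def parse_tool_calls(text: str) -> list[dict]:
    """Extract a JSON array of tool calls from model output; [] if none found."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end == -1:
        return []
    try:
        calls = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed call objects that name a function.
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# Illustrative model output (hypothetical):
out = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
```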
## PyTorch Version
For training and further fine-tuning with PyTorch/Transformers, see the original checkpoint: [cuijian0819/gpt-oss-20b-function-calling](https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling)
## Performance
The Q4_K_M quantized version offers a strong size/quality trade-off:
- **Size Reduction:** ~62% smaller than F16
- **Memory Requirements:** ~16GB VRAM recommended
- **Quality:** Minimal degradation from quantization
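As a back-of-envelope check on these figures (pure weight-storage arithmetic assuming a uniform bits-per-weight and ignoring GGUF metadata; real files deviate because some tensors are kept at higher precision):

```python
def size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8

f16_gb = size_gb(20, 16)        # 40.0 GB of raw F16 weights
q4_gb = f16_gb * (1 - 0.62)     # ~15.2 GB at the stated ~62% reduction
effective_bpw = q4_gb * 8 / 20  # ~6.1 bits per weight on average
```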
## License
This model inherits the license from the base openai/gpt-oss-20b model.
## Citation
```bibtex
@misc{gpt-oss-20b-function-calling-gguf,
  title={GPT-OSS-20B Function Calling GGUF},
  author={cuijian0819},
  year={2025},
  url={https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf}
}
```