|
---
tags:
- gguf
- quantized
- gpt-oss
- multilingual
- text-generation
- llama-cpp
- ollama
language:
- en
- es
- fr
- de
- it
- pt
license: apache-2.0
model_type: gpt-oss
pipeline_tag: text-generation
base_model: openai/gpt-oss-20b
---
|
|
|
# GPT-OSS-20B Function Calling GGUF |
|
|
|
This repository contains the GPT-OSS-20B model fine-tuned on function-calling data and converted to GGUF format for efficient inference with llama.cpp and Ollama.
|
|
|
## Model Details |
|
|
|
- **Base Model:** openai/gpt-oss-20b |
|
- **Fine-tuning Dataset:** a 2,000-sample subset of Salesforce/xlam-function-calling-60k
|
- **Fine-tuning Method:** LoRA (r=8, alpha=16) |
|
- **Context Length:** 131,072 tokens |
|
- **Model Size:** 20B parameters |
|
|
|
## Files |
|
|
|
- `gpt-oss-20b-function-calling-f16.gguf`: F16 precision model (best quality) |
|
- `gpt-oss-20b-function-calling.Q4_K_M.gguf`: Q4_K_M quantized model (recommended for inference) |
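
Individual files can also be fetched with the Hugging Face CLI; a minimal sketch, assuming `huggingface_hub` is installed:

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download cuijian0819/gpt-oss-20b-function-calling-gguf \
  gpt-oss-20b-function-calling.Q4_K_M.gguf --local-dir .
```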
|
|
|
## Usage |
|
|
|
### With Ollama (Recommended) |
|
|
|
```bash
# Run directly from Hugging Face
ollama run hf.co/cuijian0819/gpt-oss-20b-function-calling-gguf:Q4_K_M

# Or create a local model from the example Modelfile below
ollama create my-gpt-oss -f Modelfile
ollama run my-gpt-oss
```
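
Once the local model is created, function calling can also be exercised over Ollama's HTTP API. The following is an illustrative sketch, not a verified recipe: it assumes an Ollama server on the default port (11434), the `my-gpt-oss` model name from the step above, and a made-up `get_weather` tool schema that you would replace with your own.

```bash
# Sketch: assumes Ollama is serving on localhost:11434 and the local model
# name "my-gpt-oss" from above. "get_weather" is a hypothetical example tool.
curl http://localhost:11434/api/chat -d '{
  "model": "my-gpt-oss",
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```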
|
|
|
### With llama.cpp |
|
|
|
```bash
# Download the quantized model
wget https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf/resolve/main/gpt-oss-20b-function-calling.Q4_K_M.gguf

# Run inference
./llama-cli -m gpt-oss-20b-function-calling.Q4_K_M.gguf -p "Your prompt here"
```
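
For programmatic access, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint; a minimal sketch, assuming the default port (8080):

```bash
# Start an OpenAI-compatible server on the default port (8080)
./llama-server -m gpt-oss-20b-function-calling.Q4_K_M.gguf -c 4096

# From another shell, send a chat-completion request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Name three uses of GGUF."}]}'
```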
|
|
|
### Example Modelfile for Ollama |
|
|
|
```dockerfile
FROM ./gpt-oss-20b-function-calling.Q4_K_M.gguf

TEMPLATE """<|start|>user<|message|>{{ .Prompt }}<|end|>
<|start|>assistant<|channel|>final<|message|>"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM """You are a helpful AI assistant that can call functions to help users."""
```
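
The exact prompt layout the fine-tune expects depends on how the training examples were templated. As an illustrative sketch only, in the style of the xlam dataset (tool schemas inlined in the prompt, with the model expected to reply with a JSON call such as `[{"name": ..., "arguments": ...}]`):

```bash
# Illustrative only: the precise template depends on how the fine-tuning
# data was formatted. The tool schema here is inlined as JSON in the prompt.
ollama run my-gpt-oss 'You have access to the following tool:
{"name": "get_weather", "description": "Get the current weather for a city", "parameters": {"city": {"type": "string"}}}
Use it to answer: What is the weather in Tokyo?'
```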
|
|
|
## PyTorch Version |
|
|
|
For training and fine-tuning with PyTorch/Transformers, check out the PyTorch version: [cuijian0819/gpt-oss-20b-function-calling](https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling) |
|
|
|
## Performance |
|
|
|
The Q4_K_M quantization offers a good balance of quality and resource use:

- **Size Reduction:** ~62% smaller than the F16 file
- **Memory Requirements:** ~16 GB of VRAM recommended
- **Quality:** minimal degradation from quantization
|
|
|
## License |
|
|
|
This model inherits the Apache 2.0 license from the base openai/gpt-oss-20b model.
|
|
|
## Citation |
|
|
|
```bibtex
@misc{gpt-oss-20b-function-calling-gguf,
  title={GPT-OSS-20B Function Calling GGUF},
  author={cuijian0819},
  year={2025},
  url={https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf}
}
```
|
|