
openbuddy-r1-0528-distill-qwen3-32b-preview0-qat-gptq-4bit

Repository: ramgpt/openbuddy-r1-0528-distill-qwen3-32b-preview0-qat-gptq-4bit

This is a 4-bit GPTQ-quantized version of OpenBuddy/OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT, built for efficient inference with reduced memory and compute requirements.

Model Details

  • Base model: OpenBuddy/OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT
  • Quantization: GPTQ, 4-bit
  • Format: GPTQ
  • Precision: INT4 (weights packed into I32 tensors)
  • Use case: chatbot and general-purpose LLM tasks
  • Target hardware: GPU inference with GPTQ-capable libraries (e.g., ExLlama, GPTQ-for-LLaMa, or vLLM with GPTQ enabled; see the vLLM sketch below)
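
For serving, vLLM can load GPTQ checkpoints directly. A minimal sketch follows (untested against this repo; the quantization flag and sampling values are illustrative assumptions):

from vllm import LLM, SamplingParams

# Assumes a vLLM build with GPTQ kernel support.
llm = LLM(
    model="ramgpt/openbuddy-r1-0528-distill-qwen3-32b-preview0-qat-gptq-4bit",
    quantization="gptq",  # select the GPTQ weight format explicitly
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello! Who are you?"], params)
print(outputs[0].outputs[0].text)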

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "ramgpt/openbuddy-r1-0528-distill-qwen3-32b-preview0-qat-gptq-4bit"

# Loading the GPTQ checkpoint through transformers requires a GPTQ
# backend (for example, the optimum and auto-gptq packages) to be installed.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", trust_remote_code=True)
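
Once loaded, generation follows the usual transformers chat flow. A minimal usage sketch, assuming the tokenizer ships a chat template (the prompt and generation settings are illustrative):

# Build a chat prompt with the tokenizer's built-in template.
messages = [{"role": "user", "content": "Introduce yourself briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))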

Checkpoint

  • Format: safetensors
  • Model size: 5.74B params
  • Tensor types: I32, BF16, F16
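
The I32 entries are the packed 4-bit GPTQ weights (typically eight 4-bit values per 32-bit integer); the BF16/F16 entries hold quantization scales and the layers left unquantized. A minimal sketch for checking the on-disk dtypes without downloading the weights, assuming huggingface_hub v0.19+ (which provides get_safetensors_metadata):

from huggingface_hub import get_safetensors_metadata

# Reads only the safetensors headers, not the weight data itself.
meta = get_safetensors_metadata(
    "ramgpt/openbuddy-r1-0528-distill-qwen3-32b-preview0-qat-gptq-4bit"
)
# Maps each on-disk dtype (e.g. "I32", "BF16", "F16") to its parameter count.
print(meta.parameter_count)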