See gpt-oss-120b 6.5bit MLX in action - demonstration video

The q6.5 bit quantization typically achieves 1.128 perplexity in our testing, equivalent to q8 (see the table and measurement sketch below).

| Quantization | Perplexity |
|--------------|------------|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |
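
For reference, a perplexity number like the ones above can be computed with a short mlx-lm script along these lines. This is a minimal sketch, not the exact harness we used: the evaluation file (`eval.txt`) and the 2048-token window are illustrative assumptions, and absolute values depend heavily on the evaluation text and setup.

```python
# Minimal perplexity sketch for an MLX model (illustrative; assumes mlx-lm is installed).
import math

import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("inferencerlabs/openai-gpt-oss-120b-MLX-6.5bit")

# Any held-out text works; eval.txt is a placeholder for your evaluation data.
text = open("eval.txt").read()
tokens = tokenizer.encode(text)

window = 2048  # tokens per evaluation chunk (illustrative choice)
total_nll, total_tokens = 0.0, 0

for start in range(0, len(tokens) - 1, window):
    chunk = tokens[start : start + window + 1]
    if len(chunk) < 2:
        break
    inputs = mx.array(chunk[:-1])[None]   # (1, T) model inputs
    targets = mx.array(chunk[1:])[None]   # (1, T) next-token targets
    logits = model(inputs)                # (1, T, vocab_size)
    nll = nn.losses.cross_entropy(logits, targets, reduction="sum")
    total_nll += nll.item()
    total_tokens += targets.size

print(f"perplexity: {math.exp(total_nll / total_tokens):.3f}")
```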

Usage Notes

  • Built with a modified version of MLX 0.26
  • Memory usage: ~95 GB (down from the ~251 GB required by the native MXFP4 format)
  • Expect ~60 tokens/s
  • For more details, see the demonstration video or visit OpenAI gpt-oss-120b; a minimal loading sketch follows below.
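
If you run the model with mlx-lm, loading and generation follow the library's standard API; the prompt text below is illustrative:

```python
# Load and run the model with mlx-lm (pip install mlx-lm).
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/openai-gpt-oss-120b-MLX-6.5bit")

prompt = "Explain the difference between MXFP4 and 6.5-bit quantization."
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams tokens as they are generated and reports tokens/s.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```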