---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
  - vllm
  - mlx
base_model: openai/gpt-oss-120b
---

See gpt-oss-120b 6.5bit MLX in action: demonstration video

The q6.5bit quant typically achieves 1.128 perplexity in our testing, matching q8.

| Quantization | Perplexity |
|--------------|------------|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |
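As a reminder of what the numbers above measure, here is a minimal sketch in plain Python (not the MLX evaluation harness used for these results): perplexity is the exponential of the mean negative log-likelihood per token, so lower is better and 1.0 is a model that predicts every token with certainty.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Illustration: a model assigning every token probability ~0.8865
# scores a perplexity of ~1.128, i.e. near-certain predictions.
log_probs = [math.log(0.8865)] * 100
print(round(perplexity(log_probs), 3))
```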

## Usage Notes

- Built with a modified version of MLX 0.26
- Memory usage: ~95 GB (down from the ~251 GB required by the native MXFP4 format)
- Expect ~60 tokens/s
- For more details, see the demonstration video or visit openai/gpt-oss-120b.
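To try the quant locally, one common route is the mlx-lm command-line tools; this is a sketch, not an official recipe, and `<repo-id>` stands in for this model's Hugging Face repository id:

```shell
# Sketch only: assumes mlx-lm is installed and <repo-id> is this
# model's Hugging Face id (an assumption, not taken from this card).
pip install mlx-lm
python -m mlx_lm.generate \
  --model <repo-id> \
  --prompt "Explain MXFP4 quantization in one sentence." \
  --max-tokens 128
```

Loading the full ~95 GB of weights requires a machine with enough unified memory, as noted above.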