---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
  - vllm
  - mlx
base_model: openai/gpt-oss-120b
---

See gpt-oss-120b 6.5bit MLX in action in the demonstration video.

The q6.5bit quant typically achieves 1.128 perplexity in our testing, matching q8 perplexity (1.128).

| Quantization | Perplexity |
|--------------|------------|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |
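For reference, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better and identical values (as for q6 and q8 above) indicate no measurable quality loss. A minimal sketch of the computation, using hypothetical per-token NLL values rather than real model outputs:

```python
import math

# Hypothetical per-token negative log-likelihoods (illustration only,
# not measurements from this model).
nlls = [0.10, 0.12, 0.14]

# Perplexity = exp(mean NLL per token).
ppl = math.exp(sum(nlls) / len(nlls))
print(round(ppl, 3))
```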

## Usage Notes
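A typical way to run MLX quants like this one is with the `mlx-lm` package; a sketch, assuming the repo id `inferencerlabs/gpt-oss-120b-MLX-6.5bit` (check the actual repo name on the model page) and an Apple Silicon Mac with enough unified memory for a 120B model:

```shell
pip install mlx-lm

# Download the weights and generate from a prompt (repo id is an assumption).
mlx_lm.generate \
  --model inferencerlabs/gpt-oss-120b-MLX-6.5bit \
  --prompt "Explain quantization in one paragraph." \
  --max-tokens 128
```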