Update README.md
README.md CHANGED
@@ -65,7 +65,7 @@ python -m mlx_lm generate --model halley-ai/gpt-oss-20b-MLX-5bit-gs32 \
 
 LM Studio / CLI (MLX, Q5 gs=32) ≈2k-token responses:
 - M1 Max (32 GB): ~45–50 tok/s, 0.40–0.60 s TTFB
-- M4 Pro (24 GB):
+- M4 Pro (24 GB): ~65–70 tok/s, 0.25–0.45 s TTFB
 - M3 Ultra (256 GB): pending
 
 Throughput varies with Mac model, context, and sampler settings.
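For context, a minimal sketch of how the ≈2k-token CLI runs above can be reproduced, based on the `python -m mlx_lm generate` command in the hunk header. The prompt text and `--max-tokens` value are illustrative placeholders, not the exact settings behind the reported numbers, and no sampler flags are assumed:

```shell
# Illustrative ~2k-token generation with the Q5 gs=32 MLX build;
# prompt and token budget are placeholders, not the benchmark's exact settings.
python -m mlx_lm generate \
  --model halley-ai/gpt-oss-20b-MLX-5bit-gs32 \
  --prompt "Write a detailed overview of Apple silicon unified memory." \
  --max-tokens 2048
```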