---
license: other
license_name: modified-mit
library_name: mlx
base_model: moonshotai/Kimi-K2-Instruct
pipeline_tag: text-generation
tags:
  - mlx
---

See Kimi-K2 Dynamic MLX in action - https://youtu.be/-zfUvA2CDqE

The q3.95-bit dynamic quant achieves 1.243 perplexity in our testing, slotting in closer to q4 perplexity (1.168) than to q3 perplexity (1.900).

| Quant | Perplexity |
|:---:|:---:|
| **q2** | 41.293 |
| **q3** | 1.900 |
| **q3.95** | 1.243 |
| **q4** | 1.168 |
| **q6** | 1.128 |
| **q8** | 1.128 |
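For context on the numbers above: perplexity is the exponential of the mean per-token negative log-likelihood on an evaluation text. The corpus and harness behind these measurements are not described in this card, so the snippet below is only a generic illustration of the metric, not the evaluation script used here.

```python
import math

def perplexity(token_log_probs):
    """Generic perplexity: exp of the mean negative log-likelihood.

    `token_log_probs` holds the natural-log probabilities a model assigns to
    each ground-truth token of an evaluation text (hypothetical input; the
    card does not specify how its figures were produced).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns ~0.9 probability to every correct token scores ~1.11,
# the same regime as the q4/q6/q8 figures above.
print(perplexity([math.log(0.9)] * 100))  # ≈ 1.11
```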

Kimi K2 Usage Notes

- Built with a modified version of MLX 0.26 (see the loading sketch below)
- Runs on a single M3 Ultra with 512 GB RAM
- Requires expanding the VRAM limit to at least ~500000 MB (I use 507000 for a larger context window): `sudo sysctl iogpu.wired_limit_mb=507000`
- Expect ~20 tokens/s
- For more details, see the demonstration video or visit Kimi K2.

---
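To drive the quant from Python, the standard `mlx-lm` API is the usual route. This is only a minimal sketch: it assumes the modified MLX 0.26 build mentioned above is installed (stock MLX may not load this model), and the repo id passed to `load()` is a placeholder rather than the actual path of this upload.

```python
# Minimal sketch, assuming a working mlx-lm install on top of the modified
# MLX 0.26 build noted above. Raise the wired GPU memory limit first, e.g.:
#   sudo sysctl iogpu.wired_limit_mb=507000
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from a local path or Hugging Face
# repo id. The id below is a placeholder, not the actual repo name.
model, tokenizer = load("inferencerlabs/Kimi-K2-Instruct-MLX-q3.95")

messages = [{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Expect roughly 20 tokens/s on an M3 Ultra per the notes above.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```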