Add amazing new smol-IQ4_KSS

Files changed (2)

README.md CHANGED

@@ -267,6 +267,63 @@ numactl -N "$SOCKET" -m "$SOCKET" \
 </details>
 ## IQ3_K 293.177 GiB (3.753 BPW)
 Final estimate: PPL = 3.4260 +/- 0.01995

 </details>
+## smol-IQ4_KSS 318.745 GiB (4.080 BPW)
+Final estimate: PPL = 3.3898 +/- 0.01964
+<details>
+<summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+custom="
+## Attention [0-60] (GPU)
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+# Balance of attn tensors
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=q8_0
+blk\..*\.attn_q_b\.weight=q8_0
+blk\..*\.attn_output\.weight=iq6_k
+## First Three Dense Layers [0-2] (GPU)
+blk\..*\.ffn_down\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)\.weight=iq5_ks
+## Shared Expert [3-60] (GPU)
+blk\..*\.ffn_down_shexp\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
+## Routed Experts [3-60] (CPU)
+blk\..*\.ffn_down_exps\.weight=iq4_kss
+blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
+## Token embedding and output tensors (GPU)
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+SOCKET=1
+numactl -N "$SOCKET" -m "$SOCKET" \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/imatrix-DeepSeek-V3.1-Q8_0.dat \
+    /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-256x20B-safetensors-BF16-00001-of-00030.gguf \
+    /mnt/raid/models/ubergarm/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-smol-IQ4_KSS.gguf \
+    IQ4_KSS \
+    192
+```
+</details>
 ## IQ3_K 293.177 GiB (3.753 BPW)
 Final estimate: PPL = 3.4260 +/- 0.01995
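
The recipe in this hunk builds a multi-line list of `regex=type` rules and then collapses it into the single comma-separated string handed to `--custom-q`. The sketch below isolates that step on a two-rule sample so it can be checked without running a quantization. The function name `flatten_rules` and the sample rules are illustrative only and are not part of the commit; the pipeline itself is copied from the recipe and relies on GNU `sed` for the `-z` option.

```bash
#!/usr/bin/env bash
# Hypothetical helper name; the recipe inlines this pipeline instead.
flatten_rules() {
  # 1. grep -v '^#' drops every comment line (both '#' and '##' headers).
  # 2. sed -z reads the whole input as one record so '\n' can be matched,
  #    then: runs of newlines -> one comma, trailing comma removed,
  #    leading comma removed.
  grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
}

# Illustrative sample in the same layout as the recipe (not the full rule set).
sample="
## Attention
blk\..*\.attn_k_b\.weight=q8_0
# Routed experts
blk\..*\.ffn_down_exps\.weight=iq4_kss
"

flat=$(echo "$sample" | flatten_rules)
printf '%s\n' "$flat"
```

The printed line should contain the two rules joined by a single comma, with no comment text and no leading or trailing comma. Printing the flattened string like this before calling `llama-quantize` is a cheap way to catch a rule that was accidentally commented out.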

images/perplexity.png CHANGED
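
The README headings pair each file size with a bits-per-weight (BPW) figure, and the two numbers can be cross-checked against each other. The sketch below is a rough sanity check, assuming that GiB means 2^30 bytes and that BPW is averaged over every weight in the file; the helper name `est_params_billion` is made up for this example. Because the published figures are rounded, the result is only an estimate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the weight count (in billions) implied by a
# file size in GiB and an average bits-per-weight figure.
#   weights = size_GiB * 2^30 bytes * 8 bits / BPW
est_params_billion() {
  awk -v gib="$1" -v bpw="$2" \
    'BEGIN { printf "%.1f\n", gib * 1073741824 * 8 / bpw / 1e9 }'
}

# Figures taken from the two README headings shown in the diff.
smol_iq4_kss=$(est_params_billion 318.745 4.080)
iq3_k=$(est_params_billion 293.177 3.753)

echo "smol-IQ4_KSS: ~${smol_iq4_kss}B weights"
echo "IQ3_K:        ~${iq3_k}B weights"
```

If the two estimates agree closely, the size and BPW in the new heading are consistent with the existing entry for the same model; a large gap would suggest a typo in one of the numbers.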