mlx-community/GLM-4.5V-abliterated-4bit · Great Model, but Quantization Issues Impact Performance

14 days ago

Hi team, thank you for creating this excellent MLX-quantized model! 🚀 I'm running it in LM Studio and hit a critical issue: whenever it processes an image, the model throws an AttributeError while streaming the prediction:
'LanguageModel' object has no attribute 'n_kv_heads'
This causes text generation to fail completely.
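
For anyone hitting the same message, here is a minimal sketch of how this class of error can arise. The class layout, the helper functions, and the `num_key_value_heads` config field are all hypothetical, not the actual mlx-vlm or LM Studio code. It assumes a caller reads `n_kv_heads` directly off the model while the wrapper only keeps that value in its config:

```python
from types import SimpleNamespace


class LanguageModel:
    """Hypothetical wrapper: keeps head counts only inside its config."""

    def __init__(self, config):
        self.config = config


def kv_heads_strict(model):
    # Reads the attribute directly; fails if the wrapper does not expose it.
    return model.n_kv_heads


def kv_heads_lenient(model):
    # Falls back to a (hypothetical) config field when the attribute is missing.
    if hasattr(model, "n_kv_heads"):
        return model.n_kv_heads
    return model.config.num_key_value_heads


model = LanguageModel(SimpleNamespace(num_key_value_heads=8))

try:
    kv_heads_strict(model)
except AttributeError as err:
    # Same shape of message as the one reported above.
    message = str(err)
    print(message)

# The lenient lookup succeeds because it consults the config instead.
print(kv_heads_lenient(model))
```

If the real cause is similar, the failure would come from a mismatch between what the runtime expects the model object to expose and what the model class defines, rather than from the quantized weights themselves. That is only a guess from the error text.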

Even in pure-text conversations, the model exhibits "intelligence decay" after the first few responses. Later messages become garbled or nonsensical (hallucinations), contain random symbols, or get stuck in endless thinking loops. Context retention is nearly zero, so coherent multi-turn dialogue isn't possible. This suggests the model may be unusable as-is.

Keep up the great work!

hehua2008

MLX Community org 3 days ago

It has been fixed by Blaizzy:
https://github.com/Blaizzy/mlx-vlm/commit/42be3c96087651dbcc057e1c0336b05e1b39e2e6
https://github.com/Blaizzy/mlx-vlm/commit/4a02624dcda53a6cb53bbc077db19150b6931a6f

However, it still does not work well in LM Studio, and I don't yet know why.

hehua2008 changed discussion status to closed 3 days ago