Great Model, but Quantization Issues Impact Performance
Hi team, thank you for creating this excellent MLX-quantized model! 🚀 I'm running it in LM Studio and have hit a critical issue: whenever the model processes an image, the prediction stream throws an AttributeError:
'LanguageModel' object has no attribute 'n_kv_heads'
This causes text generation to fail completely.
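For anyone hitting the same message, here is a minimal sketch of how this kind of error can arise and how a caller could guard against it. It is not the actual mlx-vlm or LM Studio code: the `LanguageModel` stand-in, the nested `args.num_key_value_heads` field, and both helper functions are assumptions made up for illustration.

```python
from types import SimpleNamespace


class LanguageModel:
    """Hypothetical stand-in; NOT the real mlx-vlm class."""

    def __init__(self, num_key_value_heads: int):
        # Assumption: the head count lives on a nested config object,
        # and no top-level `n_kv_heads` attribute is defined.
        self.args = SimpleNamespace(num_key_value_heads=num_key_value_heads)


def kv_heads_strict(model) -> int:
    # Direct attribute access: raises AttributeError if the attribute is missing.
    return model.n_kv_heads


def kv_heads_tolerant(model) -> int:
    # Prefer the top-level attribute; fall back to the assumed nested config field.
    heads = getattr(model, "n_kv_heads", None)
    if heads is None:
        heads = model.args.num_key_value_heads
    return heads


model = LanguageModel(num_key_value_heads=8)

try:
    kv_heads_strict(model)
except AttributeError as err:
    message = str(err)
    print(message)  # same shape as the error reported above

print(kv_heads_tolerant(model))
```

The strict helper mirrors code that assumes every model exposes `n_kv_heads` directly, while the tolerant one shows one possible fallback. Whether the real fix works this way is not something I have verified.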
Even in text-only conversations, response quality degrades after the first few replies. Later messages become garbled or nonsensical (hallucinations), contain random symbols, or get stuck in endless thinking loops. Almost no context is retained, so coherent multi-turn dialogue isn't possible. As it stands, the model may be unusable.
Keep up the great work!
The AttributeError has been fixed by Blaizzy in mlx-vlm:
https://github.com/Blaizzy/mlx-vlm/commit/42be3c96087651dbcc057e1c0336b05e1b39e2e6
https://github.com/Blaizzy/mlx-vlm/commit/4a02624dcda53a6cb53bbc077db19150b6931a6f
However, it still doesn't work well in LM Studio, and I don't yet know why.