MiniCPM-V-4_5-Q5_K_M.gguf not running on llama.cpp: gives "GGML_ASSERT(false && "unsupported minicpmv version") failed"
I am running your Q5 GGUF with llama-server (build b6271) using the following parameters:
-m "..\models\MiniCPM\MiniCPM-V-4_5-Q5_K_M.gguf" --mmproj "..\models\MiniCPM\mmproj-MiniCPM-V-4_5-f16.gguf"
--image "..\models\hydration_page.png" --threads 8
--ctx-size 10000 --flash-attn
--n-gpu-layers 99 --cache-type-k q8_0
--cache-type-v q8_0 --temp 0.4
--top-p 0.95 --min-p 0.05
--top-k 0 --repeat-penalty 1.1
--seed 3407
I am getting the error:
D:\LLMs\llama.cpp\tools\mtmd\mtmd.cpp:223: GGML_ASSERT(false && "unsupported minicpmv version") failed.
Support for this has been merged into llama.cpp, but the merge was quite recent, so you may need to pull and rebuild with the latest code.
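If you build from source, a rough sketch of the update steps (the -DGGML_CUDA=ON option is only an example for a CUDA build; pick the flags for your backend and adjust the path to your own checkout):

```
cd D:\LLMs\llama.cpp
git pull
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```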
Thanks, it works after updating llama.cpp.
There's another issue: its thinking is way too long. Is there a way to disable thinking in llama.cpp? I tried adding "/no_think" to the prompt, but it didn't work.
@mku1988
I've received your question and figured out the source of the issue. I'll submit a pull request to llama.cpp this week to fix this.
This likely means llama.cpp doesn't yet have a control for enabling or disabling CoT in multimodal mode. I'll contribute this to the community.
Adding "--reasoning-budget 0" to my llama-server launch parameters worked for me to disable thinking.
I'm experiencing the same issue. After starting llama-server with --reasoning-budget 0, it still performs reasoning when asked about complex images. I hope this can be fixed soon. Thanks!