GGML_ASSERT errors running on llama.cpp 27e8a23300e30cd6ff6107ce262acf832ca60597
#1, opened by SamPurkis
Does llama.cpp support this model?
When using the Q4_0 file, I get this error:
.../llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4012: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping)))
The command I am running:
./build/bin/llama-cli -m ./models/OLMoE-1B-7B-0125-Instruct-Q4_0.gguf -no-cnv -p "what is the capital of paris?"
Hey @SamPurkis, I ran into the same issue on my end. I tried converting the model again, but the error persists. It looks like it might be an incompatibility with llama.cpp. I'll dig a bit deeper, but I suspect a recent change in llama.cpp is affecting Q4_0 models.