GGML_ASSERT errors running on llama.cpp 27e8a23300e30cd6ff6107ce262acf832ca60597

#1 opened by SamPurkis

Does llama.cpp support this model?
I get the error

.../llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4012: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping)))

when using the Q4_0 quant and running:
./build/bin/llama-cli -m ./models/OLMoE-1B-7B-0125-Instruct-Q4_0.gguf -no-cnv -p "what is the capital of paris?"

Hey @SamPurkis, I ran into the same issue on my end. I tried converting the model again, but the error persists, so it looks like an incompatibility with llama.cpp. I'll dig a bit deeper, but my guess is that a recent change in llama.cpp affected Q4_0 models.
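For reference, the reconversion I tried was roughly the standard llama.cpp path below (the local checkpoint directory and output file names are just placeholders, adjust them for your setup):

# convert the HF checkpoint to a full-precision GGUF first
python convert_hf_to_gguf.py ./OLMoE-1B-7B-0125-Instruct --outtype f16 --outfile OLMoE-1B-7B-0125-Instruct-F16.gguf

# then quantize to Q4_0
./build/bin/llama-quantize OLMoE-1B-7B-0125-Instruct-F16.gguf OLMoE-1B-7B-0125-Instruct-Q4_0.gguf Q4_0

The resulting Q4_0 file still trips the same assert for me, which is why I suspect the problem is on the llama.cpp side rather than in the conversion.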
