GGML_ASSERT errors running on llama.cpp 27e8a23300e30cd6ff6107ce262acf832ca60597

#1 opened by SamPurkis

Does llama.cpp support this model?
I get the error

.../llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4012: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping)))

when using the Q4_0 quant and running:
./build/bin/llama-cli -m ./models/OLMoE-1B-7B-0125-Instruct-Q4_0.gguf -no-cnv -p "what is the capital of paris?"

Hey @SamPurkis, I ran into the same issue on my end. I tried converting the model again, but the error persists, so it looks like an incompatibility with llama.cpp. I'll dig a bit deeper, but my guess is that a recent change in llama.cpp affected Q4_0 models.
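For reference, the reconversion I tried was roughly the standard llama.cpp path below (the local checkpoint directory and output file names are just placeholders, adjust them for your setup):

# convert the HF checkpoint to a full-precision GGUF first
python convert_hf_to_gguf.py ./OLMoE-1B-7B-0125-Instruct --outtype f16 --outfile OLMoE-1B-7B-0125-Instruct-F16.gguf

# then quantize to Q4_0
./build/bin/llama-quantize OLMoE-1B-7B-0125-Instruct-F16.gguf OLMoE-1B-7B-0125-Instruct-Q4_0.gguf Q4_0

The resulting Q4_0 file still trips the same assert for me, which is why I suspect the problem is on the llama.cpp side rather than in the conversion.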
