Llama.cpp support

#2
by wsbagnsv1 - opened

Would it be possible to add support for this model to llama.cpp? Since it is based on Qwen3 and SigLIP, which are both already implemented in llama.cpp, it shouldn't be too hard. The reason I ask is that at 7B this model would fit nicely on a 12 GB VRAM card even at Q8 quant, but full precision is just too big for that :/
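
For context, here's a rough back-of-envelope estimate of the weight footprint (a minimal sketch; the ~7B parameter count is an assumption, and it ignores the KV cache, activations, and the SigLIP vision tower). llama.cpp's Q8_0 stores 32 int8 weights plus one fp16 scale per block, i.e. about 8.5 bits per weight:

```python
# Back-of-envelope VRAM estimate for the LLM weights alone.
# Assumes ~7B parameters; ignores KV cache, activations, and the vision tower.

PARAMS = 7e9  # assumed parameter count

# Nominal bits-per-weight: F16 is 16; Q8_0 is (32*8 + 16)/32 = 8.5
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5)]:
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:5s} ~{gib:5.1f} GiB")
# F16   ~ 13.0 GiB  -> over a 12 GB card before any runtime overhead
# Q8_0  ~  6.9 GiB  -> leaves headroom on 12 GB
```

That lines up with Q8 fitting comfortably in 12 GB while full precision, at ~13 GiB of weights alone, does not.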
