πŸ‘” Working quants for Qwen2.5 VL 7B.

We'll be uploading benchmark results along with the quants here.

The models have been tested on the latest llama.cpp, built with CLIP hardware acceleration manually enabled!

Consult the following post for more details: https://github.com/ggml-org/llama.cpp/issues/11483#issuecomment-2676422772

For now you can only run single prompts via the CLI:

llama-qwen2vl-cli -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf \
    --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf \
    --n_gpu_layers 9999 -p "Describe the image." --image ~/Pictures/test_small.png
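Since the CLI handles one prompt per invocation, a directory of images can be processed with a simple loop. A minimal sketch, assuming the model/mmproj paths from the example above; the image names are placeholders, and the script only prints each command (drop the `echo` inside `describe_cmd` to actually execute it):

```shell
#!/bin/sh
# Sketch: emit one single-prompt CLI invocation per image (dry run).
MODEL="$HOME/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf"
MMPROJ="$HOME/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf"

describe_cmd() {
  # $1 = path to an image file; prints the command instead of running it
  echo llama-qwen2vl-cli -m "$MODEL" --mmproj "$MMPROJ" \
    --n_gpu_layers 9999 -p "Describe the image." --image "$1"
}

# Placeholder image names; substitute a glob such as ~/Pictures/*.png
for img in img_001.png img_002.png; do
  describe_cmd "$img"
done
```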

We're working on a wrapper API solution in the meantime, until multimodal support is added back to llama.cpp.

API will be published here: https://github.com/Independent-AI-Labs/local-super-agents

Let us know if you need a specific quant!

πŸ’ͺ Benchmarking Update:

The latest main looks stable with Vulkan CLIP and any model thrown at it so far. Some preliminary insights:

  • 1200x1200 is the largest image you can encode with 16 GB of VRAM; clip.cpp does not appear to support multi-GPU Vulkan yet.
  • A 4060 Ti-class GPU delivers 20-30 t/s with Q8_0 and roughly double that with Q4, at 16-32K context.
  • Batching (multiple images) in a single CLI call seems to be working fine:
llama-qwen2vl-cli --ctx-size 16000 -n 16000 -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf \
    --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 \
    -p "Describe the image in detail. Extract all textual information from it. Output as detailed JSON." \
    -p "Analyze the image." --image ~/Pictures/test_small.png --image ~/Pictures/test_small.png
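A batched call like the one above can be assembled for an arbitrary set of images by appending one --image flag per file. A minimal sketch, assuming the same model/mmproj paths; the prompt and image names are placeholders, and the script prints the final command rather than running it (pipe the output to `sh` to execute):

```shell
#!/bin/sh
# Sketch: build one batched CLI call with a --image flag per file (dry run).
MODEL="$HOME/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf"
MMPROJ="$HOME/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf"

build_batch_cmd() {
  # args: one or more image paths; prints the full command on one line
  cmd="llama-qwen2vl-cli --ctx-size 16000 -n 16000 -m $MODEL --mmproj $MMPROJ --n_gpu_layers 9999 -p 'Analyze the image.'"
  for img in "$@"; do
    cmd="$cmd --image $img"
  done
  echo "$cmd"
}

# Placeholder image names
build_batch_cmd page_01.png page_02.png page_03.png
```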

Output quality looks very promising! We'll release all of the benchmark code when ready, so the process can be streamlined for other models.

Format: GGUF
Model size: 7.62B params
Architecture: qwen2vl
Available quants: 4-bit, 8-bit, 16-bit, 32-bit

