GGUF IQ3_M quant of cognitivecomputations/dolphin-2.7-mixtral-8x7b (both non-imatrix and imatrix variants).
It fits into 24 GiB of VRAM with a 32768-token context (with 8-bit KV cache quantization).
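For reference, here is a minimal llama-cpp-python sketch for loading the quant with the full 32768-token context and an 8-bit quantized KV cache. The GGUF filename glob and the prompt are placeholders, and llama.cpp requires flash attention for V-cache quantization; treat this as an illustration under those assumptions, not the card's prescribed setup:

```python
# A minimal sketch using llama-cpp-python; the filename glob below is a
# placeholder -- substitute the actual IQ3_M file from this repo.
import llama_cpp
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NeoChen1024/dolphin-2.7-mixtral-8x7b-GGUF-IQ3_M",
    filename="*IQ3_M*.gguf",          # glob matching the IQ3_M quant file
    n_ctx=32768,                      # full 32768-token context
    n_gpu_layers=-1,                  # offload all layers to the GPU
    flash_attn=True,                  # required for a quantized V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # 8-bit K cache
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # 8-bit V cache
)

out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```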

Model size: 46.7B params
Architecture: llama