prudant/Qwen3-Reranker-4B-seq-cls-vllm-fixed-W4A16_ASYM

This is a compressed version of danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed, produced with llm-compressor using the W4A16_ASYM scheme.

Serving

python3 -m vllm.entrypoints.openai.api_server --model 'dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM' --task classify
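Once the server is running, it can be queried over HTTP. The following is a minimal sketch, not the author's documented client: it assumes the server listens on vLLM's default http://localhost:8000, that your vLLM version exposes the /classify endpoint for classification models, and that the input texts have already been formatted as the guide requires. The helper names build_classify_request and classify are hypothetical.

```python
import json
import urllib.request


def build_classify_request(base_url, model, texts):
    """Build a POST request for a vLLM /classify endpoint (assumed to exist)."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(base_url, model, texts, timeout=30.0):
    """Send the request and return the decoded JSON response."""
    request = build_classify_request(base_url, model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running server; texts must follow the guide's format):
# result = classify(
#     "http://localhost:8000",
#     "dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM",
#     ["<query/document pair formatted as described in the guide>"],
# )
```

The request builder is kept separate from the network call so the payload can be inspected or logged before anything is sent.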

Important: for correct usage of this model, you MUST read the following guide: Guide

Model Details

  • Original Model: danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed
  • Quantization Method: AWQ
  • Compression Libraries: llm-compressor
  • Calibration Dataset: ultrachat_200k (512 samples)
  • Optimized For: Inference with vLLM
  • License: same as original model
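For readers unfamiliar with the scheme name: W4A16_ASYM stores weights as 4-bit integers with an asymmetric range (a scale plus a zero point), while activations stay in 16-bit. The sketch below illustrates only that arithmetic on a single group of weights in plain Python. It is not the llm-compressor or AWQ implementation, and the function names are hypothetical.

```python
def quantize_asym_4bit(weights):
    """Map a group of floats to unsigned 4-bit codes (0..15) with a zero point."""
    qmin, qmax = 0, 15
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero group: any scale works
    zero_point = max(qmin, min(qmax, round(-w_min / scale)))
    codes = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float weights from the 4-bit codes."""
    return [(q - zero_point) * scale for q in codes]


group = [-0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize_asym_4bit(group)
restored = dequantize(codes, scale, zero_point)
```

Each restored weight differs from the original by at most about half the scale, which is the rounding error of a 16-level grid over the group's range.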
Safetensors: model size 875M params; tensor types I64, I32, BF16

Model tree for dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM

Base model: Qwen/Qwen3-4B-Base
Quantized (15): this model