prudant/Qwen3-Reranker-4B-seq-cls-vllm-fixed-W4A16_ASYM
This is a compressed version of danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed using llm-compressor with the following scheme: W4A16_ASYM
Serving
python3 -m vllm.entrypoints.openai.api_server --model 'dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM' --task classify
Important: You MUST read the following guide for correct usage of this model here Guide
Model Details
- Original Model: danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed
- Quantization Method: AWQ
- Compression Libraries: llm-compressor
- Calibration Dataset: ultrachat_200k (512 samples)
- Optimized For: Inference with vLLM
- License: same as original model
- Downloads last month
- 15
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
🙋
Ask for provider support