Question about deploying the model using containers
#1, opened by rthenamvar
Hi, I hope everything is going well for you. I'm trying to deploy this model using Docker containers. The model deploys successfully and generates output, but it is much slower than another model I currently have deployed (Qwen2.5-7B-Instruct-GPTQ-Int4). I suspect something is wrong with my configuration, which is:
--model neuralmagic/starcoder2-7b-quantized.w8a8
--disable-log-requests
--enable-prefix-caching
--use-v2-block-manager
--max_num_batched_tokens 32000
--disable-sliding-window
--block-size 32
--max-num-seqs 600
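To put a number on "much slower", something like the sketch below could be run against both deployments. It is only a rough illustration under a few assumptions: that each container exposes an OpenAI-compatible `/v1/completions` endpoint whose response includes `usage.completion_tokens`, and that the helper names `measure_throughput` and `make_completion_client`, the base URL, and the `max_tokens` value are all hypothetical and should be adjusted to the actual setup.

```python
import json
import time
import urllib.request
from typing import Callable, List


def measure_throughput(generate: Callable[[str], int], prompts: List[str]) -> float:
    """Return generated tokens per second, sending the prompts one at a time."""
    start = time.perf_counter()
    # generate() returns the number of tokens produced for one prompt.
    total_tokens = sum(generate(p) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def make_completion_client(base_url: str, model: str, max_tokens: int = 128) -> Callable[[str], int]:
    """Build a generate() function for an assumed OpenAI-compatible server."""

    def generate(prompt: str) -> int:
        body = json.dumps(
            {"model": model, "prompt": prompt, "max_tokens": max_tokens}
        ).encode("utf-8")
        req = urllib.request.Request(
            f"{base_url}/v1/completions",  # assumed endpoint path
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        # Assumes the server reports token usage in the response body.
        return payload["usage"]["completion_tokens"]

    return generate
```

For example, `measure_throughput(make_completion_client("http://localhost:8000", "neuralmagic/starcoder2-7b-quantized.w8a8"), prompts)` with the same `prompts` list for each model would give a like-for-like figure. Because the requests are sent sequentially, this measures single-stream speed only and says nothing about throughput under concurrent load.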
I would appreciate any help.