uyiosa committed · Commit 2a03737 · verified · 1 Parent(s): 6555649

Update README.md

Files changed (1): README.md +11 -4
README.md CHANGED
@@ -19,11 +19,18 @@ print(model)
 
 ## Loading this model with VLLM via docker
 ```
-docker run --runtime nvidia --gpus all --env "HUGGING_FACE_HUB_TOKEN = .........." -p 8000:8000 \
+docker run --runtime nvidia --gpus all \
+--env "HUGGING_FACE_HUB_TOKEN = .........." \
+-p 8000:8000 \
 --ipc=host --model jsbaicenter/Llama-3.3-70b-Instruct-AWQ-4BIT-GEMM \
---gpu-memory-utilization 0.9 --swap-space 0 \
---max-seq-len-to-capture 512 --max-num-seqs 1 --api-key "token-abc123" --max-model-len 8000 \
---trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024
+--gpu-memory-utilization 0.9 \
+--swap-space 0 \
+--max-seq-len-to-capture 512 \
+--max-num-seqs 1 \
+--api-key "token-abc123" \
+--max-model-len 8000 \
+--trust-remote-code --enable-chunked-prefill \
+--max_num_batched_tokens 1024
 ```
 
 ## A method to merge adapter weights to the base model and quantize
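
Note (not part of the commit): a runnable form of the committed command could look like the sketch below. It assumes the upstream `vllm/vllm-openai:latest` image, since the command above names no Docker image, substitutes a hypothetical `HF_TOKEN` shell variable for the redacted token, drops the spaces around `=` in `--env` so Docker actually sets the variable, and uses the dash-separated `--max-num-batched-tokens` spelling of the vLLM flag.

```
# Sketch only: image choice and HF_TOKEN variable are assumptions, not from the commit.
docker run --runtime nvidia --gpus all \
  --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model jsbaicenter/Llama-3.3-70b-Instruct-AWQ-4BIT-GEMM \
  --gpu-memory-utilization 0.9 \
  --swap-space 0 \
  --max-seq-len-to-capture 512 \
  --max-num-seqs 1 \
  --api-key "token-abc123" \
  --max-model-len 8000 \
  --trust-remote-code \
  --enable-chunked-prefill \
  --max-num-batched-tokens 1024
```

Once the container is up, the OpenAI-compatible server listens on port 8000 and expects the same value passed to `--api-key` as a Bearer token, for example:

```
# Example request against the server started above; the key must match --api-key.
curl http://localhost:8000/v1/completions \
  -H "Authorization: Bearer token-abc123" \
  -H "Content-Type: application/json" \
  -d '{"model": "jsbaicenter/Llama-3.3-70b-Instruct-AWQ-4BIT-GEMM", "prompt": "Hello", "max_tokens": 32}'
```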