90+ tokens per second for MI300x8 using batch_size = 1
#166 · opened by ghostplant
```sh
# Step-1: Download the DeepSeek-R1 671B Model
huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir ./deepseek-ai/DeepSeek-R1
```
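Once the download finishes, a quick sanity check helps confirm the checkpoint landed intact before launching the container. This is just a sketch that assumes the local path from Step-1; the exact shard count and total size depend on the revision you pulled.

```sh
# Optional sanity check on the downloaded checkpoint (path from Step-1)
du -sh ./deepseek-ai/DeepSeek-R1                      # total size; the FP8 weights run to several hundred GB
ls ./deepseek-ai/DeepSeek-R1/*.safetensors | wc -l    # number of weight shards present on disk
```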
```sh
# Step-2: Use 8 MI300 GPUs to Run DeepSeek-R1 Chat with Full Precision (PPL Loss = 0)
docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --privileged \
    -v /:/host -w /host$(pwd) tutelgroup/deepseek-671b:mi300x8-fp16xfp8 \
    --model_path ./deepseek-ai/DeepSeek-R1 \
    --prompt "Calculate the result of: 1 / (sqrt(5) - sqrt(3))"
```
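For reference, rationalizing the denominator gives 1 / (sqrt(5) - sqrt(3)) = (sqrt(5) + sqrt(3)) / 2 ≈ 1.9841, so the model's reply is easy to verify by eye. To reproduce a rough tokens-per-second figure at batch_size = 1, one option is simply wrapping the same command with `time`; this sketch uses only the flags shown above and is not a dedicated benchmark mode of the container.

```sh
# Rough throughput check: same flags as Step-2, wrapped with `time`
time docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --privileged \
    -v /:/host -w /host$(pwd) tutelgroup/deepseek-671b:mi300x8-fp16xfp8 \
    --model_path ./deepseek-ai/DeepSeek-R1 \
    --prompt "Calculate the result of: 1 / (sqrt(5) - sqrt(3))"
# Divide the number of generated tokens (counted from the reply) by the elapsed
# decode time to estimate tokens/s; model load time should be excluded.
```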