90+ tokens per second for MI300x8 using batch_size = 1

Discussion #166, opened by ghostplant
```shell
# Step 1: Download the DeepSeek R1 671B model
huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir ./deepseek-ai/DeepSeek-R1

# Step 2: Run DeepSeek R1 chat at full precision (PPL = 0) on 8 MI300 GPUs
docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --privileged \
    -v /:/host -w /host$(pwd) tutelgroup/deepseek-671b:mi300x8-fp16xfp8 \
    --model_path ./deepseek-ai/DeepSeek-R1 \
    --prompt "Calculate the result of: 1 / (sqrt(5) - sqrt(3))"
```
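For reference, the expected answer to the sample prompt can be checked independently by rationalizing the denominator: 1 / (sqrt(5) - sqrt(3)) = (sqrt(5) + sqrt(3)) / 2 ≈ 1.984. A quick sanity check in Python (not part of the container; just for verifying the model's output):

```python
import math

# Rationalize the denominator:
# 1 / (sqrt(5) - sqrt(3)) = (sqrt(5) + sqrt(3)) / ((sqrt(5))^2 - (sqrt(3))^2)
#                         = (sqrt(5) + sqrt(3)) / 2
direct = 1 / (math.sqrt(5) - math.sqrt(3))
rationalized = (math.sqrt(5) + math.sqrt(3)) / 2

print(direct, rationalized)  # both are approximately 1.98406
```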
