90+ tokens per second for MI300x8 using batch_size = 1
#166 · opened by ghostplant
```sh
# Step-1: Download the DeepSeek-R1 671B Model
huggingface-cli download deepseek-ai/DeepSeek-R1 --local-dir ./deepseek-ai/DeepSeek-R1
```
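Once the download finishes, a quick sanity check helps confirm the checkpoint landed intact before launching the container. This is just a sketch that assumes the local path from Step-1; the exact shard count and total size depend on the revision you pulled.

```sh
# Optional sanity check on the downloaded checkpoint (path from Step-1)
du -sh ./deepseek-ai/DeepSeek-R1                      # total size; the FP8 weights run to several hundred GB
ls ./deepseek-ai/DeepSeek-R1/*.safetensors | wc -l    # number of weight shards present on disk
```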
```sh
# Step-2: Use 8 MI300 GPUs to Run DeepSeek-R1 Chat with Full Precision (PPL Loss = 0)
docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --privileged \
    -v /:/host -w /host$(pwd) tutelgroup/deepseek-671b:mi300x8-fp16xfp8 \
    --model_path ./deepseek-ai/DeepSeek-R1 \
    --prompt "Calculate the result of: 1 / (sqrt(5) - sqrt(3))"
```
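For reference, rationalizing the denominator gives 1 / (sqrt(5) - sqrt(3)) = (sqrt(5) + sqrt(3)) / 2 ≈ 1.9841, so the model's reply is easy to verify by eye. To reproduce a rough tokens-per-second figure at batch_size = 1, one option is simply wrapping the same command with `time`; this sketch uses only the flags shown above and is not a dedicated benchmark mode of the container.

```sh
# Rough throughput check: same flags as Step-2, wrapped with `time`
time docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --privileged \
    -v /:/host -w /host$(pwd) tutelgroup/deepseek-671b:mi300x8-fp16xfp8 \
    --model_path ./deepseek-ai/DeepSeek-R1 \
    --prompt "Calculate the result of: 1 / (sqrt(5) - sqrt(3))"
# Divide the number of generated tokens (counted from the reply) by the elapsed
# decode time to estimate tokens/s; model load time should be excluded.
```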