Can gpt-oss support local vLLM deployment on an A100 GPU?
Has anyone successfully deployed it?
At PostgresPro, we are trying to deploy the model on our A100 80GB GPU, but we're currently running into issues with vLLM and FlashAttention 3. We followed an OpenAI guide, but the model won't start. If we get it working, I'll write a guide.
Yes, you can on an H100, but not on an A100.
You can't, since the A100 doesn't support MXFP4 quantization.
They say the models, even the small one, are only supported on Ada/Hopper and newer, so the A100 is a no-go. Too bad, as we have thousands of those here.
We can't currently deploy them on our A100s or our L40Ss. There are a few GitHub issues a mile long with people hitting the same problems; hopefully the vLLM crew is working on getting older cards supported.
This is a FlashAttention thing. Try running without it, for example by forcing a different attention backend (see the sketch below).
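If you want to try that, here is a minimal sketch of steering vLLM away from FlashAttention via the `VLLM_ATTENTION_BACKEND` environment variable. The backend name `TRITON_ATTN_VLLM_V1` and the model id `openai/gpt-oss-20b` are assumptions on my side; the accepted backend strings vary by vLLM version, so check your build (or the error message) for the exact names.

```python
# Sketch: force a non-FlashAttention backend before vLLM is imported.
# Backend names differ between vLLM versions; "TRITON_ATTN_VLLM_V1" is an
# example value, not guaranteed to exist in your build.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"

from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b", gpu_memory_utilization=0.90)
out = llm.generate(["Hello from an A100"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```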
Actually, efficiency aside, attention sinks don't have to be limited to Hopper and FA3, and neither does MXFP4.
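For what it's worth, here is a minimal sketch of the sink math in plain PyTorch, assuming the common formulation where each head carries one learned sink logit that is appended to the softmax and then discarded. The names and shapes are mine, not the actual gpt-oss modules, but it shows why no Hopper-specific kernel is required for correctness:

```python
# Attention with a learned "sink" logit: the sink is just one extra column
# in the softmax that soaks up probability mass and is then dropped, so it
# runs on any GPU. (Causal masking omitted for brevity.)
import math
import torch

def attention_with_sink(q, k, v, sink_logit):
    # q, k, v: [heads, seq, head_dim]; sink_logit: [heads], one learned scalar per head
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])     # [heads, seq, seq]
    sink = sink_logit.view(-1, 1, 1).expand(-1, scores.shape[1], 1)
    probs = torch.softmax(torch.cat([sink, scores], dim=-1), dim=-1)
    return probs[..., 1:] @ v                                      # drop the sink column

h, s, d = 8, 16, 64
q, k, v = (torch.randn(h, s, d) for _ in range(3))
out = attention_with_sink(q, k, v, torch.zeros(h))
print(out.shape)  # torch.Size([8, 16, 64])
```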
Here is a temporary solution for a single A100 (80GB) to serve either the 20B or the 120B version: Tutel Instruction to Run GptOSS.
See https://github.com/vllm-project/vllm/issues/22290#issuecomment-3165645703
Single-user performance, in tokens per second (as of 2025/Aug/08):
Tutel gpt-oss (20B on 1xA100): 212 tps
vLLM gpt-oss (20B on 1xA100): 139 tps
SGLang gpt-oss (20B on 1xA100): (ongoing?)
Ollama gpt-oss (20B on 1xA100): 75 tps
Just bumping to say that, per https://github.com/vllm-project/vllm/issues/22290#issuecomment-3165645703, you can indeed now run vLLM on an A100.
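If it helps anyone, here is a quick smoke test against a locally served instance, assuming the server was started with `vllm serve openai/gpt-oss-20b` as in the linked comment and is listening on the default port 8000; the model id, port, and API key placeholder are assumptions, so adjust to your setup.

```python
# Smoke test against a local vLLM OpenAI-compatible server.
# Assumes `vllm serve openai/gpt-oss-20b` is already running on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # must match the served model name
    messages=[{"role": "user", "content": "Say hi from an A100."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```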