gpt-oss-20b on NVIDIA A10G: deployment error with Attention Sinks

#120
by Brayanb - opened

This is the main issue: the gpt-oss-20b model uses a feature called "Attention Sinks", which vLLM can only run with FlashAttention 3. The problem is that FlashAttention 3 requires newer GPU hardware, found only on Hopper/Blackwell GPUs, not on my NVIDIA A10G (Ampere).
Help, please: the deployment is on Modal. Any ideas?
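
For context, here is a minimal sketch of the workaround I'm considering: requesting a Hopper GPU from Modal instead of the A10G, so FlashAttention 3 kernels are available. I'm assuming Modal's `gpu` parameter accepts `"H100"` and that the `vllm serve` CLI works for `openai/gpt-oss-20b`; the image setup below is illustrative, not tested.

```python
# Minimal sketch, assuming a vLLM build with gpt-oss / attention-sink
# support and that Modal can allocate an H100 (Hopper) for this function.
import modal

app = modal.App("gpt-oss-20b-vllm")

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("vllm")  # a recent version with gpt-oss support
)

@app.function(gpu="H100", image=image, timeout=600)
def serve():
    import subprocess

    # FlashAttention 3 targets Hopper (SM 9.0), which is what vLLM
    # needs here for gpt-oss-20b's attention sinks; an A10G (Ampere,
    # SM 8.6) cannot run those kernels.
    subprocess.run(
        ["vllm", "serve", "openai/gpt-oss-20b"],
        check=True,
    )
```

If staying on Ampere is a hard requirement, maybe a non-FlashAttention-3 backend in vLLM (selected via the `VLLM_ATTENTION_BACKEND` environment variable) can handle the attention sinks, but I haven't confirmed that.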

(Screenshot attached: Captura de pantalla 2025-08-13 072051.png)