This is the main issue: the gpt-oss-20b model uses a feature called "attention sinks", which vLLM can only run with FlashAttention 3. The problem is that FlashAttention 3 requires newer GPU hardware found only on Hopper/Blackwell GPUs, not on an NVIDIA A10G (Ampere). The deployment is on Modal. Any ideas?
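One avenue worth checking: since Modal lets you pick the GPU type per function, you could request a Hopper-class GPU (e.g. H100) so FlashAttention 3 has hardware support. Below is a minimal, hypothetical sketch of a Modal app pinned to an H100; the app name, image contents, and serve logic are placeholders, not the actual deployment:

```python
import modal

# Hypothetical image: install vLLM into a slim Debian base.
image = modal.Image.debian_slim().pip_install("vllm")

app = modal.App("gpt-oss-vllm-example")  # placeholder app name

@app.function(gpu="H100", image=image, timeout=600)
def serve():
    # Placeholder: launch vLLM here with gpt-oss-20b.
    # On an H100 (Hopper), FlashAttention 3 should be usable,
    # so the attention-sinks path in vLLM can run.
    ...
```

Whether this is viable depends on cost and H100 availability in your Modal region; the alternative would be a model or backend that does not require FlashAttention 3 on Ampere.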