PyTorch now natively supports Flash Attention. I created a PR to add Flash Attention support for GPT-OSS:
https://github.com/huggingface/transformers/pull/42345
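For context, "native" Flash Attention in recent PyTorch means `torch.nn.functional.scaled_dot_product_attention` can dispatch to a fused Flash kernel, and `torch.nn.attention.sdpa_kernel` lets you pin that backend explicitly. Here's a minimal sketch (shapes, dtype, and device are just illustrative):

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# Toy query/key/value tensors: (batch, heads, seq_len, head_dim).
# The Flash kernel needs CUDA and fp16/bf16 inputs.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Restrict SDPA to the Flash Attention backend; this errors if the
# Flash kernel can't run in the current configuration.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
```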
If you can't wait for the PR to be merged and released on PyPI, here's a patch:
https://gist.github.com/markrogersjr/ebada9ad3a31381d8d4e0d956c852569
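Once the PR is merged (or the patch is applied), loading the model should be a standard `transformers` call. Which `attn_implementation` value selects the new path depends on the final PR, so treat the one below as an assumption, along with the checkpoint name:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumptions for illustration: the checkpoint name and the
# attn_implementation value that the PR/patch routes to PyTorch's
# Flash Attention backend. Check the merged PR for the exact string.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="auto",
)
```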