Is it mandatory to install flash-attention for GPT-OSS?
#112 · opened by xiaotianyu2025
Is it mandatory to install flash-attention for GPT-OSS? Installing it causes my 3090 server to freeze for about an hour, making the server inaccessible. Once the build ends, pip list doesn't show flash-attention either.
Running:

    git clone https://github.com/Dao-AILab/flash-attention.git
    cd flash-attention
    git checkout v3
    pip install --upgrade pip setuptools wheel
    pip install . --use-pep517

and

    MAKEFLAGS="-j2" pip install . --use-pep517

always causes the server to freeze.
Environment: vllm==0.10.1+gptoss, transformers==4.55.0, Python 3.12, CUDA 12.0
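
In case it's relevant: my guess (only an assumption) is that the freeze comes from the source build spawning too many parallel nvcc jobs and exhausting RAM. The flash-attention README suggests capping build parallelism with the MAX_JOBS environment variable rather than MAKEFLAGS; a rough sketch of what that would look like here:

    # Assumption: the hang is caused by too many parallel nvcc jobs exhausting RAM.
    # MAX_JOBS caps the number of parallel compile jobs (per the flash-attention README);
    # MAKEFLAGS likely has no effect, since the build is driven by ninja rather than make.
    cd flash-attention
    MAX_JOBS=2 pip install . --no-build-isolation

I haven't verified that this avoids the freeze on the 3090; it's just the README's recommendation for machines with limited RAM (note that --no-build-isolation expects torch and ninja to already be installed in the environment).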