Is it mandatory to install flash-attention for GPT-OSS?

#112 opened by xiaotianyu2025

Is it mandatory to install flash-attention for GPT-OSS? Attempting to install it makes my 3090 server freeze for about an hour and become unreachable, and even after the install finally finishes, pip list doesn't show flash-attention.

Running:

```
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v3
pip install --upgrade pip setuptools wheel
pip install . --use-pep517
```

and

```
MAKEFLAGS="-j2" pip install . --use-pep517
```

always causes the server to freeze.
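If the freeze is just the compile step using every core and all the RAM, would capping the parallel build jobs help? Something like this sketch (assuming flash-attention's setup.py still reads MAX_JOBS, as its README describes, and that MAKEFLAGS is simply ignored):

```
# sketch: cap the parallel compile jobs so the build doesn't exhaust RAM and hang the machine
# (MAX_JOBS is the variable documented in flash-attention's README; MAKEFLAGS may not be honored)
MAX_JOBS=2 pip install . --use-pep517
```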

Environment: vllm==0.10.1+gptoss, transformers==4.55.0, Python 3.12, CUDA 12.0
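If flash-attention is optional, this is roughly what I'd like to be able to run on the 3090 without building it first (just a sketch; openai/gpt-oss-20b is the checkpoint I'm assuming, and I don't know which attention backend vLLM would fall back to):

```
# sketch: serve GPT-OSS with vLLM without installing flash-attention,
# letting vLLM pick whatever attention backend it supports on this GPU
vllm serve openai/gpt-oss-20b --port 8000
```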
