Important: Update modeling_mpt.py after Flash Attention 2.7.0

#22
by KingRei - opened

In PyTorch 2.7, `unpadding_function()` returns five outputs. Assigning its result to four variables, as in `(_, indices_q, cu_seqlens_q, max_seqlen_q) = unpadding_function(...)`, therefore raises an error due to mismatched unpacking. To resolve this, add an extra variable that captures the additional output, like so: `(_, indices_q, cu_seqlens_q, max_seqlen_q, *rest) = unpadding_function(...)`. This ensures all returned values are accounted for and prevents the unpacking error. The same adjustment applies when unpacking the outputs for k and v.

My mistake: the change is not in PyTorch 2.7, but in the latest flash-attention release, v2.7.0.
You can refer to https://github.com/Dao-AILab/flash-attention/blob/v2.7.0/flash_attn/bert_padding.py#L127 : it returns 5 values, not 4.
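For anyone patching their local copy, here is a minimal sketch of the version-tolerant unpacking, assuming the unpadding function used in modeling_mpt.py is `flash_attn.bert_padding.unpad_input`; the tensor shapes and variable names below are illustrative only:

```python
import torch
from flash_attn.bert_padding import unpad_input  # the "unpadding_function" in question

# Illustrative stand-ins for the query states and padding mask that
# modeling_mpt.py passes into the unpadding step.
batch, seqlen, n_heads, head_dim = 2, 8, 4, 16
query = torch.randn(batch, seqlen, n_heads, head_dim, dtype=torch.float16)
attention_mask = torch.ones(batch, seqlen, dtype=torch.bool)

# flash-attention <= 2.6.x returns 4 values from unpad_input, while v2.7.0
# returns 5 (it additionally reports how many tokens each sequence uses).
# Capturing any extra outputs with *rest keeps the unpacking valid for both.
(query_unpad, indices_q, cu_seqlens_q, max_seqlen_q, *rest) = unpad_input(
    query, attention_mask
)

# The same pattern applies to the key and value unpacking, e.g.:
# (key_unpad, indices_k, cu_seqlens_k, max_seqlen_k, *rest) = unpad_input(key, attention_mask)
```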

KingRei changed pull request title from Update modeling_mpt.py to Important: Update modeling_mpt.py after Flash Attention 2.7.0
