Using a SlidingWindowLayer cache causes a crash
#5
by mdabbah - opened
I tested the kernel with gpt-oss using a variable cache implementation that uses a SlidingWindowLayer cache for the sliding-window layers and a static cache for the full-attention layers.
With this setup, the current implementation crashes on the first generation step, at the first attention layer (which is a sliding-attention layer).
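For context, here is a minimal, purely illustrative sketch (the class name is hypothetical, not the transformers or kernel implementation) of what a sliding-window layer cache hands to the attention kernel during decoding: the cached key/value length is capped at the window size even though the absolute position keeps growing, which matches the shapes reported below.

```python
import torch

class SlidingWindowLayerCacheSketch:
    """Hypothetical per-layer cache that keeps only the last `window` positions,
    mimicking what a sliding-window layer cache hands to the attention kernel."""

    def __init__(self, window: int):
        self.window = window
        self.keys = None   # [batch, kv_heads, cached_len, head_dim]
        self.values = None

    def update(self, k: torch.Tensor, v: torch.Tensor):
        if self.keys is None:
            self.keys, self.values = k, v
        else:
            self.keys = torch.cat([self.keys, k], dim=2)
            self.values = torch.cat([self.values, v], dim=2)
        # Keep at most `window` cached positions: the cached length is capped at 128
        # here even though the absolute position keeps growing (575 in this report).
        self.keys = self.keys[:, :, -self.window:]
        self.values = self.values[:, :, -self.window:]
        return self.keys, self.values

# Prefill with 575 tokens, then decode the token at position 575: the sliding
# layer exposes only the last 128 keys/values to the attention call.
cache = SlidingWindowLayerCacheSketch(window=128)
cache.update(torch.randn(1, 8, 575, 64), torch.randn(1, 8, 575, 64))   # prefill
k, v = cache.update(torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64))  # decode step
print(k.shape)  # torch.Size([1, 8, 128, 64])
```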
Arguments sent to the attention interface:
query_states.shape: torch.Size([1, 64, 1, 64])
key_states.shape: torch.Size([1, 8, 128, 64])
value_states.shape: torch.Size([1, 8, 128, 64])
sliding_window: 128
kwargs: {'position_ids': tensor([[575]], device='cuda:0'), 'output_attentions': False, 'use_cache': True}
The same arguments work fine with the eager attention implementation.
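For comparison, a minimal eager-style grouped-query attention sketch (again illustrative, not the actual kernel or model code) consumes exactly the shapes from the dump above without issue, since it simply repeats the 8 KV heads to match the 64 query heads and never assumes the cached length equals the absolute position.

```python
import torch

def eager_gqa_attention(query, key, value, sliding_window=None):
    # Minimal eager attention with grouped-query head repetition, using the
    # shapes from the report: query [1, 64, 1, 64], key/value [1, 8, 128, 64].
    bsz, num_q_heads, q_len, head_dim = query.shape
    num_kv_heads = key.shape[1]
    n_rep = num_q_heads // num_kv_heads  # 64 // 8 = 8

    # Repeat KV heads so they line up with the query heads.
    key = key.repeat_interleave(n_rep, dim=1)
    value = value.repeat_interleave(n_rep, dim=1)

    attn = torch.matmul(query, key.transpose(2, 3)) / head_dim**0.5
    # A real implementation would also apply the causal / sliding-window mask here;
    # with q_len == 1 and all 128 cached slots inside the window, no masking is
    # needed for this particular decode step.
    attn = torch.softmax(attn, dim=-1)
    return torch.matmul(attn, value)

q = torch.randn(1, 64, 1, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = eager_gqa_attention(q, k, v, sliding_window=128)
print(out.shape)  # torch.Size([1, 64, 1, 64])
```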