When I run 20B with kernels-community/vllm-flash-attn3 on an Ampere-architecture GPU, it produces nonsensical output. Without flash-attn (with attn_implementation removed), it runs normally. Can someone help with this?