Does it support Flash Attention-2?

#12
by deshwalmahesh - opened

More importantly, if it does, is it bug-free? Phi-2 still has a lot of issues with Flash-Attention-2: either loading is slow or the results are bad.

Yes, it supports Flash-Attention-2 and was tested with it.
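For anyone landing here, a minimal sketch of requesting Flash-Attention-2 through the transformers `attn_implementation` argument. The helper names and the `model_id` parameter are hypothetical, and it assumes a CUDA GPU with the flash-attn package installed; this is not necessarily how the model authors tested it, and the model may also need `trust_remote_code=True` depending on your transformers version.

```python
def flash_attn_load_kwargs(dtype_name="bfloat16"):
    """Build from_pretrained keyword arguments that request Flash-Attention-2.

    Hypothetical helper: Flash-Attention-2 kernels only run on half-precision
    weights, so anything other than float16/bfloat16 is rejected early.
    """
    if dtype_name not in ("float16", "bfloat16"):
        raise ValueError("Flash-Attention-2 expects float16 or bfloat16 weights")
    return {"attn_implementation": "flash_attention_2", "torch_dtype": dtype_name}


def load_with_flash_attn(model_id, dtype_name="bfloat16"):
    """Load a causal LM with Flash-Attention-2 (assumes a CUDA GPU and flash-attn)."""
    # Imported lazily so the kwargs helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    kwargs = flash_attn_load_kwargs(dtype_name)
    # Convert the dtype name into the actual torch dtype object.
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return model.to("cuda")
```

Call `load_with_flash_attn("<your model id>")` with the repository name of the model you are loading.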

Thanks a lot @caiom. Out of curiosity, will it still work well if I load it with attn_implementation = None?

My code is breaking on dropout_layer_norm. There is already an issue open for this in the discussions, with a solution.

Microsoft org
edited Apr 23

Please re-download the latest revision and dropout_layer_norm will not be a problem anymore.

Oh, amazing! Thanks. My Flash-Attn was breaking when I tried the hack given in that thread. Anyway, I'm still curious:

Will it still work well when loaded with attn_implementation = None?
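One way to look into this yourself: as far as I know, passing attn_implementation = None is the same as omitting the argument, so transformers picks its own default (typically SDPA when the installed PyTorch and the model support it, otherwise eager attention). The sketch below loads the model and reports what was actually selected. The function names and `model_id` are hypothetical, and `_attn_implementation` is a private config attribute that may change between transformers versions.

```python
def describe_attn_choice(attn_implementation=None):
    """Return a readable label for the requested attention implementation."""
    # None means "no preference": the library decides at load time.
    if attn_implementation is None:
        return "library default"
    return attn_implementation


def report_resolved_attention(model_id, attn_implementation=None):
    """Load the model and return the attention backend transformers resolved."""
    # Imported lazily so describe_attn_choice works without transformers installed.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id, attn_implementation=attn_implementation
    )
    # Private attribute (assumption: present in recent transformers releases).
    resolved = getattr(model.config, "_attn_implementation", "unknown")
    print(f"requested: {describe_attn_choice(attn_implementation)}, resolved: {resolved}")
    return resolved
```

Comparing generations from a model loaded this way against one loaded with "flash_attention_2" on the same prompt is a quick sanity check on output quality.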

nguyenbh changed discussion status to closed
