Doesn't work

#2
by ArtemSI - opened

When I start this code I get this error: "You are not running the flash-attention implementation, expect numerical differences". How do I fix it? I have flash_attn installed.

Make sure you use transformers==4.37.2 and that your GPU is Ampere or newer (A100, H100, etc.). Otherwise, set the "use_flash_attn" value to "false" in the config file; you can still run the model, just with the warning "You are not running the flash-attention implementation, expect numerical differences". I used a Tesla T4 to run this demo and the answers still looked fine.
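For example, instead of editing config.json by hand, you can override the value when loading. This is only a minimal sketch: the field name "use_flash_attn" is taken from the reply above, and "path/to/model" is a placeholder for the actual model ID.

```python
# Sketch: disable flash attention via the config before loading the model.
# "use_flash_attn" is the config field named above; "path/to/model" is a placeholder.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("path/to/model", trust_remote_code=True)
config.use_flash_attn = False  # fall back to the non-flash attention path

model = AutoModel.from_pretrained(
    "path/to/model",
    config=config,
    trust_remote_code=True,
)
```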

OpenGVLab org

When I start this code I get this error: "You are not running the flash-attention implementation, expect numerical differences". How do I fix it? I have flash_attn installed.

Thank you for your feedback. Flash attention is now enabled for Phi3, and eager attention is used automatically if flash attention is not installed in the environment.
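Since the automatic fallback described above depends on whether flash_attn can be imported, here is a minimal standard-library check you can run in your environment to see which path will be taken:

```python
# Sketch: check whether flash_attn is importable in the current environment.
# If it is not, the model falls back to eager attention as described above.
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    print("flash_attn not found: eager attention will be used")
else:
    print("flash_attn found: flash attention can be used")
```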

czczup changed discussion status to closed
