Doesn't work. When I start this code I get this error: "You are not running the flash-attention implementation, expect numerical differences." How do I fix it? I have flash_attn installed.
Make sure you use transformers==4.37.2 and that your GPU is Ampere or newer (A100, H100, etc.). Otherwise, set the "use_flash_attn" value to "false" in the config file; you can still run it with the warning "You are not running the flash-attention implementation, expect numerical differences". I used a Tesla T4 to run this demo and the answers still look fine.
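As a minimal sketch of the fallback path, you can also ask transformers to use standard (eager) attention when loading the model, which should be roughly equivalent to disabling flash attention in the config file. The checkpoint name below is a placeholder; substitute the one from the demo.

```python
# Sketch: load the model without flash-attention on GPUs that don't support it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    # Fall back to the standard attention kernels instead of flash-attention;
    # roughly the same effect as setting "use_flash_attn": false in the config.
    attn_implementation="eager",
)
```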
Thank you for your feedback. Now that flash attention is enabled for Phi3, eager attention is automatically used if flash attention is not installed in the environment.
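A small sketch of the behaviour described above, done explicitly on the user side (the checkpoint name is again a placeholder): request flash-attention only when the flash_attn package is importable, and fall back to eager attention otherwise.

```python
# Sketch: pick flash-attention if the flash_attn package is installed, else eager.
import importlib.util
from transformers import AutoModelForCausalLM

attn_impl = (
    "flash_attention_2"
    if importlib.util.find_spec("flash_attn") is not None
    else "eager"
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # placeholder checkpoint
    trust_remote_code=True,
    attn_implementation=attn_impl,
)
```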