Require flash attention2 for AIDC-AI/Ovis1.6-Llama3.2-3B model, please help
#2 by pawanc
Could you clarify the specific problem? Are you referring to running inference without flash attention?
I tried running it without flash attention. I currently have CUDA 12.3, and when I downgrade to 11.8 (to install flash attention), my GPU stops working for some reason. I believe we have to change config.json to set "llm_attn_implementation" to "eager", and also disable flash attention 2 by setting its flag to false in the modeling_ovis.py file in the Ovis class.
If there is any simpler way, please do share.
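One possibly simpler route, instead of editing config.json or modeling_ovis.py in place, is to override the attention setting at load time. This is only a sketch and is not verified against the Ovis remote code; whether `llm_attn_implementation` is honored this way depends on how modeling_ovis.py reads the config, so treat it as an assumption.

```python
# Untested sketch: override the LLM attention backend when loading,
# rather than hand-editing config.json / modeling_ovis.py.
# Assumption: the Ovis remote code reads `llm_attn_implementation`
# from the config object passed to from_pretrained.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "AIDC-AI/Ovis1.6-Llama3.2-3B"

# Load the remote config and switch the attention implementation to eager.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.llm_attn_implementation = "eager"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

If the visual tokenizer side also enables flash attention 2 (as mentioned above for modeling_ovis.py), that flag may still need to be turned off separately, since the sketch only touches the LLM attention setting.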