Require flash attention2 for AIDC-AI/Ovis1.6-Llama3.2-3B model, please help
#2 by pawanc
Could you clarify the specific problem? Are you referring to running inference without flash attention?
I tried running it without flash attention. I currently have CUDA 12.3, and when I downgrade to 11.8 (to install flash attention), my GPU stops working for some reason. I believe we have to change config.json to set "llm_attn_implementation" to "eager", and also disable flash attention 2 by setting its flag to false in the modeling_ovis.py file in the Ovis class.
If there is any simpler way, please do share.
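One possibly simpler route, instead of editing config.json or modeling_ovis.py in place, is to override the attention setting at load time. This is only a sketch and is not verified against the Ovis remote code; whether `llm_attn_implementation` is honored this way depends on how modeling_ovis.py reads the config, so treat it as an assumption.

```python
# Untested sketch: override the LLM attention backend when loading,
# rather than hand-editing config.json / modeling_ovis.py.
# Assumption: the Ovis remote code reads `llm_attn_implementation`
# from the config object passed to from_pretrained.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "AIDC-AI/Ovis1.6-Llama3.2-3B"

# Load the remote config and switch the attention implementation to eager.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.llm_attn_implementation = "eager"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

If the visual tokenizer side also enables flash attention 2 (as mentioned above for modeling_ovis.py), that flag may still need to be turned off separately, since the sketch only touches the LLM attention setting.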