Thanks for posting this cool model!
Regarding your use of "flash-attention 2", would you mind elaborating? Do you mean you used FA-2 during PEFT tuning?