RuntimeError: FlashAttention is not installed.
Hi, can you tell me how to disable flash_attn?
model = SentenceTransformer(
    "jinaai/jina-embeddings-v3",
    device=device,
    trust_remote_code=True,
    model_kwargs={"default_task": "text-matching"},
)
...
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=train_loss,
    evaluator=dev_evaluator,
)
trainer.train()
RuntimeError: FlashAttention is not installed. To proceed with training, please install FlashAttention. For inference, you have two options: either install FlashAttention or disable it by setting use_flash_attn=False when loading the model.
Sentence Transformers v3.2
Hi @seregadgl, you need to have FlashAttention installed if you want to train the model; you can only disable it during inference.
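For inference-only use, here is a minimal sketch of disabling it at load time. Two assumptions that this thread does not confirm: that sentence-transformers (>= 3.1) forwards config_kwargs into the model config, and that the model's remote code reads a use_flash_attn flag from that config, as the error message suggests. Check the model card for the officially supported option.

from sentence_transformers import SentenceTransformer

# Sketch: load for inference with FlashAttention disabled.
# The `config_kwargs` routing and the `use_flash_attn` flag are assumptions;
# the error message above only says use_flash_attn=False can be set at load time.
model = SentenceTransformer(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
    model_kwargs={"default_task": "text-matching"},
    config_kwargs={"use_flash_attn": False},  # inference only; training still needs flash-attn
)

embeddings = model.encode(["A quick smoke test."])
print(embeddings.shape)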
Thanks for the answer! Could you tell me which version of FlashAttention to install so that I can fine-tune the model in Google Colab on a T4 GPU? Thanks!
It seems you also need to install other dependencies (e.g. triton). If you look at the rotary.py file, you can see that the RuntimeError: FlashAttention is not installed exception is raised when from flash_attn.ops.triton.rotary import apply_rotary fails. That import requires both flash-attn and triton, so I guess you should also install triton by running pip install triton.
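Before kicking off a long training run, you can reproduce that check directly; the import path below is the exact one from rotary.py quoted above:

# If this import fails, training fails with the same RuntimeError.
try:
    from flash_attn.ops.triton.rotary import apply_rotary  # needs both flash-attn and triton
    print("flash-attn + triton are importable")
except ImportError as err:
    print("missing dependency:", err)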
@seregadgl you can install any recent version; the latest one (2.6.3) should work fine.
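After installing, a quick sanity check that both packages import and report their versions (just a sketch; the exact versions will differ on your runtime):

import torch
import flash_attn
import triton

print("flash-attn:", flash_attn.__version__)
print("triton:", triton.__version__)
print("CUDA available:", torch.cuda.is_available())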
@BlackBeenie you're right, it requires triton as well. However, triton should be installed automatically when you install torch with CUDA enabled.
@jupyterjazz
It seems triton is not installed automatically in Google Colab. I ran into the same error, and running pip install triton actually fixed it.
@BlackBeenie, that makes sense. This happens because Colab comes with torch pre-installed. If you uninstall it and reinstall it while connected to a GPU runtime, triton should be installed as well.
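If you want to confirm whether your Colab runtime already has triton before reinstalling torch, a one-line check with the standard library is enough:

import importlib.util

# True if a `triton` package is importable in the current environment.
print("triton installed:", importlib.util.find_spec("triton") is not None)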