Running the model on a Tesla T4
#2 · opened by xiaoajie738
Hello, can this model run on a Tesla T4? When I run it on a T4, I get the following error: "RuntimeError: FlashAttention only supports Ampere GPUs or newer."
Could you please try again now? I have changed this part of the code to fall back automatically to the original PyTorch attention when FlashAttention is not available.
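For readers hitting the same error: the fallback described above can be sketched roughly as below. This is a minimal illustration, not the actual code in modeling_intern_vit.py; the helper name has_flash_attn and the attention signature are hypothetical, and the real model's dispatch logic may differ.

```python
import torch

def has_flash_attn() -> bool:
    # True only when flash-attn is importable AND the GPU is Ampere (SM 8.x)
    # or newer; the Tesla T4 is Turing (SM 7.5), so this returns False there.
    try:
        import flash_attn  # noqa: F401
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    if has_flash_attn():
        # would dispatch to the flash-attn kernel here (omitted in this sketch)
        pass
    # Plain PyTorch scaled dot-product attention, used on pre-Ampere GPUs.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

On a T4 (or any machine without flash-attn), the plain PyTorch path runs and produces the same result as the fused kernel, just more slowly.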
@czczup
I tried it on a Tesla T4 GPU and it still does not seem to work.
This is the main error:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
I'm not sure, but this might be useful; I got this in the traceback:
File ~/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/e29cbb875c3039de7d81258cb5efaf754bf7d42c/modeling_intern_vit.py:77, in FlashAttention.forward(self, qkv, key_padding_mask, causal, cu_seqlens, max_s, need_weights)
74 max_s = seqlen
75 cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen, dtype=torch.int32,
76 device=qkv.device)
---> 77 output = flash_attn_unpadded_qkvpacked_func(
78 qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
79 softmax_scale=self.softmax_scale, causal=causal
80 )
81 output = rearrange(output, '(b s) ... -> b s ...', b=batch_size)
82 else:
If you have FlashAttention installed in your environment, try uninstalling it.
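For example, assuming the package was installed from PyPI under its usual name flash-attn, something like the following should remove it and confirm the import no longer resolves:

```shell
# Remove flash-attn so the model falls back to plain PyTorch attention.
python -m pip uninstall -y flash-attn
# Confirm the module can no longer be imported (prints True when it is gone).
python -c "import importlib.util; print(importlib.util.find_spec('flash_attn') is None)"
```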
Thank you. I am now able to run it on T4.
Yep same, thank you!
zwgao changed discussion status to closed