flash_attn package makes it non-portable

by bghira - opened Sep 27

Discussion

bghira

Sep 27

It only runs on NVIDIA systems, not on Apple or AMD.

zoldaten

Oct 5

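Try to avoid it by commenting out the attn_implementation argument:

import torch
from transformers import AutoModelForCausalLM

# EMU_HUB is the model repo id, defined elsewhere
model = AutoModelForCausalLM.from_pretrained(
    EMU_HUB,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
    # attn_implementation="flash_attention_2",  # left out so flash attention is not requested
    trust_remote_code=True,
)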
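The snippet above still hard-codes device_map="cuda:0". A rough sketch of how the load arguments could be chosen per machine is below. The helper names (flash_attn_installed, pick_device, build_load_kwargs) are made up for illustration and are not part of transformers, and passing "mps" as device_map is an assumption about your transformers/accelerate versions. It also only helps if the model's remote code does not import flash_attn unconditionally, which this thread does not confirm.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Return True if the flash_attn package can be located (without importing it)."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_device() -> str:
    """Pick a device string: CUDA first, then Apple MPS, then CPU."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all; report CPU so the caller can still inspect the kwargs.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda:0"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def build_load_kwargs(device: str, has_flash_attn: bool) -> dict:
    """Build keyword arguments for from_pretrained (hypothetical helper).

    attn_implementation is only set when a CUDA device is selected and
    flash_attn is installed; otherwise it is omitted, as suggested above.
    """
    kwargs = {"device_map": device, "trust_remote_code": True}
    if device.startswith("cuda") and has_flash_attn:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


if __name__ == "__main__":
    print(build_load_kwargs(pick_device(), flash_attn_installed()))
```

It could then be used as AutoModelForCausalLM.from_pretrained(EMU_HUB, **build_load_kwargs(pick_device(), flash_attn_installed())), adding torch_dtype yourself, since bfloat16 support differs between backends.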
