Awful phi3
#28 opened 16 days ago by JesusCrist
Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."
#27 · 1 reply · opened about 1 month ago by Pranav0511
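The `OutOfResources` message in #27 is a budget problem rather than a bug: the autotuned tile configuration asks for more shared memory than the GPU offers (180224 bytes requested against a 166912-byte limit), and each extra pipeline stage buffers another set of tiles, which is why the message suggests reducing block sizes or `num_stages`. A minimal sketch of that budget, assuming a hypothetical tiled kernel that buffers one (M, K) and one (K, N) half-precision tile per stage (the formula is illustrative, not Triton's actual accounting):

```python
# Rough shared-memory estimate for a pipelined, tiled kernel.
# Assumption (hypothetical layout): each of the `num_stages` pipeline
# stages holds one (block_m x block_k) tile and one (block_k x block_n)
# tile of a 2-byte dtype (fp16/bf16).

def smem_bytes(block_m, block_n, block_k, num_stages, dtype_bytes=2):
    """Estimated shared-memory footprint in bytes."""
    per_stage = (block_m * block_k + block_k * block_n) * dtype_bytes
    return num_stages * per_stage

# Six stages overflow a 166912-byte limit like the one in the error...
print(smem_bytes(128, 128, 64, num_stages=6))  # 196608 > 166912

# ...while dropping to four stages (or shrinking the blocks) fits.
print(smem_bytes(128, 128, 64, num_stages=4))  # 131072 <= 166912
```

In Triton this corresponds to removing the largest `triton.Config` entries (big blocks, high `num_stages`) from the autotune list so the tuner never picks a configuration the hardware cannot hold.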
Why is inference so slow compared with Qwen at the same 7B parameter size?
#26 opened about 2 months ago by lucasjin
Upload triton_flash_blocksparse_attn.py
#25 opened about 2 months ago by barcelosallan
Phi-3-small doesn't load with TGI
#24 · 1 reply · opened 2 months ago by aveer30
Multi-GPU training fails when using device_map = "auto"
#23 · 2 replies · opened 2 months ago by aveer30
Shared memory error
#15 · 8 replies · opened 3 months ago by marktenenholtz
RuntimeError: FlashAttention only support fp16 and bf16 data type during fine tuning.
#11 · 7 replies · opened 3 months ago by faizsameerahmed96
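The error in #11 comes from a dtype guard in FlashAttention: its kernels accept only half-precision inputs, so fine-tuning a model whose weights were loaded in float32 trips the check. The snippet below is a toy stand-in for that guard (the real check lives inside the flash-attn extension, not in this Python form); the usual remedy is loading or casting the model to bf16/fp16, e.g. `torch_dtype=torch.bfloat16` in `from_pretrained`:

```python
# Illustrative mirror of FlashAttention's input-dtype rule: only
# half-precision tensors are accepted. Hypothetical helper, shown to
# make the failure mode concrete.
SUPPORTED_DTYPES = {"float16", "bfloat16"}

def check_flash_attn_dtype(dtype: str) -> None:
    """Raise the same style of error FlashAttention does for fp32 inputs."""
    if dtype not in SUPPORTED_DTYPES:
        raise RuntimeError(
            "FlashAttention only support fp16 and bf16 data type"
        )

check_flash_attn_dtype("bfloat16")    # passes: bf16 is supported
# check_flash_attn_dtype("float32")  # would raise the RuntimeError above
```

Mixed-precision trainers that keep master weights in fp32 but autocast the forward pass to bf16 also avoid the error, since only the tensors reaching the attention kernel must be half precision.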
GGUF version?
#9 · 4 replies · opened 3 months ago by shtirlic
No Triton for Windows
#4 · 2 replies · opened 3 months ago by fernandomir