No Triton for Windows

#4
by fernandomir - opened

Any workaround? Also, how much VRAM is needed to run the microsoft/Phi-3-small-8k-instruct model?
Thanks!
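On the VRAM question, a rough lower bound can be worked out from the parameter count alone. The sketch below is only a back-of-the-envelope estimate: it assumes roughly 7 billion parameters (the thread calls this a 7B model) and 2 bytes per parameter (fp16/bf16), and it ignores activations, the KV cache, and framework overhead, so real usage will be higher. The function name `estimate_weight_vram_gib` is made up for illustration.

```python
def estimate_weight_vram_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GiB.

    Does NOT include activations, KV cache, or framework overhead.
    """
    return num_params * bytes_per_param / 1024**3


# Assumption: ~7e9 parameters, stored in 16-bit precision (2 bytes each).
weights_gib = estimate_weight_vram_gib(7e9, bytes_per_param=2)
print(f"Weights alone: about {weights_gib:.1f} GiB")
```

Under those assumptions the weights alone come to about 13 GiB, so treat that as a floor, not the full requirement.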

Has a solution been found yet?

Avoid using this model, it's extremely slow

Almost the slowest 7B model I have ever seen.

I tested both on the same A100 and compared it with Qwen2-7B: no quantization, just a plain comparison using raw transformers.

It's just extremely slow...
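For anyone who wants to reproduce a comparison like this, here is a minimal sketch of a tokens-per-second timer. It is not the script used for the numbers above (none was posted); `tokens_per_second` and the `generate` callable are hypothetical names. The callable is assumed to run one full generation, block until it finishes, and return the number of new tokens produced, so the same helper can wrap either model.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], warmup: int = 1, runs: int = 3) -> float:
    """Average generation throughput over several timed runs.

    `generate` must block until generation is done and return the
    number of new tokens it produced.
    """
    # Untimed warmup runs, so one-time setup cost is not counted.
    for _ in range(warmup):
        generate()

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate()
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


# Stand-in for a real model call: pretends to emit 10 tokens in ~10 ms.
def fake_generate() -> int:
    time.sleep(0.01)
    return 10


print(f"{tokens_per_second(fake_generate):.0f} tokens/s")
```

To compare two models fairly, use the same prompt, the same maximum number of new tokens, and the same dtype for both, and wrap each model's generation call in its own callable.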
