Can I run inference with vLLM?
#21, opened by amosxy
Inference with mistral_inference is slow on an H800. For example, with:
prefix = """def add("""
suffix = """ return sum"""
it takes 4 seconds.
Yes. Use ArthurGprog/Codestral-22B-v0.1-FIM-Fix-GPTQ
Or
dan-kwiat/Codestral-22B-v0.1-hf-FIM-fix-awq
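For reference, here is a minimal sketch of running the fill-in-the-middle example from the question through vLLM with one of the checkpoints above. The helper names `build_fim_prompt` and `complete_with_vllm` are hypothetical, and the `[SUFFIX]...[PREFIX]...` prompt layout is an assumption about how these FIM-fix tokenizers expect the input; check the model card before relying on it.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build a fill-in-the-middle prompt.

    Assumption: the FIM-fix checkpoints expect the suffix first, then the
    prefix, each introduced by its special token.
    """
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"


def complete_with_vllm(
    prefix: str,
    suffix: str,
    model: str = "ArthurGprog/Codestral-22B-v0.1-FIM-Fix-GPTQ",
    max_tokens: int = 64,
) -> str:
    """Generate the missing middle with vLLM (hypothetical helper)."""
    # Imported lazily so the prompt helper works without vLLM installed.
    from vllm import LLM, SamplingParams

    # vLLM should pick up the quantization method from the model config.
    llm = LLM(model=model)
    # Greedy decoding keeps the completion deterministic.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([build_fim_prompt(prefix, suffix)], params)
    return outputs[0].outputs[0].text
```

Calling `complete_with_vllm("def add(", " return sum")` on a GPU machine with vLLM installed would load the model and return the generated middle. In a real service, create the `LLM` object once and reuse it, since loading the weights dominates the first call.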