fastest inference
#2 opened by ehartford
Hi, I would like advice on the fastest way to run inference with this model.
I want to run it on 5 million samples, and at the current speed it looks like it will take several months unless I find a faster approach.
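A quick back-of-the-envelope calculation shows where "several months" comes from and how much faster inference needs to be. This is a minimal sketch. The 0.5 samples/s rate is a hypothetical placeholder, not a measurement of this model, and the helper names are made up for illustration.

```python
# Back-of-the-envelope estimate of wall-clock time for a batch inference job.
# The throughput figures below are hypothetical placeholders, not measurements.

SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(num_samples: int, samples_per_second: float) -> float:
    """Return how many days it takes to process num_samples at a steady rate."""
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    return num_samples / samples_per_second / SECONDS_PER_DAY


def required_throughput(num_samples: int, target_days: float) -> float:
    """Return the samples/second needed to finish within target_days."""
    if target_days <= 0:
        raise ValueError("target_days must be positive")
    return num_samples / (target_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    n = 5_000_000
    # Hypothetical: 0.5 samples/s with the current setup.
    print(f"{estimated_days(n, 0.5):.1f} days at 0.5 samples/s")
    # What steady rate would finish the job in one week?
    print(f"{required_throughput(n, 7):.2f} samples/s needed for 7 days")
```

Under that assumed rate the job takes roughly 116 days, while finishing in a week needs a bit over 8 samples/s, which is why batching and multi-GPU inference matter here.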
Hi @ehartford,
I have found DeepSpeed Inference to work quite well for this model. It lets you use tensor parallelism, which splits the model's weights across several GPUs so each one does part of the work for every forward pass.
Here are some links to get started:
https://deepspeed.readthedocs.io/en/latest/inference-init.html
https://www.deepspeed.ai/tutorials/inference-tutorial/
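To make tensor parallelism a bit more concrete: a layer's weight matrix is split across devices, each device computes its slice of the output, and the slices are gathered back together. The pure-Python toy below only illustrates that arithmetic for a single linear layer with a column-wise split. It is not how DeepSpeed implements it (in practice the tutorial above wraps the model with `deepspeed.init_inference` and runs one process per GPU), and all helper names here are hypothetical.

```python
# Conceptual sketch of column-wise tensor parallelism for one linear layer.
# Each "device" is simulated by a plain Python list; this is NOT DeepSpeed's
# implementation, just the arithmetic idea behind splitting a weight matrix.

from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute y = x @ w, where w has shape (len(x), out_features)."""
    out_features = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(out_features)]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w column-wise into num_shards contiguous chunks (one per device)."""
    out_features = len(w[0])
    if out_features % num_shards != 0:
        raise ValueError("out_features must be divisible by num_shards")
    size = out_features // num_shards
    return [[row[k * size:(k + 1) * size] for row in w] for k in range(num_shards)]


def column_parallel_matvec(x: Vector, w: Matrix, num_shards: int) -> Vector:
    """Each shard computes its slice of the output; the slices are then
    concatenated (the gather step a real multi-GPU run would perform)."""
    partial_outputs = [matvec(x, shard) for shard in split_columns(w, num_shards)]
    return [value for part in partial_outputs for value in part]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    w = [
        [1.0, 0.0, 2.0, -1.0],
        [0.5, 1.0, 0.0, 3.0],
        [0.0, 2.0, 1.0, 1.0],
    ]
    print(matvec(x, w))                     # full layer on one device
    print(column_parallel_matvec(x, w, 2))  # same result from 2 shards
```

Because every shard only holds a fraction of the weights, a model that is too large or too slow on one GPU can be spread over several, at the cost of some communication to gather the partial results.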
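Also note that it is a bit faster to have the tokenizer pad to 'longest' rather than 'max_length': each batch is then padded only to its longest sequence instead of to the maximum length, so the model processes fewer padding tokens.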
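In Hugging Face transformers these correspond to `padding="longest"` and `padding="max_length"` when calling the tokenizer. The sketch below just counts how many token positions a batch contains under each strategy; the `padded_tokens` helper and the sequence lengths are made up for illustration, not taken from this model.

```python
# Rough count of token positions the model has to process per batch under
# the two padding strategies. Sequence lengths here are made-up examples.

from typing import List


def padded_tokens(lengths: List[int], padding: str, max_length: int) -> int:
    """Total token positions in a batch after padding.

    padding="longest":    pad every sequence to the longest one in the batch.
    padding="max_length": pad every sequence to max_length.
    Sequences are assumed to already be truncated to max_length.
    """
    if any(n > max_length for n in lengths):
        raise ValueError("sequences must be truncated to max_length first")
    if padding == "longest":
        target = max(lengths, default=0)
    elif padding == "max_length":
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    return target * len(lengths)


if __name__ == "__main__":
    batch = [12, 30, 7, 25]  # hypothetical token counts for 4 prompts
    print(padded_tokens(batch, "longest", max_length=512))     # 4 * 30
    print(padded_tokens(batch, "max_length", max_length=512))  # 4 * 512
```

With short prompts and a large `max_length`, most positions under 'max_length' are padding, which is where the speedup from 'longest' comes from.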
Hope this helps!