Could you share scripts for fast inference?

#3
by chujiezheng - opened

Thanks for your great work! I am trying to run this 34B RM, but inference is very slow when the model is loaded with transformers (device_map='auto') and processing long texts (2048 tokens). Could you share scripts that enable fast inference, e.g. using tensor parallelism?
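For reference, something like the sketch below is what I have in mind: tensor-parallel scoring via DeepSpeed inference. This is just a rough sketch under assumptions, not this repo's actual setup; the model id is a placeholder, and I am assuming the RM loads via a sequence-classification head (please adjust to the real class/head if it differs).

```python
# Rough sketch (assumptions noted below), launched with:
#   deepspeed --num_gpus 2 score.py
import os
import torch
import deepspeed
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "path/to/34b-rm"  # placeholder: this repo's model id
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Assumption: the RM exposes a sequence-classification head; swap in the
# repo's actual model class if it uses a custom reward head.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
)

# Shard the weights across all launched GPUs with tensor parallelism.
model = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),  # set by the deepspeed launcher
    dtype=torch.half,
    replace_with_kernel_inject=True,
)

text = "prompt + response to score"
inputs = tokenizer(
    text, return_tensors="pt", truncation=True, max_length=2048
).to(f"cuda:{local_rank}")

with torch.no_grad():
    score = model(**inputs).logits  # reward score(s) for the sequence
print(score)
```

Would something along these lines work for this model, or is there a recommended script?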