Could you share scripts for fast inference?
#3 · opened by chujiezheng
Thanks for your great work! I am trying to run this 34B RM, but it is very slow when loaded with transformers (device_map='auto') and processing long texts (2048 tokens). Could you share scripts that enable faster inference, e.g., using tensor parallelism?