Does LLaVA support multi-GPU inference?

by ZealLin

Hi, thanks for your contribution to the open-source LLM community. I recently tried llava-v1.6-mistral-7b with the latest transformers (4.39). Everything works if I load the model onto a single GPU, i.e. device_map={'':0}, but if I dispatch the model across multiple GPUs, i.e. device_map='auto', inference raises CUDA errors. It may look like this issue:

I want to confirm: does this model support being loaded across multiple GPUs?
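
For anyone trying to reproduce this, here is a minimal sketch of the two loading setups described above. It is not the exact script from the post. The helper names `device_map_for` and `load_llava` are hypothetical, and the `LlavaNextProcessor` / `LlavaNextForConditionalGeneration` classes and float16 dtype are assumptions about how the checkpoint is loaded in transformers 4.39.

```python
def device_map_for(multi_gpu: bool):
    """Return the device_map value for the two setups in the post."""
    # {"": 0} places the whole model on GPU 0; "auto" lets Accelerate
    # spread the modules over all visible GPUs.
    return "auto" if multi_gpu else {"": 0}


def load_llava(model_id: str, multi_gpu: bool):
    """Load processor and model (hypothetical helper, not from the post)."""
    # Imports are local so device_map_for can be used without these packages.
    # Passing device_map requires the accelerate package to be installed.
    import torch
    from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

    processor = LlavaNextProcessor.from_pretrained(model_id)
    model = LlavaNextForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumed dtype, not stated in the post
        device_map=device_map_for(multi_gpu),
    )
    return processor, model
```

Calling `load_llava(model_id, multi_gpu=False)` should correspond to the working single-GPU case and `multi_gpu=True` to the failing one, where `model_id` is whichever Hub ID or local path holds the llava-v1.6-mistral-7b checkpoint. After a multi-GPU load, `model.hf_device_map` shows which device each module was placed on, which is useful to include in a bug report.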

Feel free to open an issue on the Transformers library regarding the use of device_map="auto".

@ZealLin I also need multi-GPU inference. Are you planning to open an issue? If not, can I open one?

@seyongh I have not dug into this issue, so feel free to open an issue on the Transformers repo if you like.

