2024-09-30 20:53:03 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:20001', model_path='/home/jack/Projects/yixin-llm/merge_med_llava_3', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False)
2024-09-30 20:53:03 | INFO | model_worker | Loading the model merge_med_llava_3 on worker 187b68 ...
2024-09-30 20:53:03 | WARNING | transformers.models.llama.tokenization_llama | You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
2024-09-30 20:53:03 | ERROR | stderr | /home/jack/anaconda3/envs/llavaplus/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
2024-09-30 20:53:03 | ERROR | stderr |   warnings.warn(
2024-09-30 20:53:03 | ERROR | stderr | Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
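For reference, the `args: Namespace(...)` entry at the top corresponds to a worker launch along the following lines. This is a hedged reconstruction assuming the standard LLaVA/LLaVA-Plus `llava.serve.model_worker` CLI; flag names should be verified against the local `llava/serve/model_worker.py`:

```python
# Hedged reconstruction of the launch implied by the Namespace in the log,
# assuming the standard llava.serve.model_worker CLI flags. All values are
# taken directly from the log line above; omitted flags keep their defaults.
import subprocess

subprocess.run([
    "python", "-m", "llava.serve.model_worker",
    "--host", "0.0.0.0",
    "--port", "40000",
    "--worker-address", "http://localhost:40000",
    "--controller-address", "http://localhost:20001",
    "--model-path", "/home/jack/Projects/yixin-llm/merge_med_llava_3",
    "--limit-model-concurrency", "5",
], check=True)
```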
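The tokenizer WARNING is emitted by the slow `LlamaTokenizer`; the linked PR (huggingface/transformers#24565) adds a `legacy` flag to opt into the corrected handling of tokens that follow special tokens. A minimal sketch, assuming the model path from the log contains the tokenizer files:

```python
# Minimal sketch: opt into the non-legacy tokenizer behaviour described in
# https://github.com/huggingface/transformers/pull/24565. `legacy=False` is
# forwarded to LlamaTokenizer, so the WARNING above is no longer emitted.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/home/jack/Projects/yixin-llm/merge_med_llava_3",  # path from the log
    use_fast=False,   # the warning is raised by the slow tokenizer
    legacy=False,
)
```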
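The `resume_download` FutureWarning is informational: `huggingface_hub` now always resumes interrupted downloads, so calling code should simply stop passing that kwarg. If a fresh download is actually wanted, the warning names the replacement. A small sketch, using an illustrative public repo id that is not part of this log:

```python
# Sketch of the replacement the FutureWarning recommends: drop
# resume_download=True entirely, and use force_download=True only when a
# fresh copy is required. repo_id/filename here are illustrative only.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="liuhaotian/llava-v1.5-7b",  # illustrative, not the log's local model
    filename="config.json",
    force_download=True,  # replaces the deprecated resume_download=True
)
print(path)
```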