Inference ~3x slower in v4.45
I am experimenting with few-shot prompting with images. Run time is ~3x slower on the new version (4.45 compared to 4.44), and I now also get the following warning:
logger.warning_once(
"Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. "
"Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly "
"with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. "
"Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
)
My code for batched inference:
from tqdm import tqdm

all_outputs = []
for img_batch, prompt_batch in tqdm(zip(image_batches, prompts_batches)):
    # Prepare inputs for the current batch
    inputs = processor(images=img_batch, text=prompt_batch, padding=True, return_tensors="pt").to("cuda:0")
    # Generate outputs for the batch
    output = model.generate(**inputs, max_new_tokens=250)
    # Collect the outputs
    all_outputs.append(output)
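For completeness, after the loop I decode the collected token ids back to text; a minimal sketch using the processor's standard batch_decode:

all_texts = []
for output in all_outputs:
    # Decode each collected batch of generated token ids into strings
    all_texts.extend(processor.batch_decode(output, skip_special_tokens=True))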
Are there any changes I need to make, possibly in the model / processor settings, to work properly with version 4.45? Thanks!
Hmm, I'll check that soon. Afaik we haven't made drastic changes in the code, so it might also be related to the general generation loop.
It seems that the original speed is back after adding `patch_size` and `vision_feature_select_strategy` to the processor:

processor = LlavaNextProcessor.from_pretrained(model_name, patch_size=16, vision_feature_select_strategy="full")
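For reference, a fuller sketch of the workaround; the checkpoint name below is just a placeholder, and it's safer to read both values from the model config than to hard-code them:

import torch
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_name = "llava-hf/llava-v1.6-mistral-7b-hf"  # placeholder, use your checkpoint

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="cuda:0"
)
# Reading the values from the model config keeps processor and model consistent,
# so the processor expands image tokens itself during preprocessing.
processor = LlavaNextProcessor.from_pretrained(
    model_name,
    patch_size=model.config.vision_config.patch_size,
    vision_feature_select_strategy=model.config.vision_feature_select_strategy,
)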
Indeed, I found that the image backbone was run for each forward pass in the legacy path. The new logic of adding `patch_size` to processors was not enabled in the official (current) HF repo because we have one more PR to merge that solves an edge case. But in general the feature is ready to be used, and setting the args yourself is a workaround, yeah.
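If you want to double-check which path you're hitting: both attributes default to None when they aren't set, and that is what sends generation down the legacy path. A quick sketch:

# None for either attribute means the processor does not expand image tokens,
# so the model falls back to the legacy path and re-runs the vision backbone
# on every generation step.
if processor.patch_size is None or processor.vision_feature_select_strategy is None:
    print("legacy path active: set processor.patch_size and "
          "processor.vision_feature_select_strategy")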
So I'll see how much is left to make the new logic the default and try to merge it ASAP, as the legacy path is causing latency issues. How I address the latency will also depend on how long we'll support the legacy path. Thanks a lot for reporting this!
The fix will be in https://github.com/huggingface/transformers/pull/34460.
I solved this problem by adding 2 lines to the llava-1.5-7b-hf initialization:
self.processor.patch_size = self.model.config.vision_config.patch_size
self.processor.vision_feature_select_strategy = self.model.config.vision_feature_select_strategy
The code above sets patch_size and vision_feature_select_strategy manually, using the same values from model.config.
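In context, a minimal sketch of that initialization (the class names are my assumption for the plain LLaVA checkpoint):

from transformers import LlavaForConditionalGeneration, LlavaProcessor

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = LlavaProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Copy both values from the model config so the processor expands image
# tokens during preprocessing (the fast path) instead of the model doing it
# on every forward pass.
processor.patch_size = model.config.vision_config.patch_size
processor.vision_feature_select_strategy = model.config.vision_feature_select_strategy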