Inference ~3x slower in v4.45

#4
by hjdeheer - opened

I am experimenting with few-shot prompting with images. I noticed runtime is about 3x slower on the new version (4.45 compared to 4.44), and I now get the following warning:

                logger.warning_once(
                    "Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. "
                    "Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly "
                    "with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. "
                    "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
                )

My code for batched inference:

from tqdm import tqdm

all_outputs = []
for img_batch, prompt_batch in tqdm(zip(image_batches, prompts_batches)):
    # Prepare inputs for the current batch
    inputs = processor(images=img_batch, text=prompt_batch, padding=True, return_tensors="pt").to("cuda:0")

    # Generate outputs for the batch
    output = model.generate(**inputs, max_new_tokens=250)

    # Collect the outputs
    all_outputs.append(output)

Are there any changes I need to make, possibly in the model / processor settings, for this to work properly with version 4.45? Thanks!

Llava Hugging Face org

Hmm, I'll check that soon. AFAIK we haven't made drastic changes to the code, so it might also be related to the general generation loop.

It seems that the original speed is back after adding patch_size and vision_feature_select_strategy to the processor:
processor = LlavaNextProcessor.from_pretrained(model_name, patch_size=16, vision_feature_select_strategy="full")
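To avoid hardcoding values that can drift out of sync with the checkpoint, the same two attributes can instead be copied from the loaded model's config. A minimal sketch of that idea (the helper name sync_processor_attrs is mine, not part of transformers; the SimpleNamespace objects below just stand in for a real LlavaNextProcessor and model config so the logic can run anywhere):

```python
from types import SimpleNamespace

def sync_processor_attrs(processor, model_config):
    """Copy patch_size and vision_feature_select_strategy from a
    LLaVA-style model config onto the processor, so the processor
    can expand image tokens itself and the slow legacy path (which
    re-runs the vision backbone every forward pass) is skipped."""
    processor.patch_size = model_config.vision_config.patch_size
    processor.vision_feature_select_strategy = (
        model_config.vision_feature_select_strategy
    )
    return processor

# Stand-in objects mimicking the relevant config layout:
cfg = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
proc = SimpleNamespace()
sync_processor_attrs(proc, cfg)
print(proc.patch_size, proc.vision_feature_select_strategy)  # 14 default
```

With a real checkpoint the same two assignments would read from model.config after both processor and model are loaded, which keeps the values consistent if the checkpoint ever changes.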

Llava Hugging Face org

Indeed, I found that the image backbone was run on every forward pass in the legacy path. The new logic of adding patch_size to processors was not yet enabled in the official (current) HF repo because we have one more PR to merge that solves an edge case. But in general the feature is ready to use, and setting the args yourself is a valid workaround, yeah.

So I'll see how long it will take to make the new logic the default and try to merge it ASAP, since it is causing latency issues. Also, I'll address the latency issue depending on how long we'll support the legacy path. Thanks a lot for reporting this!

Llava Hugging Face org

I solved this problem by adding 2 lines during llava-1.5-7b-hf initialization:

self.processor.patch_size = self.model.config.vision_config.patch_size
self.processor.vision_feature_select_strategy = self.model.config.vision_feature_select_strategy

The code above sets patch_size and vision_feature_select_strategy manually, using the same values from model.config.
