Inference ~3x slower in v4.45

#4
by hjdeheer - opened

I am experimenting with few-shot prompting with images. I noticed runtime is about 3x slower on the new version (4.45 compared to 4.44), and I now get the following warning:

                logger.warning_once(
                    "Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. "
                    "Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly "
                    "with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. "
                    "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
                )

My code for batched inference:

from tqdm import tqdm

all_outputs = []
for img_batch, prompt_batch in tqdm(zip(image_batches, prompts_batches)):
    # Prepare inputs for the current batch
    inputs = processor(images=img_batch, text=prompt_batch, padding=True, return_tensors="pt").to("cuda:0")

    # Generate outputs for the batch
    output = model.generate(**inputs, max_new_tokens=250)

    # Collect the outputs
    all_outputs.append(output)

Are there any changes I need to make, possibly in the model / processor settings, for this to work properly with version 4.45? Thanks!

Llava Hugging Face org

Hmm, I'll check that soon. AFAIK we haven't made drastic changes to the code, so it might also be related to the general generation loop.

It seems that the original speed is back after adding patch_size and vision_feature_select_strategy to the processor:
processor = LlavaNextProcessor.from_pretrained(model_name, patch_size=16, vision_feature_select_strategy="full")
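To avoid hardcoding values that can drift out of sync with the checkpoint, the same two attributes can instead be copied from the loaded model's config. A minimal sketch of that idea (the helper name sync_processor_attrs is mine, not part of transformers; the SimpleNamespace objects below just stand in for a real LlavaNextProcessor and model config so the logic can run anywhere):

```python
from types import SimpleNamespace

def sync_processor_attrs(processor, model_config):
    """Copy patch_size and vision_feature_select_strategy from a
    LLaVA-style model config onto the processor, so the processor
    can expand image tokens itself and the slow legacy path (which
    re-runs the vision backbone every forward pass) is skipped."""
    processor.patch_size = model_config.vision_config.patch_size
    processor.vision_feature_select_strategy = (
        model_config.vision_feature_select_strategy
    )
    return processor

# Stand-in objects mimicking the relevant config layout:
cfg = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
proc = SimpleNamespace()
sync_processor_attrs(proc, cfg)
print(proc.patch_size, proc.vision_feature_select_strategy)  # 14 default
```

With a real checkpoint the same two assignments would read from model.config after both processor and model are loaded, which keeps the values consistent if the checkpoint ever changes.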

Llava Hugging Face org

Indeed, I found that the image backbone was run on every forward pass in the legacy path. The new logic of adding patch_size to processors was not yet enabled in the official (current) HF repo because we have one more PR to merge that solves an edge case. But in general the feature is ready to use, and setting the args yourself is a valid workaround, yeah.

So I'll see how long it will take to make the new logic the default and try to merge it ASAP, since it is causing latency issues. Also, I'll address the latency issue depending on how long we'll support the legacy path. Thanks a lot for reporting this!

Llava Hugging Face org

I solved this problem by adding 2 lines during llava-1.5-7b-hf initialization:

self.processor.patch_size = self.model.config.vision_config.patch_size
self.processor.vision_feature_select_strategy = self.model.config.vision_feature_select_strategy

The code above sets patch_size and vision_feature_select_strategy manually, using the same values from model.config.
