Image processing is different from the GitHub version

#18
by enhaoz - opened

Hi team,
Thanks for the great work! While experimenting with llava-hf/llava-1.5-7b-hf and the GitHub version (liuhaotian/llava-v1.5-7b), I noticed that the image processing stage differs between the two, which leads to different generation results.
With llava-hf/llava-1.5-7b-hf, images seem to be cropped to a square without padding.
test1.png
In the GitHub repo, by contrast, images are padded first and then cropped to a square, because of the field {"image_aspect_ratio": "pad"} in model.config.
test2.png
Am I missing something?

Llava Hugging Face org

cc @nielsr, pretty sure we can control that with the image processor, no?

Same question here. I think there's a difference in the image processor.

image_processor is more like the original LLaVA solution; LlavaNextImageProcessor uses padding.
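To make the difference discussed in this thread concrete, here is a minimal sketch of the geometry only. It does not call either library; the function names `pad_to_square_layout` and `center_crop_box` are hypothetical and used purely for illustration. It assumes that "pad" means centering the image on a square canvas whose side is the longer edge, and that "crop" means taking the largest centered square. Resizing to the model's input resolution is left out.

```python
def pad_to_square_layout(width, height):
    """Return (side, left, top): the side of the square canvas and the
    offset at which the original image is pasted so it sits centered.
    Nothing is cut off; the shorter dimension gets padding bars."""
    side = max(width, height)
    left = (side - width) // 2
    top = (side - height) // 2
    return side, left, top


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square
    inside the image, in original pixel coordinates. Content outside
    this box along the longer dimension is discarded."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# Example: a 640x480 landscape image.
print(pad_to_square_layout(640, 480))  # (640, 0, 80)
print(center_crop_box(640, 480))       # (80, 0, 560, 480)
```

For a 640x480 input, padding keeps the whole image and adds 80-pixel bars above and below, while a center crop drops 80 pixels from each side. That would explain why the two pipelines hand the model different pixels for any non-square image.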
