How to SFT this model? (prefix-LM attention-mask related)

#8 · opened by Coobiw

Hi, thanks for your great work! I've found that `self._merge_input_ids_with_image_features` generates a 4D full-attention mask (when there is no padding). When I SFT this model, the attention mask over the text input (instruction) should be fully bidirectional, while over the text output (response) it should be causal. PaliGemma's `forward` function does not support this: if I pass a 4D mask that is fully bidirectional over the image and instruction tokens and causal over the response tokens, `self._merge_input_ids_with_image_features` fails, because it expects a 2D mask.
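
For reference, here is a minimal sketch (plain PyTorch, a hypothetical helper that is not part of the PaliGemma code) of the prefix-LM mask I mean: full attention within the prefix (image + instruction tokens), causal attention over the response tokens:

```python
import torch

def build_prefix_lm_mask(prefix_len: int, total_len: int) -> torch.Tensor:
    """Return a (total_len, total_len) boolean mask; True = position may be attended to."""
    # Start from a standard causal (lower-triangular) mask ...
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    # ... then make the prefix block fully bidirectional.
    mask[:prefix_len, :prefix_len] = True
    return mask

# Example: 4 prefix tokens (image + instruction), 3 response tokens.
print(build_prefix_lm_mask(prefix_len=4, total_len=7).int())
```

Response rows still only attend to earlier positions (causal), while all prefix positions can see each other.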

I've found that my transformers version was 4.31. After updating it to 4.32, I found that the processor accepts a `suffix` input. This solves my problem. Thanks!
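
For anyone else hitting this, a minimal SFT-style sketch, assuming a transformers version whose PaliGemma processor supports `suffix` (the checkpoint name and image path below are placeholders): passing `suffix` makes the processor return `labels` with the prefix (image + instruction) masked out, plus `token_type_ids` that let the model build the prefix-LM mask internally (full attention over the prefix, causal over the suffix).

```python
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor
from PIL import Image

model_id = "google/paligemma-3b-pt-224"  # example checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

img = Image.open("example.jpg")  # placeholder image path

inputs = processor(
    text="caption en",           # instruction / prefix (image-token handling may differ across versions)
    images=img,
    suffix="a cat on a sofa",    # response / target, used to build the labels
    return_tensors="pt",
)

outputs = model(**inputs)        # loss is computed only on the suffix tokens
print(outputs.loss)
```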
