How to SFT this model? (prefix-LM attention-mask related)
#8 · opened by Coobiw
Hi, thanks for your great work! I've found that `self._merge_input_ids_with_image_features` generates a 4D full attention mask (when there is no padding). When I SFT this model, the attention mask over the text input (instruction) should be full (bidirectional), while the attention mask over the text output (response) should be causal. PaliGemma's forward function does not support this. If I pass in a 4D full mask (for the image and instruction input) combined with a causal mask (for the response output), `self._merge_input_ids_with_image_features` will not work, because it expects a 2D mask.
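For context, here is a minimal sketch (plain PyTorch, not PaliGemma's actual implementation) of the 4D prefix-LM mask layout I mean: prefix tokens (image + instruction) attend to each other bidirectionally, while response tokens attend causally. The function name and shapes are illustrative only.

```python
import torch

def build_prefix_lm_mask(prefix_len: int, suffix_len: int) -> torch.Tensor:
    """Return a (1, 1, seq_len, seq_len) boolean mask: full attention over the
    prefix (image + instruction), causal attention over the suffix (response)."""
    seq_len = prefix_len + suffix_len
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    # Prefix tokens attend bidirectionally within the prefix.
    mask[:prefix_len, :prefix_len] = True
    # Suffix tokens attend to the whole prefix...
    mask[prefix_len:, :prefix_len] = True
    # ...and causally to themselves and to earlier suffix tokens.
    mask[prefix_len:, prefix_len:] = torch.tril(
        torch.ones(suffix_len, suffix_len, dtype=torch.bool)
    )
    # Add batch and head dimensions -> (1, 1, seq_len, seq_len).
    return mask[None, None, :, :]

# Example: 5 prefix tokens (image + instruction), 3 suffix tokens (response).
print(build_prefix_lm_mask(5, 3).int())
```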
I've found that my transformers version was 4.31. After updating it to 4.32, I found a `suffix` input for the processor. This solves my problem. Thanks!
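For anyone hitting the same issue, below is a hedged sketch of using the processor's `suffix` argument for SFT, assuming a transformers version recent enough to include PaliGemma. The checkpoint name, image path, and example strings are placeholders. When `suffix` is passed, the processor also builds `labels` with the image and prompt tokens masked out of the loss, so the forward pass returns a loss you can backprop in a training loop.

```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"   # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")          # placeholder image path
prompt = "describe the image"              # prefix: instruction, full attention
answer = "a dog playing in the park"       # suffix: response, causal attention

# With `suffix`, the processor returns `labels` in which the image and prompt
# tokens are masked out, so only the response tokens contribute to the loss.
inputs = processor(text=prompt, images=image, suffix=answer, return_tensors="pt")

outputs = model(**inputs)
print(outputs.loss)                        # ready to use in an SFT loop
```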