Can Blip2ForImageTextRetrieval be trained with Trainer?

by wang-sy - opened 8 days ago

In the definition of `Blip2ImageTextMatchingModelOutput`, `loss` is documented as:

```
Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `return_loss` is `True`):
            Contrastive loss for image-text similarity.
```

However, the loss is not computed anywhere in the `forward` method of `Blip2ForImageTextRetrieval`. Am I overlooking something? Where is the loss calculated?
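For reference, this is a minimal sketch of the kind of loss I expected to find there: a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of matched image-text pairs, where item i is the positive for row i and every other item in the batch is a negative. It mirrors the `contrastive_loss`/`clip_loss` helpers in the CLIP modeling code. Feeding it the `logits_per_text` that `Blip2ForImageTextRetrieval` returns without the ITM head is my assumption, not something the docs state, and whether those logits are already temperature-scaled would need to be checked against the installed version.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
    # Row i should score highest at column i (its matched pair).
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def symmetric_contrastive_loss(logits_per_text: torch.Tensor) -> torch.Tensor:
    # Average the text-to-image and image-to-text directions.
    text_loss = contrastive_loss(logits_per_text)
    image_loss = contrastive_loss(logits_per_text.t())
    return (text_loss + image_loss) / 2.0


# Usage: with uninformative (all-zero) logits over 4 pairs, the loss is log(4).
logits = torch.zeros(4, 4)
print(symmetric_contrastive_loss(logits))  # tensor(1.3863)
```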

Is training `Blip2ForImageTextRetrieval` supported by `Trainer`?
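If the loss really is not computed in `forward`, my guess is that training with `Trainer` would need a subclass that overrides `compute_loss`. The sketch below is an assumption on my side, not a supported recipe: `ContrastiveRetrievalTrainer` is a hypothetical name, it assumes the model's `forward` accepts `use_image_text_matching_head=False` and returns image-text contrastive scores in `logits_per_text`, and `**kwargs` only absorbs extra arguments (such as `num_items_in_batch`) that newer `Trainer` versions pass to `compute_loss`.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer


class ContrastiveRetrievalTrainer(Trainer):
    # Hypothetical subclass: computes an ITC loss outside the model's forward.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Assumption: these keyword arguments match the model's forward signature.
        outputs = model(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            attention_mask=inputs.get("attention_mask"),
            use_image_text_matching_head=False,
        )
        logits = outputs.logits_per_text  # assumed shape: (batch_text, batch_image)
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric contrastive loss: the i-th text matches the i-th image.
        loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2.0
        return (loss, outputs) if return_outputs else loss
```

It would be constructed exactly like a regular `Trainer`; only the loss computation changes.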

Thank you for the great work!
