Can Blip2ForImageTextRetrieval be trained with Trainer?

#5
by wang-sy - opened

In the definition of Blip2ImageTextMatchingModelOutput, the loss is documented as:

Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `return_loss` is `True`):
            Contrastive loss for image-text similarity.

However, the loss does not seem to be computed anywhere in the forward pass of Blip2ForImageTextRetrieval. Am I missing something? Where is the loss calculated?
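
To make the question concrete, the kind of loss the docstring seems to describe could be sketched as below. This is only an assumption: a CLIP-style symmetric cross-entropy over an image-text similarity matrix, with matching pairs on the diagonal. It is not taken from the library, `contrastive_loss` and `cross_entropy` are hypothetical helpers, and the code uses plain Python lists instead of tensors so it runs without dependencies.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one list of logits against an integer target index."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(similarity):
    """Symmetric cross-entropy over a square similarity matrix (assumed form).

    similarity[i][j] is the score of image i against text j; the matching
    pair for image i is assumed to be text i (the diagonal).
    """
    n = len(similarity)
    # Image-to-text direction: each row should pick out its diagonal entry.
    image_to_text = sum(cross_entropy(row, i) for i, row in enumerate(similarity)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    columns = [list(col) for col in zip(*similarity)]
    text_to_image = sum(cross_entropy(col, i) for i, col in enumerate(columns)) / n
    return (image_to_text + text_to_image) / 2


# Well-matched pairs (large diagonal) give a lower loss than mismatched ones.
matched = contrastive_loss([[5.0, 0.0], [0.0, 5.0]])
mismatched = contrastive_loss([[0.0, 5.0], [5.0, 0.0]])
print(matched < mismatched)
```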

Is training Blip2ForImageTextRetrieval supported by the Trainer?

Thank you for the great work!
