DiT for object detection
Could you please show a demo or an example of how to use this model for object detection? I need it for this task on my own custom dataset, but the code in their repository throws an error. I hope it can at least be used through the Hugging Face library.
Can you share the error?
Provide more information
DiT model card (https://huggingface.co/docs/transformers/model_doc/dit) refers to 3 use cases: image classification, layout analysis and table detection.
However, the resources section of the model card contains only one notebook, a working example of image classification.
In addition, the only code snippet in the dit-base model card (shown below) returns only the logits, rather than demonstrating the complete pipeline for each of the use cases.
A working example of the complete pipeline for each use case would be very helpful.
Thank you in advance.
```python
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForMaskedImageModeling

image = Image.open('path_to_your_document_image').convert('RGB')

processor = BeitImageProcessor.from_pretrained("microsoft/dit-base")
model = BeitForMaskedImageModeling.from_pretrained("microsoft/dit-base")

# number of patches per image, derived from the model config
num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# create random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, logits = outputs.loss, outputs.logits
```
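For the image classification use case, the step the snippet above leaves out is turning logits into a predicted label. The following is only a minimal sketch of that post-processing step, not code from the model card: the helper name `logits_to_label` is hypothetical, and the example logits and label mapping are made up for illustration. It does not cover layout analysis or table detection.

```python
import math


def logits_to_label(logits, id2label):
    """Map one row of classification logits to (label, probability).

    `logits` is a plain list of floats; `id2label` maps class index to name.
    Both are assumptions for this sketch, not part of the DiT model card.
    """
    # Softmax, subtracting the max logit for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up values, for illustration only
example_logits = [0.1, 2.3, -1.0]
example_id2label = {0: "letter", 1: "invoice", 2: "memo"}

label, prob = logits_to_label(example_logits, example_id2label)
print(label, round(prob, 3))
```

With a classification checkpoint loaded through `transformers` (for example `microsoft/dit-base-finetuned-rvlcdip` with `AutoModelForImageClassification`), the inputs would presumably be `outputs.logits[0].tolist()` and `model.config.id2label`.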