hsarfraz/donut-irs-tax-docs-classifier

hsarfraz committed on 22 days ago

Commit 176bacd • 1 Parent(s): 546fdfb

Update README.md

Files changed (1)

README.md +57 -0

@@ -38,5 +38,62 @@ The config.json file has assocociated label2id updated to reflect all labels tha
 For inference use image size with width: 1920 px and height: 2560 px
 [base]: https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip

 For inference use image size with width: 1920 px and height: 2560 px
+## Sample Code for Document Inference
+```python
+# load dependencies
+import torch
+from transformers import DonutProcessor, DonutSwinModel, DonutSwinPreTrainedModel
+from torch import nn
+from PIL import Image
+# Donut Swin encoder with dropout and a linear classification head on the pooled output
+class DonutForImageClassification(DonutSwinPreTrainedModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.num_labels = config.num_labels
+        self.swin = DonutSwinModel(config)
+        self.dropout = nn.Dropout(0.5)
+        self.classifier = nn.Linear(self.swin.num_features, config.num_labels)
+    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
+        outputs = self.swin(pixel_values)
+        pooled_output = outputs[1]  # second element is the pooled encoder output
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        return logits
+sModelName = 'hsarfraz/donut-irs-tax-docs-classifier'
+processor = DonutProcessor.from_pretrained(sModelName)
+model = DonutForImageClassification.from_pretrained(sModelName)
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+model.to(device)
+model.eval()
+# load test image
+sTestImagePath = 'replace this with document image path'  # i.e.
+# open image
+img = Image.open(sTestImagePath)
+# resize image to width 1920 and height 2560; the fine-tuned model was trained at this size
+img_new = img.resize((1920, 2560), Image.Resampling.LANCZOS)
+# perform inference
+predicted_label = ''
+with torch.no_grad():
+    pixel_values = processor(img_new.convert("RGB"), return_tensors="pt").pixel_values
+    print(pixel_values.shape)
+    pixel_values = pixel_values.to(device)
+    outputs = model(pixel_values)
+    max_logit, predicted = torch.max(outputs, 1)  # highest logit and its class index
+    pval = predicted.item()  # plain Python int (batch holds a single image)
+    predicted_label = model.config.id2label[pval]
+print('---------------------------------- ')
+print('Document Image Classification: ',predicted_label)
+```
 [base]: https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip
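
The sample code above reports only the single best label. If a confidence score or a short list of runner-up classes is wanted, the logits can be passed through a softmax first. The following is a minimal, dependency-free sketch of that idea, not part of the committed README: the helper names `softmax` and `top_k_labels`, the example logits, and the label map are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map, for illustration only
example_logits = [2.0, 0.5, -1.0]
example_id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
for label, prob in top_k_labels(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the sample code above, something like `top_k_labels(outputs[0].cpu().tolist(), model.config.id2label)` should produce the ranked labels for the classified image, assuming `outputs` has shape `(1, num_labels)` as returned by the `forward` method shown.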