---
license: mit
base_model: naver-clova-ix/donut-base-finetuned-rvlcdip
library_name: transformers
tags: ['donut','classification','irs','tax','document AI']
---

# Donut model fine-tuned for US IRS tax document classification
This Donut model has been fine-tuned to classify US IRS tax documents. It can classify up to 28 different types of IRS documents, targeting a common set of documents used for tax returns. The supported types include:


1. 1040 U.S. Individual Income Tax Return
2. 1040-NR U.S. Nonresident Alien Income Tax Return
3. 1040-NR SCHEDULE OI Other Information
4. 1040 SCHEDULE 1 Additional Income and Adjustments to Income
5. 1040 SCHEDULE 2 Additional Taxes
6. 1040 SCHEDULE 3 Additional Credits and Payments
7. 1040 SCHEDULE 8812 Credits for Qualifying Children and Other Dependents
8. 1040 SCHEDULE A Itemized Deductions
9. 1040 SCHEDULE B Interest and Ordinary Dividends
10. 1040 SCHEDULE C Profit or Loss From Business  
11. 1040 SCHEDULE D Capital Gains and Losses 
12. 1040 SCHEDULE E Supplemental Income and Loss
13. 1040 SCHEDULE SE Self-Employment Tax
14. Form 1125-A Cost of Goods Sold
15. Form 8949 Sales and Other Dispositions of Capital Assets
16. Form 8959 Additional Medicare Tax
17. Form 8960 Net Investment Income Tax — Individuals, Estates, and Trusts
18. Form 8995 Qualified Business Income Deduction Simplified Computation
19. Form 8995-A SCHEDULE A Specified Service Trades or Businesses
20. Form W-2 Wage and Tax Statement


## Model Details & Description
The base model is ['naver-clova-ix/donut-base-finetuned-rvlcdip'][base]. It was fine-tuned on a training set of more than 3,000 documents.
The `label2id` mapping in the config.json file has been updated to reflect all labels that the model can classify.
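
The relationship between `label2id` and the `id2label` lookup used at inference time can be sketched as follows. This is a minimal illustration only: the label names below are hypothetical placeholders, and the helper `invert_label_map` is not part of the model or of `transformers`. The real names are the ones stored in the model's config.json.

```python
# Hypothetical subset of labels; the real names live in the model's config.json.
label2id = {"1040": 0, "1040-NR": 1, "W-2": 2}


def invert_label_map(label2id):
    """Build an id -> label mapping, rejecting duplicate class ids."""
    id2label = {}
    for label, idx in label2id.items():
        idx = int(idx)
        if idx in id2label:
            raise ValueError(f"class id {idx} is assigned to more than one label")
        id2label[idx] = label
    return id2label


id2label = invert_label_map(label2id)
print(id2label[2])
```

With the loaded model, the same lookup is available directly as `model.config.id2label`.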

For inference, resize input images to a width of 1920 px and a height of 2560 px.
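
The target size has a 3:4 aspect ratio, so a direct resize stretches pages with a different ratio slightly. The sketch below only quantifies that effect. The helper `resize_scale_factors` and the 300 dpi US Letter input size are assumptions chosen for illustration; they are not part of the model's preprocessing.

```python
TARGET_W, TARGET_H = 1920, 2560  # inference size stated in this model card


def resize_scale_factors(src_w, src_h, target_w=TARGET_W, target_h=TARGET_H):
    """Return (x_scale, y_scale, distortion) for a direct resize.

    distortion is x_scale / y_scale; 1.0 means the aspect ratio is preserved.
    """
    x_scale = target_w / src_w
    y_scale = target_h / src_h
    return x_scale, y_scale, x_scale / y_scale


# Assumed example input: a US Letter page scanned at 300 dpi (2550 x 3300 px).
x_scale, y_scale, distortion = resize_scale_factors(2550, 3300)
print(f"x scale {x_scale:.3f}, y scale {y_scale:.3f}, distortion {distortion:.3f}")
```

A distortion close to 1.0 means the direct resize used in the sample code changes the page proportions only slightly.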

## Sample Code for Document Inference
```python
# load dependencies
import torch
from transformers import DonutSwinModel, DonutSwinPreTrainedModel, DonutProcessor
from torch import nn
from PIL import Image

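# classification head on top of the Donut Swin encoder:
# pooled encoder features -> dropout -> linear layer over the labels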
class DonutForImageClassification(DonutSwinPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.swin = DonutSwinModel(config)
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(self.swin.num_features, config.num_labels)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        outputs = self.swin(pixel_values)
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

sModelName = 'hsarfraz/donut-irs-tax-docs-classifier'
processor = DonutProcessor.from_pretrained(sModelName)
model = DonutForImageClassification.from_pretrained(sModelName)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

model.eval()

# load test image
sTestImagePath = 'replace this with document image path'
# open image
img = Image.open(sTestImagePath)
# resize image to width 1920 and height 2560 - fine tuned model is trained with this width and height 
img_new = img.resize((1920,2560),Image.Resampling.LANCZOS)

# perform inference
predicted_label = ''
with torch.no_grad():
    pixel_values = processor(img_new.convert("RGB"), return_tensors="pt").pixel_values
    print(pixel_values.shape)
    pixel_values = pixel_values.to(device)
    outputs = model(pixel_values)
    # the index of the highest logit is the predicted class id
    pval = outputs.argmax(dim=1).item()
    predicted_label = model.config.id2label[pval]

print('---------------------------------- ')
print('Document Image Classification: ',predicted_label)


```
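
The sample above reports only the single best label. If a confidence score or a ranked shortlist is useful, the logits can be passed through a softmax. The following is a minimal sketch in plain Python: the helper `top_k_labels`, the example logits, and the label names are all hypothetical and are not produced by the model.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Softmax the logits and return the k most likely (label, probability) pairs."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label names, for illustration only.
example_logits = [2.0, 0.5, 4.0]
example_id2label = {0: "1040", 1: "1040-NR", 2: "W-2"}

top2 = top_k_labels(example_logits, example_id2label, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.3f}")
```

With the real model, `outputs[0].tolist()` yields the list of logits for the single input image, and `model.config.id2label` supplies the mapping.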


[base]: https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip