---
tags:
- donut
- image-to-text
- vision
datasets:
- shreyanshu09/Block_Diagram
- shreyanshu09/BD-EnKo
language:
- en
- ko
---
# Block Diagram Global Information Extractor
This model was introduced in the paper **"Unveiling the Power of Integration: Block Diagram Summarization through Local-Global Fusion"**, accepted at ACL 2024. The full code is available in the [BlockNet](https://github.com/shreyanshu09/BlockNet) GitHub repository.
## Model description
This model uses a Transformer encoder-decoder architecture, following the configuration of [Donut](https://arxiv.org/abs/2111.15664), to extract an overall summary from block diagram images. It supports both English and Korean. The architecture is straightforward: a visual encoder module and a text decoder module, both based on the Transformer.
## Training dataset
- 41,933 samples of synthetic and real-world English block diagrams (BD-EnKo)
- 33,101 samples of synthetic and real-world Korean block diagrams (BD-EnKo)
- 396 samples from a real-world English block diagram dataset (CBD)
- 357 samples from a handwritten English block diagram dataset (FC_A)
- 476 samples from a handwritten English block diagram dataset (FC_B)
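Summing the counts listed above gives the size of the training mix. The short script below is only a worked tally of those figures grouped by language (BD-EnKo contributes both languages; CBD, FC_A, and FC_B are English); it does not load any data.

```python
# Training-set sample counts copied from the list above, keyed by (dataset, language).
counts = {
    ("BD-EnKo", "en"): 41_933,
    ("BD-EnKo", "ko"): 33_101,
    ("CBD", "en"): 396,
    ("FC_A", "en"): 357,
    ("FC_B", "en"): 476,
}

# Overall total across all sources.
total = sum(counts.values())

# Aggregate the counts per language.
by_language = {}
for (_, lang), n in counts.items():
    by_language[lang] = by_language.get(lang, 0) + n

print(f"total samples: {total}")
for lang, n in sorted(by_language.items()):
    print(f"{lang}: {n} ({n / total:.1%})")
```

This reports 76,263 samples in total: 43,162 English (about 56.6%) and 33,101 Korean (about 43.4%).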
## How to use
Here is how to use this model in PyTorch:
```python
from PIL import Image
import torch
from donut import DonutModel

# Load the pre-trained model and switch to inference mode
model = DonutModel.from_pretrained("shreyanshu09/block_diagram_global_information")
model.eval()

# Move the model to GPU (in half precision) if available
if torch.cuda.is_available():
    model.half()
    device = torch.device("cuda:0")
    model.to(device)

# Donut task-start token name used in the prompt; assumed to match the
# name of the training dataset folder ("c2t_data")
task_name = "c2t_data"

# Function to process a single image
def process_image(image_path):
    # Load the image and make sure it has three (RGB) channels
    image = Image.open(image_path).convert("RGB")
    result = model.inference(image=image, prompt=f"<s_{task_name}>")["predictions"][0]
    # Return the parsed summary if present, otherwise the raw decoded text
    if "c2t" in result:
        return result["c2t"]
    return result["text_sequence"]

# Example usage
image_path = "image.png"  # Input image file
result = process_image(image_path)
print(result)
```
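The prompt and the `c2t`/`text_sequence` fallback in `process_image` follow Donut's conventions: the decoder emits a tag-delimited sequence that Donut converts into a dictionary, falling back to `{"text_sequence": ...}` when no tags can be parsed. The stdlib sketch below illustrates both steps under stated assumptions. `build_prompt`, `parse_tagged_output`, and `extract_summary` are hypothetical helpers, `parse_tagged_output` is a simplified flat-only stand-in (not Donut's actual parser), and the `<s_c2t>...</s_c2t>` output format is inferred from the `c2t` key rather than confirmed. It also shows why a task name derived from a folder path should be normalized first: `os.path.basename` returns an empty string for a path ending in `/`.

```python
import os
import re


def build_prompt(dataset_dir: str) -> str:
    """Hypothetical helper: build a Donut-style task prompt from a folder path.

    os.path.normpath removes a trailing slash first; without it,
    os.path.basename returns '' and the prompt degrades to '<s_>'.
    """
    task_name = os.path.basename(os.path.normpath(dataset_dir))
    return f"<s_{task_name}>"


def parse_tagged_output(sequence: str) -> dict:
    """Hypothetical, simplified stand-in for Donut's token-to-JSON step.

    Collects flat '<s_key>value</s_key>' pairs (no nesting). If none are
    found, returns the raw text under 'text_sequence'.
    """
    pairs = re.findall(r"<s_(\w+)>(.*?)</s_\1>", sequence, flags=re.DOTALL)
    if pairs:
        return {key: value.strip() for key, value in pairs}
    return {"text_sequence": sequence}


def extract_summary(result: dict) -> str:
    """Prefer the parsed 'c2t' field, else fall back to the raw text sequence."""
    return result.get("c2t", result.get("text_sequence", ""))


if __name__ == "__main__":
    print(build_prompt("/block_diagram_global_information/dataset/c2t_data/"))
    print(extract_summary(parse_tagged_output("<s_c2t>The diagram shows a feedback loop.</s_c2t>")))
```

Running it prints `<s_c2t_data>` followed by `The diagram shows a feedback loop.`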
## Contact
If you have any questions about this work, please contact **[Shreyanshu Bhushan](https://github.com/shreyanshu09)** using the following email addresses: **[email protected]**.
## License
The content of this project itself is licensed under the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/).