This repository contains a YOLOv8X model trained on the full DocLayNet dataset (~41 GB of annotated document-layout images). Training ran on a single 80 GB A100 GPU with a batch size of 42, and images were resized to 1024x1024 pixels. The default image-augmentation hyperparameters were kept unchanged.
The model was trained on all 11 class labels in DocLayNet: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.
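The setup above could be reproduced with Ultralytics roughly as follows. This is a minimal sketch that assumes DocLayNet has already been converted to YOLO-format labels. The directory layout, the `make_data_yaml` helper and the class-index order are hypothetical. The order used here follows DocLayNet's category order, shifted to start at 0. The mapping actually stored in the released checkpoint can be read from `model.names`. The card does not state the number of epochs, so it is left at the Ultralytics default.

```python
# Hypothetical reproduction config for the training run described above.
# Assumption: class indices follow DocLayNet's category order, starting at 0.
DOCLAYNET_CLASSES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]

# Settings stated in the model card. Augmentation options are deliberately
# omitted so that the Ultralytics defaults apply.
TRAIN_ARGS = {"imgsz": 1024, "batch": 42, "device": 0}


def make_data_yaml(root: str, train: str = "images/train", val: str = "images/val") -> str:
    """Return an Ultralytics dataset YAML as a string (paths are hypothetical)."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    # One "index: name" entry per class under the `names` key.
    lines += [f"  {i}: {name}" for i, name in enumerate(DOCLAYNET_CLASSES)]
    return "\n".join(lines) + "\n"
```

After writing the string to a file such as `doclaynet.yaml` (a hypothetical name), training could be started with `YOLO("yolov8x.pt").train(data="doclaynet.yaml", **TRAIN_ARGS)`. Starting from the COCO-pretrained `yolov8x.pt` is itself an assumption, because the card does not say which initial weights were used.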
The trained model was evaluated on the DocLayNet validation set. Box(P) and Box(R) are box precision and recall, and mAP50-95 is mAP averaged over IoU thresholds from 0.50 to 0.95:
Class | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
---|---|---|---|---|---|---|
all | 6476 | 98604 | 0.905 | 0.866 | 0.925 | 0.759 |
Caption | 6476 | 1763 | 0.921 | 0.868 | 0.949 | 0.878 |
Footnote | 6476 | 312 | 0.888 | 0.779 | 0.839 | 0.637 |
Formula | 6476 | 1894 | 0.893 | 0.839 | 0.914 | 0.748 |
List-item | 6476 | 13320 | 0.905 | 0.915 | 0.94 | 0.807 |
Page-footer | 6476 | 5571 | 0.94 | 0.941 | 0.974 | 0.651 |
Page-header | 6476 | 6683 | 0.952 | 0.862 | 0.957 | 0.702 |
Picture | 6476 | 1565 | 0.834 | 0.827 | 0.88 | 0.81 |
Section-header | 6476 | 15744 | 0.919 | 0.902 | 0.962 | 0.635 |
Table | 6476 | 2269 | 0.87 | 0.873 | 0.92 | 0.865 |
Text | 6476 | 49185 | 0.937 | 0.923 | 0.967 | 0.833 |
Title | 6476 | 298 | 0.898 | 0.792 | 0.873 | 0.779 |
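Averaged over all classes, the model reaches a box precision of 0.905, a recall of 0.866, an mAP50 of 0.925 and an mAP50-95 of 0.759. Recall is lowest for Footnote (0.779) and Title (0.792), the two classes with the fewest instances. Under the stricter mAP50-95 metric, Section-header (0.635), Footnote (0.637) and Page-footer (0.651) score lowest.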
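The table's `all` row relates to the per-class rows in a simple way. Its P, R, mAP50 and mAP50-95 values match the unweighted (macro) mean over the 11 classes, and its Instances value (98,604) is the sum of the per-class counts. Read that way, Footnote (312 instances) and Title (298) count as much toward the summary as Text (49,185). The sketch below recomputes the macro means from the table. It also adds per-class F1, 2PR/(P+R), which is not part of the original report.

```python
# Per-class validation metrics copied from the table: (P, R, mAP50, mAP50-95).
PER_CLASS = {
    "Caption": (0.921, 0.868, 0.949, 0.878),
    "Footnote": (0.888, 0.779, 0.839, 0.637),
    "Formula": (0.893, 0.839, 0.914, 0.748),
    "List-item": (0.905, 0.915, 0.94, 0.807),
    "Page-footer": (0.94, 0.941, 0.974, 0.651),
    "Page-header": (0.952, 0.862, 0.957, 0.702),
    "Picture": (0.834, 0.827, 0.88, 0.81),
    "Section-header": (0.919, 0.902, 0.962, 0.635),
    "Table": (0.87, 0.873, 0.92, 0.865),
    "Text": (0.937, 0.923, 0.967, 0.833),
    "Title": (0.898, 0.792, 0.873, 0.779),
}

# Unweighted (macro) mean over the classes, one value per metric column.
n = len(PER_CLASS)
macro = tuple(sum(row[k] for row in PER_CLASS.values()) / n for k in range(4))

# Per-class F1: the harmonic mean of precision and recall.
f1_by_class = {name: 2 * p * r / (p + r) for name, (p, r, _, _) in PER_CLASS.items()}

print([round(v, 3) for v in macro])  # [0.905, 0.866, 0.925, 0.759]
print(min(f1_by_class, key=f1_by_class.get))  # Footnote
```

The macro means reproduce the `all` row to three decimals. Footnote has the lowest F1 (about 0.830), narrowly below Picture.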
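Example inference with the exported ONNX weights (`best.onnx`) through Ultralytics. Pass `imgsz=1024` to match the training resolution:

```python
from PIL import Image
from ultralytics import YOLO

# Load the exported ONNX weights; Ultralytics runs them through its ONNX backend.
onnx_model = YOLO("best.onnx")

# Run layout detection at the training resolution (1024x1024).
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    im_bgr = r.plot()  # annotated image as a BGR NumPy array
    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # same image as an RGB PIL image, for further processing
    r.show()  # display the annotated image
    r.save(filename=f"results{i}.jpg")  # write the annotated image to disk
```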
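The detections can also be used as data rather than as a rendered image, by flattening the boxes into plain records. This is a sketch. The `to_records` helper, its extra `min_conf` threshold and its sort order are illustrative assumptions, not part of the model card. The sort goes top to bottom, then left to right, which is a naive reading order that will interleave columns on multi-column pages. The helper takes plain Python sequences, so it does not depend on a particular tensor type.

```python
from typing import Dict, List, Mapping, Sequence, Union

Record = Dict[str, object]


def to_records(
    xyxy: Sequence[Sequence[float]],
    cls_ids: Sequence[float],
    confs: Sequence[float],
    names: Union[Mapping[int, str], Sequence[str]],
    min_conf: float = 0.5,  # hypothetical extra threshold
) -> List[Record]:
    """Turn parallel box/class/confidence lists into sorted dict records."""
    records: List[Record] = []
    for box, cls_id, conf in zip(xyxy, cls_ids, confs):
        # Extra filter on top of the confidence threshold the predictor already applied.
        if conf < min_conf:
            continue
        x1, y1, x2, y2 = (float(v) for v in box)
        records.append({
            "label": names[int(cls_id)],  # class index -> label string
            "conf": float(conf),
            "box": (x1, y1, x2, y2),  # pixel coordinates: left, top, right, bottom
        })
    # Naive reading order: sort by top edge, then by left edge.
    records.sort(key=lambda rec: (rec["box"][1], rec["box"][0]))
    return records
```

Inside the loop above, each result could be converted with `to_records(r.boxes.xyxy.tolist(), r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.names)`. In Ultralytics, `r.names` maps class indices to label strings. Detections below the predictor's `conf` argument (0.25 by default) are already dropped before this step.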