license: mit
language:
  - la
  - fr
  - es
datasets:
  - CATMuS/medieval
tags:
  - trocr
  - image-to-text
widget:
  - src: >-
      https://huggingface.co/medieval-data/trocr-medieval-print/resolve/main/images/print-1.png
    example_title: Print 1
  - src: >-
      https://huggingface.co/medieval-data/trocr-medieval-print/resolve/main/images/print-2.png
    example_title: Print 2
  - src: >-
      https://huggingface.co/medieval-data/trocr-medieval-print/resolve/main/images/print-3.png
    example_title: Print 3
model-index:
  - name: trocr-medieval-print
    results:
      - task:
          name: HTR
          type: image-to-text
        metrics:
          - name: CER
            type: CER
            value: 0.05

About

CER: 0.05
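
This card does not say how the CER figure was computed. The sketch below assumes the usual definition of character error rate: the Levenshtein edit distance between the reference transcription and the model output, divided by the length of the reference. The function names `levenshtein` and `cer` and the example strings are illustrative only and are not taken from this repository.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a 20-character line (v read as u).
example_cer = cer("dominus vobiscum est", "dominus uobiscum est")
print(example_cer)
```

Under this definition, a CER of 0.05 corresponds to roughly one character error in every twenty reference characters.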

This is a TrOCR model for medieval Print. It was trained in two stages. The base model, microsoft/trocr-base-handwritten, was first finetuned to Caroline, giving medieval-data/trocr-medieval-latin-caroline. A saved checkpoint of that model was then further finetuned to Print.

The dataset used for training was CATMuS.

The model has not been formally tested. Preliminary examination indicates that further finetuning is needed.

Finetuning was done with finetune.py found in this repository.

Usage

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load an example line image from this model's repository
url = 'https://huggingface.co/medieval-data/trocr-medieval-print/resolve/main/images/print-1.png'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the processor (image preprocessing and tokenizer) and the finetuned model
processor = TrOCRProcessor.from_pretrained('medieval-data/trocr-medieval-print')
model = VisionEncoderDecoderModel.from_pretrained('medieval-data/trocr-medieval-print')

# Convert the image into the pixel tensor the encoder expects
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token ids, then decode them into the transcription
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
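
The snippet above transcribes a single image. To process several line images at once, the same calls can be applied to batches. What follows is a minimal sketch, not part of this repository: the helper names `batched` and `transcribe_lines`, the default batch size of 8, and the assumption that the inputs are local image files each containing one text line are all illustrative choices.

```python
from typing import Iterator, List, Sequence

MODEL_ID = "medieval-data/trocr-medieval-print"


def batched(items: Sequence, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transcribe_lines(paths: Sequence[str], batch_size: int = 8) -> List[str]:
    """Transcribe local line images; returns one string per input path."""
    # Heavy dependencies are imported lazily so that `batched` can be used
    # without them being installed.
    import torch
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = TrOCRProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID).to(device)
    model.eval()

    texts: List[str] = []
    for chunk in batched(paths, batch_size):
        images = [Image.open(path).convert("RGB") for path in chunk]
        pixel_values = processor(images=images, return_tensors="pt").pixel_values
        with torch.no_grad():  # inference only, no gradients needed
            generated_ids = model.generate(pixel_values.to(device))
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A call such as `transcribe_lines(["line-1.png", "line-2.png"])` (hypothetical file names) would return the transcriptions in input order. A smaller `batch_size` reduces memory use at the cost of more forward passes.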

BibTeX entry and citation info

TrOCR Paper

@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, 
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

CATMuS Paper

@unpublished{clerice:hal-04453952,
  TITLE = {{CATMuS Medieval: A multilingual large-scale cross-century dataset in Latin script for handwritten text recognition and beyond}},
  AUTHOR = {Cl{\'e}rice, Thibault and Pinche, Ariane and Vlachou-Efstathiou, Malamatenia and Chagu{\'e}, Alix and Camps, Jean-Baptiste and Gille-Levenson, Matthias and Brisville-Fertin, Olivier and Fischer, Franz and Gervers, Michaels and Boutreux, Agn{\`e}s and Manton, Avery and Gabay, Simon and O'Connor, Patricia and Haverals, Wouter and Kestemont, Mike and Vandyck, Caroline and Kiessling, Benjamin},
  URL = {https://inria.hal.science/hal-04453952},
  NOTE = {working paper or preprint},
  YEAR = {2024},
  MONTH = Feb,
  KEYWORDS = {Historical sources ; medieval manuscripts ; Latin scripts ; benchmarking dataset ; multilingual ; handwritten text recognition},
  PDF = {https://inria.hal.science/hal-04453952/file/ICDAR24___CATMUS_Medieval-1.pdf},
  HAL_ID = {hal-04453952},
  HAL_VERSION = {v1},
}