Example of fine-tuning?
Hi,
Could you provide an example of how to fine-tune this model?
Is it enough to have the correct text for fine-tuning, or do I need image + text?
I am trying to transcribe court records from the 1700s from northern Sweden/Finland.
The hardest part for the model seems to be all the different names.
Do you have any pretrained model that has been trained only on material from 1600-1800?
best regards,
Patrik
Hi!
Yes, here are two links you can follow on how to train such a model:
- Huggingface: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb
- MMOCR: https://colab.research.google.com/github/open-mmlab/mmocr/blob/dev-1.x/demo/tutorial.ipynb#scrollTo=2AZqwCt09XqR
You will need image+text pairs.
Here is an example:
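(A sketch only: the folder layout, file names, and the tab-separated gt.txt convention below are one common way to organize the pairs, not something either toolkit requires.)

```python
# Hypothetical layout -- one cropped text-line image per transcription:
#
# data/
#   images/
#     line_0001.jpg
#     line_0002.jpg
#   gt.txt
#
# gt.txt contents (filename <TAB> transcription, one pair per line):
#   line_0001.jpg    transcription of the first line
#   line_0002.jpg    transcription of the second line

from pathlib import Path
from PIL import Image

def load_pairs(root: str):
    """Yield (PIL image, transcription) pairs from the layout above."""
    base = Path(root)
    for row in (base / "gt.txt").read_text(encoding="utf-8").splitlines():
        filename, text = row.split("\t", maxsplit=1)
        yield Image.open(base / "images" / filename).convert("RGB"), text
```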
So if you are going to use the SATRN architecture from MMOCR (which generalizes very well out of domain), you can use our model as your base.
- Here is the config file: https://huggingface.co/Riksarkivet/satrn_htr/blob/main/config.py
- And here is the pretrained weights: https://huggingface.co/Riksarkivet/satrn_htr/blob/main/model.pth
- And if you want to train it from scratch use this: https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/satrn
You will need to modify the config file, either ours or the plain one from MMOCR.
If you start from our config file you don't need to change any of the hyperparameters, although the learning rate can be worth lowering if you are only going to fine-tune on a smaller dataset.
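If it helps, here is a minimal sketch of what such a fine-tuning config could look like, assuming MMOCR 1.x (MMEngine-style) config inheritance. The file names and the exact learning-rate/epoch values are placeholders, not our recommended settings:

```python
# finetune_satrn.py -- hypothetical fine-tuning config sketch.

# Inherit everything from the config downloaded from the HF repo.
_base_ = ['config.py']

# Initialize from the pretrained checkpoint instead of training from scratch.
load_from = 'model.pth'

# MMEngine's config merging overrides only the listed keys in the base
# config. Both values below are assumptions to tune per dataset.
optim_wrapper = dict(optimizer=dict(lr=1e-5))  # lower LR for fine-tuning
train_cfg = dict(max_epochs=10)                # fewer epochs than from scratch
```

You would then train with MMOCR's standard entry point, e.g. `python tools/train.py finetune_satrn.py`.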
TrOCR is much simpler to train; just follow the link I sent :)
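For completeness, here is a minimal TrOCR training loop along the lines of the linked tutorial. This is a sketch: the checkpoint name and hyperparameters are assumptions, and `load_pairs` is the hypothetical helper from the earlier example:

```python
import torch
from torch.utils.data import DataLoader
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# Required for seq2seq training, as in the linked tutorial.
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

def collate(batch):
    """Turn (image, text) pairs into model inputs."""
    images, texts = zip(*batch)
    pixel_values = processor(images=list(images), return_tensors="pt").pixel_values
    labels = processor.tokenizer(list(texts), padding=True, return_tensors="pt").input_ids
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return pixel_values, labels

loader = DataLoader(list(load_pairs("data")), batch_size=4, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)  # LR is an assumption

model.train()
for epoch in range(3):  # epoch count is an assumption; tune to your data size
    for pixel_values, labels in loader:
        loss = model(pixel_values=pixel_values, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```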
Thanks, great!
I will try this, but it will probably take some time to collect a good number of image+text pairs.