How to get text position in TR-OCR

#5
by maifeeulasad - opened

Some coordinate information would be really helpful. This is so basic that even Tesseract provides it.

So it would be really awesome if TrOCR could give us this info.

Maybe I'm missing something. If so, could someone point me to the relevant code?

Thanks

Hi @nielsr ,

I hope you are doing well. I'd like to join maifeeulasad's question about how to obtain text positions with TrOCR. Could you provide a guide or any advice on achieving this? In my case, I used Pytesseract to detect each word in an image of a book page, together with its coordinates in the image and a candidate transcription. I now want to combine these results with TrOCR, using a Spanish and English corpus, to check whether each word detected by Pytesseract is correct and, if not, correct it with TrOCR. The goal is the best possible accuracy so that the text can be read aloud with TTS.

I would also appreciate any clarification on whether the same task could be done with TrOCR alone, fine-tuned on a dataset without labels, that is, with unsupervised training. Your expertise in this field would be a great help. Thank you in advance for your time and response.

Thank you and best regards.

Hi,

This model doesn't provide that information. For that I'd recommend taking a look at newer models such as Florence-2 and KOSMOS-2.5. Both will be integrated natively in the Transformers library.
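In the meantime, the Pytesseract + TrOCR combination described above can be sketched roughly as follows: take word boxes from Tesseract, crop each box, and let TrOCR re-read the crop so the two transcriptions can be compared. This is only a minimal sketch under assumptions, not an official recipe: the helper names `word_boxes` and `recognize_with_positions` are made up for illustration, the checkpoint `microsoft/trocr-base-printed` is an assumed choice, and TrOCR is designed for single-line text images, so per-word crops may or may not work well for your pages.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def word_boxes(data: Dict[str, list], min_conf: float = 0.0) -> List[Tuple[str, Box]]:
    """Convert pytesseract.image_to_data(..., output_type=Output.DICT)
    output into (word, box) pairs, skipping empty or low-confidence rows.
    (Hypothetical helper, not part of any library.)"""
    pairs = []
    for text, conf, x, y, w, h in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        # Non-word rows (pages, blocks, lines) have empty text and conf == -1.
        if text.strip() and float(conf) >= min_conf:
            pairs.append((text, (x, y, x + w, y + h)))
    return pairs


def recognize_with_positions(image_path: str,
                             model_name: str = "microsoft/trocr-base-printed"):
    """Return one dict per word with its box, the Tesseract reading and the
    TrOCR reading. (Hypothetical helper; the model name is an assumption.)"""
    # Heavy imports are kept local so word_boxes can be used without them.
    import pytesseract
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    image = Image.open(image_path).convert("RGB")
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    results = []
    for tess_word, box in word_boxes(data):
        crop = image.crop(box)  # the position comes from Tesseract, not TrOCR
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        trocr_word = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        results.append({"box": box, "tesseract": tess_word, "trocr": trocr_word})
    return results
```

Calling `recognize_with_positions("page.png")` (the file name is a placeholder) returns a list you can scan for words where the two readings disagree. Cropping whole lines instead of single words is probably closer to how TrOCR was trained, at the cost of losing per-word boxes.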
