pix2text-mfr / README.md
breezedeus's picture
Update README.md
77666a0 verified
|
raw
history blame
7.68 kB
metadata
tags:
  - latex-ocr
  - math-ocr
  - math-formula-recognition
  - mfr
  - pix2text
  - image-to-text
license: mit
library_name: transformers

Model Card: Pix2Text-MFR

Mathematical Formula Recognition (MFR) model from Pix2Text (P2T).

Model Details / 模型细节

This MFR model utilizes the TrOCR architecture developed by Microsoft, starting with its initial values and retrained using a dataset of mathematical formula images. The resulting MFR model can be used to convert images of mathematical formulas into LaTeX text representation.

此 MFR 模型使用了微软的 TrOCR 架构,以其为初始值并利用数学公式图片数据集进行了重新训练。 获得的 MFR 模型可用于把数学公式图片转换为 LaTeX 文本表示。

Usage and Limitations / 使用和限制

  • Purpose: This model is a mathematical formula recognition model, capable of converting input images of mathematical formulas into LaTeX text representation.

  • Limitation: Since the model is trained on images of mathematical formulas, it may not work when recognizing other types of images.

  • 用途:此模型为数学公式识别模型,它可以把输入的数学公式图片转换为 LaTeX 文本表示。

  • 限制:由于模型是在数学公式图片数据上训练的,它在识别其他类型的图片时可能无法工作。

Documents / 文档

Model Use / 模型使用

Method 1: Using the model Directly

This method doesn't need to install pix2text, but can only recognize pure formula images.

这种方法无需安装 pix2text,但只能识别纯公式图片。

#! pip install pillow transformers optimum
from PIL import Image
from transformers import TrOCRProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

processor = TrOCRProcessor.from_pretrained('breezedeus/pix2text-mfr')
model = ORTModelForVision2Seq.from_pretrained('breezedeus/pix2text-mfr', use_cache=False)

image_fps = [
    'examples/example.jpg',
    'examples/42.png',
    'examples/0000186.png',
]
images = [Image.open(fp).convert('RGB') for fp in image_fps]
pixel_values = processor(images=images, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(f'generated_ids: {generated_ids}, \ngenerated text: {generated_text}')

Method 2: Using Pix2Text

This method requires the installation of pix2text, utilizing the Mathematical Formula Detection model (MFD) within Pix2Text. It is capable of recognizing not only pure formula images but also mixed images containing text.

这种方法需要安装 pix2text,借助 Pix2Text 中的数学公式检测模型(MFD),它不仅可以识别纯公式图片,还可以识别包含文本的混合图片。

$ pip install pix2text
#! pip install pix2text

from pix2text import Pix2Text, merge_line_texts

image_fps = [
    'examples/example.jpg',
    'examples/42.png',
    'examples/0000186.png',
]
p2t = Pix2Text()
outs = p2t.recognize_formula(image_fps)  # recognize pure formula images

outs2 = p2t.recognize('examples/mixed.jpg')  # recognize mixed images
print(merge_line_texts(outs2))

Performance / 性能

The original images for the test data are derived from real data uploaded by users on the Pix2Text Online Service. Initially, real user data from a specific period is selected, and then the Mathematical Formula Detection model (MFD) within Pix2Text is used to detect the mathematical formulas in these images and crop the corresponding parts. A subset of these formula images is then randomly chosen for manual annotation to create the test dataset. The following image shows some sample pictures from the test dataset. It is evident that the images in the test dataset are quite diverse, including mathematical formulas of various lengths and complexities, from single letters to formula groups and even matrices. This test dataset includes 485 images.

测试数据对应的原始图片来源于 Pix2Text 网页版 用户上传的真实数据。首先选取一段时间内用户的真实数据,然后利用 Pix2Text 中数学公式检测模型(MFD)检测出这些图片中的数学公式并截取出对应的部分,再从中随机选取部分公式图片进行人工标注。就获得了用于测试的测试数据集了。下图是测试数据集中的部分样例图片。从中可以看出测试数据集中的图片比较多样,包括了各种不同长度和复杂度的数学公式,有单个字母的图片,也有公式组甚至矩阵图片。本测试数据集包括了 485 张图片。

Examples from test data

Below are the Character Error Rates (CER, the lower, the better) of various models on this test dataset. For the true annotated results, as well as the output of each model, normalization was first performed to ensure that irrelevant factors such as spaces do not affect the test outcomes. For the recognition results of Texify, the leading and trailing symbols $ or $$ of the formula are removed first.

以下是各个模型在此测试数据集上的 CER(字错误率,越小越好)。其中对真实标注结果,以及每个模型的输出都首先进行了标准化,以保证不会因为空格等无关因素影响测试结果。对 Texify 的识别结果会首先去掉公式的首尾符号$或$$。

CER Comparison Among Different MFR Models

As can be seen from the figure above, the Pix2Text V1.0 MFR open-source free version model has significantly outperformed the previous versions of the paid model. Moreover, compared to the V1.0 MFR open-source free model, the precision of the Pix2Text V1.0 MFR paid model has been further improved.

由上图可见,Pix2Text V1.0 MFR 开源免费版模型已经大大优于之前版本的付费模型。而相比 V1.0 MFR 开源免费模型,Pix2Text V1.0 MFR 付费模型精度得到了进一步的提升。

Texify is more suited for recognizing images with standard formatting. It performs poorly in recognizing images containing single letters. This is the main reason why Texify's performance on this test dataset is inferior to that of Latex-OCR.

Texify 更适用于识别标准排版的图片,它对包含单字母的图片识别较差。这也是 Texify 在此测试数据集上效果比 Latex-OCR 还差的主要原因。

Feedback / 反馈

Where to send questions or comments about the model.

Welcome to contact the author Breezedeus.

欢迎联系作者 Breezedeus