breezedeus committed on
Commit 77666a0
1 Parent(s): 7a43874

Update README.md

Files changed (1)
  1. README.md +67 -30
README.md CHANGED
@@ -11,80 +11,117 @@ library_name: transformers
---

# Model Card: Pix2Text-MFR
- Math Formula Recognition (MFR) model from [Pix2Text (P2T)]().

## Model Details / 模型细节

- This model is fine-tuned on a coin dataset using **contrastive learning** techniques, based on OpenAI's CLIP (ViT-B/32). It aims to enhance the feature extraction capabilities for **Coin** images, thus achieving more accurate image-based search functionalities. The model combines the powerful features of the Vision Transformer (ViT) with the multimodal learning capabilities of CLIP, specifically optimized for coin imagery.
- 这个模型是在 OpenAI 的 CLIP (ViT-B/32) 基础上,利用对比学习技术并使用硬币数据集进行微调得到的。它旨在提高硬币图像的特征提取能力,从而实现更准确的以图搜图功能。该模型结合了视觉变换器(ViT)的强大功能和 CLIP 的多模态学习能力,专门针对硬币图像进行了优化。

- ## Usage and Limitations / 使用和限制
- - **Usage**: This model is primarily used for extracting representation vectors from coin images, enabling efficient and precise image-based searches in a coin image database.
- - **Limitations**: As the model is trained specifically on coin images, it may not perform well on non-coin images.
- - **用途**:此模型主要用于提取硬币图片的表示向量,以实现在硬币图像库中进行高效、精确的以图搜图。
- - **限制**:由于模型是针对硬币图像进行训练的,因此在处理非硬币图像时可能效果不佳。

## Documents / 文档

- - Base Model: [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)

## Model Use / 模型使用

- ```python3
- from PIL import Image
- import requests
- from transformers import CLIPProcessor, CLIPModel
- model = CLIPModel.from_pretrained("breezedeus/coin-clip-vit-base-patch32")
- processor = CLIPProcessor.from_pretrained("breezedeus/coin-clip-vit-base-patch32")
- image_fp = "path/to/coin_image.jpg"
- image = Image.open(image_fp).convert("RGB")
- inputs = processor(images=image, return_tensors="pt")
- img_features = model.get_image_features(**inputs)
- img_features = F.normalize(img_features, dim=1)
```

- ## Training Data / 训练数据
- The model was trained on a specialized coin image dataset. This dataset includes images of various currencies' coins.
- 本模型使用的是专门的硬币图像数据集进行训练。这个数据集包含了多种货币的硬币图片。

- ## Training Process / 训练过程
- The model was fine-tuned on the OpenAI CLIP (ViT-B/32) pretrained model using a coin image dataset. The training process involved Contrastive Learning fine-tuning techniques and parameter settings.
- 模型是在 OpenAI CLIP (ViT-B/32) 预训练模型的基础上,使用硬币图像数据集进行微调。训练过程采用了对比学习的微调技巧和参数设置。

- ## Performance / 性能
- This model demonstrates excellent performance in coin image retrieval tasks.
- 该模型在硬币图像检索任务上展现了优异的性能。

## Feedback / 反馈
 
---

# Model Card: Pix2Text-MFR
+ Mathematical Formula Recognition (MFR) model from [Pix2Text (P2T)](https://github.com/breezedeus/Pix2Text).

## Model Details / 模型细节

+ This MFR model uses Microsoft's [TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr) architecture, initialized from its pretrained weights and retrained on a dataset of mathematical formula images.
+ The resulting MFR model can be used to convert images of mathematical formulas into a LaTeX text representation.

+ 此 MFR 模型使用了微软的 [TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr) 架构,以其为初始值并利用数学公式图片数据集进行了重新训练。
+ 获得的 MFR 模型可用于把数学公式图片转换为 LaTeX 文本表示。

+ ## Usage and Limitations / 使用和限制

+ - **Purpose**: This model is a mathematical formula recognition model that converts input images of mathematical formulas into a LaTeX text representation.
+ - **Limitation**: Since the model is trained on images of mathematical formulas, it may not work when recognizing other types of images.

+ - **用途**:此模型为数学公式识别模型,它可以把输入的数学公式图片转换为 LaTeX 文本表示。
+ - **限制**:由于模型是在数学公式图片数据上训练的,它在识别其他类型的图片时可能无法工作。

## Documents / 文档

+ - Pix2Text (P2T) GitHub: [breezedeus/pix2text](https://github.com/breezedeus/Pix2Text)
+ - Pix2Text Online Free Service: [p2t.breezedeus.com](https://p2t.breezedeus.com/)
+ - Pix2Text More: [breezedeus.com/pix2text](https://breezedeus.com/pix2text)

## Model Use / 模型使用

+ ### Method 1: Using the Model Directly

+ This method doesn't require installing pix2text, but it can only recognize pure formula images.

+ 这种方法无需安装 pix2text,但只能识别纯公式图片。

+ ```python3
+ #! pip install pillow transformers optimum[onnxruntime]
+ from PIL import Image
+ from transformers import TrOCRProcessor
+ from optimum.onnxruntime import ORTModelForVision2Seq
+
+ # Load the processor and the ONNX seq2seq model from the Hugging Face Hub
+ processor = TrOCRProcessor.from_pretrained('breezedeus/pix2text-mfr')
+ model = ORTModelForVision2Seq.from_pretrained('breezedeus/pix2text-mfr', use_cache=False)
+
+ # Formula images to recognize (processed as a batch)
+ image_fps = [
+     'examples/example.jpg',
+     'examples/42.png',
+     'examples/0000186.png',
+ ]
+ images = [Image.open(fp).convert('RGB') for fp in image_fps]
+ pixel_values = processor(images=images, return_tensors="pt").pixel_values
+ generated_ids = model.generate(pixel_values)
+ # Decode the generated token ids into LaTeX strings
+ generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
+ print(f'generated_ids: {generated_ids}, \ngenerated text: {generated_text}')
```
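
As a follow-up usage example (not part of the original README; it simply continues from the `image_fps` and `generated_text` variables defined above), the recognized LaTeX strings could be written to a Markdown file wrapped in `$$ ... $$` so that a math-aware viewer can render them:

```python3
# Illustrative continuation of the snippet above: save the recognized formulas
# to a Markdown file, wrapping each one in $$ ... $$ so that Markdown viewers
# with math support can render them.
with open('formulas.md', 'w', encoding='utf-8') as f:
    for fp, latex in zip(image_fps, generated_text):
        f.write(f'`{fp}`:\n\n$$\n{latex}\n$$\n\n')
```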

+ ### Method 2: Using Pix2Text

+ This method requires installing pix2text. It uses the Mathematical Formula Detection (MFD) model within Pix2Text, so it can recognize not only pure formula images but also mixed images containing text.

+ 这种方法需要安装 pix2text,借助 Pix2Text 中的数学公式检测模型(MFD),它不仅可以识别纯公式图片,还可以识别包含文本的混合图片。

+ ```bash
+ $ pip install pix2text
+ ```

+ ```python3
+ #! pip install pix2text
+ from pix2text import Pix2Text, merge_line_texts
+
+ image_fps = [
+     'examples/example.jpg',
+     'examples/42.png',
+     'examples/0000186.png',
+ ]
+ p2t = Pix2Text()
+ outs = p2t.recognize_formula(image_fps)  # recognize pure formula images
+
+ outs2 = p2t.recognize('examples/mixed.jpg')  # recognize a mixed image containing text
+ print(merge_line_texts(outs2))
+ ```

+ ## Performance / 性能

+ The original images in the test data come from real data uploaded by users of the [Pix2Text Online Service](https://p2t.breezedeus.com). Real user data from a certain period was selected, the Mathematical Formula Detection (MFD) model within Pix2Text was used to detect the mathematical formulas in these images and crop the corresponding regions, and a subset of these formula images was then randomly chosen and manually annotated to build the test dataset. The image below shows some samples from the test dataset. As can be seen, the test images are quite diverse, covering mathematical formulas of various lengths and complexities, from single letters to formula groups and even matrices. The test dataset contains `485` images.

+ 测试数据对应的原始图片来源于 [Pix2Text 网页版](https://p2t.breezedeus.com) 用户上传的真实数据。首先选取一段时间内用户的真实数据,然后利用 Pix2Text 中数学公式检测模型(MFD)检测出这些图片中的数学公式并截取出对应的部分,再从中随机选取部分公式图片进行人工标注,就获得了用于测试的测试数据集。下图是测试数据集中的部分样例图片。从中可以看出测试数据集中的图片比较多样,包括了各种不同长度和复杂度的数学公式,有单个字母的图片,也有公式组甚至矩阵图片。本测试数据集包括了 `485` 张图片。

+ ![Examples from test data](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Ffb23b2d4-cdcf-46c9-9095-027591402a54%2FUntitled.png?table=block&id=269900d5-299a-4dcd-a26c-6555e831caff)

+ Below are the Character Error Rates (CER, the lower the better) of various models on this test dataset. The ground-truth annotations and the output of each model were first normalized, so that irrelevant factors such as spaces do not affect the results. For Texify's recognition results, the leading and trailing `$` or `$$` delimiters of the formula are removed first.

+ 以下是各个模型在此测试数据集上的 CER(字错误率,越小越好)。其中对真实标注结果,以及每个模型的输出都首先进行了标准化,以保证不会因为空格等无关因素影响测试结果。对 Texify 的识别结果会首先去掉公式的首尾符号 `$` 或 `$$`。
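
For reference, here is a minimal sketch of how such a normalized CER could be computed. The helper names (`normalize_latex`, `edit_distance`, `cer`) and the exact normalization rules (stripping `$`/`$$` wrappers, removing whitespace) are assumptions for illustration, not the evaluation script that produced the numbers reported here.

```python3
# Illustrative sketch only: the actual evaluation may normalize differently.
import re

def normalize_latex(s: str) -> str:
    """Strip a surrounding $ ... $ or $$ ... $$ wrapper and remove all whitespace."""
    s = s.strip()
    if s.startswith('$$') and s.endswith('$$') and len(s) >= 4:
        s = s[2:-2]
    elif s.startswith('$') and s.endswith('$') and len(s) >= 2:
        s = s[1:-1]
    return re.sub(r'\s+', '', s)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance over the normalized reference length."""
    pred, ref = normalize_latex(prediction), normalize_latex(reference)
    return edit_distance(pred, ref) / max(len(ref), 1)

print(cer(r'$$x^{2} + 1$$', 'x^{2}+1'))  # -> 0.0: wrapper and spaces are ignored
```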

+ ![CER Comparison Among Different MFR Models](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2F976b6c14-879d-4a3b-b027-6d2b15ce28b3%2FUntitled.png?table=block&id=6c503402-9b34-4937-a103-e4fd3bdbe754)

+ As the figure above shows, the Pix2Text V1.0 MFR open-source free model already outperforms the earlier paid models by a large margin, and the Pix2Text V1.0 MFR paid model improves accuracy further over the open-source free model.

+ 由上图可见,Pix2Text V1.0 MFR 开源免费版模型已经大大优于之前版本的付费模型。而相比 V1.0 MFR 开源免费模型,Pix2Text V1.0 MFR 付费模型精度得到了进一步的提升。

+ > [Texify](https://github.com/VikParuchuri/texify) is better suited to recognizing images with standard typesetting and performs poorly on images containing single letters. This is the main reason Texify does worse than Latex-OCR on this test dataset.
+ >
+ > [Texify](https://github.com/VikParuchuri/texify) 更适用于识别标准排版的图片,它对包含单字母的图片识别较差。这也是 Texify 在此测试数据集上效果比 Latex-OCR 还差的主要原因。

## Feedback / 反馈