ahmed-masry
committed on
Commit 8ca9fa4
1 Parent(s): 8559242
Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ license: mit
 
 # ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models
 
-In June 2024, [ColPali](https://arxiv.org/abs/2407.01449) was introduced as an OCR-free document retrieval model, built over [PaliGemma](https://arxiv.org/abs/2407.07726), shifting the paradigm of PDF document retrieval by directly processing images instead of using error-prone and resource-heavy OCR pipelines. However, with three billion parameters, ColPali might be computationally expensive, especially for large document databases. In contrast, text retrieval models like [ColBERT](https://arxiv.org/abs/2004.12832) are more efficient with just a few hundred million parameters, but they require error-prone and expensive OCR pipelines to. To bridge this gap, we introduce ColFlor, an OCR-free visual document retrieval model with only 174 million parameters.
+In June 2024, [ColPali](https://arxiv.org/abs/2407.01449) was introduced as an OCR-free document retrieval model, built over [PaliGemma](https://arxiv.org/abs/2407.07726), shifting the paradigm of PDF document retrieval by directly processing images instead of using error-prone and resource-heavy OCR pipelines. However, with three billion parameters, ColPali might be computationally expensive, especially for large document databases. In contrast, text retrieval models like [ColBERT](https://arxiv.org/abs/2004.12832) are more efficient with just a few hundred million parameters, but they require error-prone and expensive OCR pipelines to. To bridge this gap, we introduce ColFlor, an OCR-free visual document retrieval model with only 174 million parameters. ColFlor is 17 times smaller than ColPali, 9.8 times faster in encoding queries and 5.25 faster in encoding images, with only a 1.8% drop in performance on text-rich English documents.
 
 <p align="center"><img width=800 src="https://github.com/AhmedMasryKU/colflor/blob/main/assets/colflor.png?raw=true"/></p>
 