tcy6 committed on
Commit
890393c
1 Parent(s): 228010f

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: feature-extraction
 **VisRAG** is a novel vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.Compared to traditional text-based RAG, **VisRAG** maximizes the retention and utilization of the data information in the original documents, eliminating the information loss introduced during the parsing process.
 <p align="center"><img width=800 src="https://github.com/openbmb/VisRAG/blob/master/assets/main_figure.png?raw=true"/></p>
 
-## VisRAG Description
+## VisRAG Pipeline
 
 ### VisRAG-Ret
 **VisRAG-Ret** is a document embedding model built on [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2), a vision-language model that integrates [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision encoder and [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as the language model.
@@ -118,8 +118,4 @@ print(scores.tolist())
 ## Contact
 
 - Shi Yu: [email protected]
-- Chaoyue Tang: [email protected]
-
-## Citation
-
-If you use any datasets or models from this organization in your research, please cite the original dataset as follows:
+- Chaoyue Tang: [email protected]