Boke Syo (bokesyo)

What Happens When RAG Systems Become Fully Vision-Language Model-Based?
HF Demo: bokesyo/MiniCPMV-RAG-PDFQA
Multimodal Dense Retriever: RhapsodyAI/minicpm-visual-embedding-v0
Generation Model: openbmb/MiniCPM-V-2_6
Github: https://github.com/RhapsodyAILab/MiniCPM-V-Embedding-v0-Train

The vision-language model dense retriever MiniCPM-Visual-Embedding-v0 reads PDF pages directly as images, so no separate OCR step is required. Thanks to its strong built-in OCR and visual understanding capabilities, it generates multimodal dense representations, letting you build and search through your personal library with ease.

Ask a question and it retrieves the most relevant pages. MiniCPM-V-2.6 then answers based on the retrieved pages, drawing on its strong multi-image understanding.
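In practice the retrieval step boils down to rendering each PDF page as an image, embedding pages and query into the same vector space, and ranking pages by cosine similarity. Below is a minimal sketch of that flow; the encode_pages and encode_query callables are stand-ins for the retriever's actual calling convention (check the RhapsodyAI/minicpm-visual-embedding-v0 model card), and pdf2image is just one common way to rasterize pages.

```python
# Minimal sketch of VLM-based page retrieval. encode_pages / encode_query are
# hypothetical stand-ins for the retriever's real API; the page rendering and
# cosine-similarity ranking below are generic.
from typing import Callable, List

import numpy as np
from pdf2image import convert_from_path  # rasterizes PDF pages into PIL images
from PIL import Image


def retrieve_pages(
    pdf_path: str,
    question: str,
    encode_pages: Callable[[List[Image.Image]], np.ndarray],  # -> (n_pages, dim)
    encode_query: Callable[[str], np.ndarray],                # -> (dim,)
    top_k: int = 3,
) -> List[Image.Image]:
    """Embed every page as an image and return the top-k pages for the question."""
    pages = convert_from_path(pdf_path, dpi=200)  # document side: images only, no OCR

    page_vecs = encode_pages(pages)
    query_vec = encode_query(question)

    # Cosine similarity = dot product of L2-normalized vectors.
    page_vecs = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = page_vecs @ query_vec

    top = np.argsort(-scores)[:top_k]
    # These page images, plus the question, then go to the generation model
    # (openbmb/MiniCPM-V-2_6) for the final multi-image answer.
    return [pages[int(i)] for i in top]
```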

Whether you’re working with a visually-intensive or text-oriented PDF, it helps you quickly find the information you need. You can also build a personal library with it.

It operates just like a human: reading, storing, retrieving, and answering with full visual comprehension.

Currently, the online demo supports PDFs of up to 50 pages due to GPU time limits. For longer PDFs or entire books, you can deploy it on your own machine.

It's time to switch from bge to Memex! We introduce Memex, an OCR-free visual document embedding model that serves as your personal librarian.

The model takes only images as document-side inputs and produces vectors representing document pages. Memex is trained on over 200k pairs of queries and visual documents, spanning textual documents, visual documents, arXiv figures, plots, charts, industry documents, textbooks, ebooks, and openly available PDFs. Its performance is on par with our text-embedding ablation model on text-oriented documents, and it shows an advantage on visually intensive documents.

Our model can:

😋 Help you read a long visually-intensive or text-oriented PDF document and find the pages that answer your question.

🤗 Help you build a personal library and retrieve book pages from a large collection of books (see the sketch after this list).

🤩 It has only 2.8B parameters, and has the potential to run on your PC.

🐵 It works like a human: it reads and comprehends with vision, and remembers multimodal information in its hippocampus.
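To make the "personal library" idea above concrete, here is a toy sketch that assumes page images have already been exported and that an encoder callable wraps Memex's actual API (see the model card for the real calling convention). Vectors are stored in an .npz file and (book, page) metadata in JSON; any vector store would work just as well.

```python
# Toy "personal library" index: page vectors in an .npz file, (book, page)
# metadata in JSON. encode_page / encode_query are hypothetical stand-ins
# for the Memex embedding API.
import json
from pathlib import Path
from typing import Callable, List, Tuple

import numpy as np
from PIL import Image


def build_library(
    page_files: List[Tuple[str, int, str]],  # (book title, page number, image path)
    encode_page: Callable[[Image.Image], np.ndarray],
    index_dir: str = "library_index",
) -> None:
    """Embed every page image once and persist vectors plus metadata."""
    out = Path(index_dir)
    out.mkdir(exist_ok=True)
    vecs, meta = [], []
    for book, page_no, img_path in page_files:
        vec = encode_page(Image.open(img_path))      # one vector per page image
        vecs.append(vec / np.linalg.norm(vec))       # store unit-norm vectors
        meta.append({"book": book, "page": page_no})
    np.savez(str(out / "vectors.npz"), vectors=np.stack(vecs))
    (out / "meta.json").write_text(json.dumps(meta))


def search_library(
    question: str,
    encode_query: Callable[[str], np.ndarray],
    index_dir: str = "library_index",
    top_k: int = 5,
) -> List[dict]:
    """Return the top-k (book, page) entries for a text question."""
    out = Path(index_dir)
    vectors = np.load(str(out / "vectors.npz"))["vectors"]
    meta = json.loads((out / "meta.json").read_text())
    q = encode_query(question)
    q = q / np.linalg.norm(q)
    scores = vectors @ q                             # cosine similarity (rows are unit-norm)
    return [{**meta[int(i)], "score": float(scores[i])} for i in np.argsort(-scores)[:top_k]]
```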

The model is open-sourced at RhapsodyAI/minicpm-visual-embedding-v0
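For completeness, loading a Hub repo that ships custom modeling code usually follows the standard transformers pattern below; whether this exact entry point applies to minicpm-visual-embedding-v0, and what its encode calls look like, should be taken from its model card rather than from this sketch.

```python
# Generic Hugging Face loading pattern for a repo with custom modeling code.
# The Auto* entry points and trust_remote_code are assumptions here; defer to
# the RhapsodyAI/minicpm-visual-embedding-v0 model card for the real usage.
from transformers import AutoModel, AutoTokenizer

repo = "RhapsodyAI/minicpm-visual-embedding-v0"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()
```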

Everyone is welcome to try our online demo at https://huggingface.co/spaces/bokesyo/minicpm-visual-embeeding-v0-demo
