7 3

Filippo B

Filippo

NLP, data-intensive applications, cloud data platforms, information retrieval, workflows orchestration

upvoted an article 8 days ago

updated a collection about 2 months ago

updated a collection about 2 months ago

Filippo's activity

upvoted an article 8 days ago

Article

•

Aug 20

• 10

updated 2 collections about 2 months ago

upvoted a paper about 2 months ago

Reacted to merve's post with 🔥 about 2 months ago

Post

5516

I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb