Article Self-Hosting LLaMA 3.1 70B (or any ~70B LLM) Affordably By abhinand • Aug 20 • 10
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27 • 41
Post 5516
I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval: it doesn't need indexing with image-text pairs, just images!
- Qwen/Qwen2-VL-2B-Instruct for generation: images are fed directly, as is, to a vision language model with no conversion to text!
I used the ColPali implementation from the new Byaldi library by @bclavie: https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
Article ColPali: Efficient Document Retrieval with Vision Language Models By manu • Jul 5 • 165