Ahmed Masry PRO

ahmed-masry

https://ahmedmasryku.github.io/

Ahmed_Masry97

AI & ML interests

Multimodal Chart Understanding, Multimodal Document AI, Multimodal Vision-Language Models

Recent Activity

New activity 15 days ago

ahmed-masry/ColFlor:Update README.md

Articles

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

Oct 18

Organizations

None yet

ahmed-masry's activity

Reacted to merve's post with 🚀 about 1 month ago

Post

It's raining depth estimation models ☔️
DepthPro is a zero-shot depth estimation model by Apple; it's fast, sharp and accurate 🔥
Demo: akhaliq/depth-pro
Model: apple/DepthPro
Paper page: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (2410.02073)

The model consists of two encoders: a patch encoder and an image encoder 🖼️ Their outputs are merged and decoded into depth maps, and the model also estimates the focal length.
On average across various benchmarks, the model outperforms the previous state-of-the-art models 📑
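Why does the focal length matter? Under the standard pinhole camera model, a metric depth map plus a focal length in pixels is enough to lift every pixel to a 3D point, so a model that predicts both can recover metric 3D geometry from a single image with unknown camera intrinsics. The snippet below is a minimal sketch of that geometry only, not Depth Pro's code; the `backproject` helper is hypothetical, and it assumes square pixels, one focal length in pixels, and a principal point at the image centre.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], f_px: float) -> List[Point3D]:
    """Lift a metric depth map to 3D camera-space points (pinhole model).

    Hypothetical helper. Assumptions (not taken from Depth Pro itself):
    square pixels, a single focal length `f_px` in pixels, and the
    principal point at the image centre.
    """
    if f_px <= 0:
        raise ValueError("focal length must be positive")
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2  # principal point (assumed centred)
    points: List[Point3D] = []
    for v, row in enumerate(depth):  # v = pixel row index
        for u, z in enumerate(row):  # u = pixel column index, z = metric depth
            x = (u - cx) * z / f_px
            y = (v - cy) * z / f_px
            points.append((x, y, z))
    return points
```

For example, `backproject([[2.0, 2.0, 2.0]], f_px=1.0)` returns `[(-2.0, 0.0, 2.0), (0.0, 0.0, 2.0), (2.0, 0.0, 2.0)]`. With a wrong focal length, the same depth map would produce a 3D scene that is stretched or squashed sideways, which is why estimating it alongside depth is useful.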

posted an update about 1 month ago

Post

🚀 Introducing ColFlor: An Efficient, OCR-Free Vision-Language Document Retrieval Model 🌟

Earlier this year, ColPali revolutionized document retrieval by eliminating the need for error-prone OCR pipelines. Instead, it directly processes the document images. However, with its 3 billion parameters, ColPali is computationally heavy for large-scale applications.

That’s where ColFlor comes in: a smaller, faster alternative! 🎉 Being 17x smaller than ColPali, ColFlor offers a more efficient, OCR-free document retrieval solution, making it ideal for users with limited computing resources (the "GPU Poor"). 💡

Key Highlights:
🧠 174M parameters (vs. 3B for ColPali)
⚡ 9.8x faster query encoding, 5.25x faster image encoding
📉 Only 1.8% performance drop on text-rich English documents
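As a quick sanity check on the size comparison, the snippet below recomputes the parameter ratio from the figures quoted above and estimates the memory needed just to hold each model's weights. The 2-bytes-per-parameter figure is an assumption (fp16/bf16 weights), and it ignores activations and framework overhead, so treat the gigabyte numbers as rough lower bounds rather than measured requirements.

```python
# Parameter counts as quoted in the post.
COLPALI_PARAMS = 3_000_000_000  # "3B"
COLFLOR_PARAMS = 174_000_000    # "174M"

# Assumption: weights stored in fp16/bf16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2


def size_ratio(big: int, small: int) -> float:
    """How many times smaller `small` is than `big`."""
    return big / small


def weight_memory_gb(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"size ratio: {size_ratio(COLPALI_PARAMS, COLFLOR_PARAMS):.1f}x")
    print(f"ColPali weights: {weight_memory_gb(COLPALI_PARAMS):.2f} GB")
    print(f"ColFlor weights: {weight_memory_gb(COLFLOR_PARAMS):.2f} GB")
```

The ratio comes out at about 17.2, consistent with the "17x smaller" claim; under the 2-byte assumption that is roughly 6 GB versus 0.35 GB of weights.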

Check out the full blog post for more insights on modeling, training, and evaluations across various document retrieval tasks! 🚀
Also, feel free to try our demo on Hugging Face 🤗

🔗 Resources:
📄 Blog post: https://huggingface.co/blog/ahmed-masry/colflor
🧠 Model: ahmed-masry/ColFlor
💻 Demo: ahmed-masry/ColFlor-Demo
🏋️‍♂️ Training code: https://github.com/AhmedMasryKU/colflor
📊 Evaluation code: https://github.com/AhmedMasryKU/vidore-benchmark-colflor
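ColPali ranks pages with ColBERT-style late interaction: each query token embedding is matched to its most similar image-patch embedding, and those maxima are summed ("MaxSim"). Assuming ColFlor keeps the same scoring scheme, retrieval over precomputed page embeddings works roughly as in the sketch below. The function names and the toy 2-D embeddings are hypothetical; real models use higher-dimensional vectors produced by the encoders.

```python
from typing import List, Sequence

Embedding = Sequence[float]


def dot(a: Embedding, b: Embedding) -> float:
    """Inner product of two equal-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: List[Embedding], page: List[Embedding]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    page patch embedding, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page) for q in query)


def rank_pages(query: List[Embedding], pages: List[List[Embedding]]) -> List[int]:
    """Return page indices sorted from most to least relevant."""
    scores = [maxsim_score(query, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings: 2 query tokens, 2 patches per page.
    query = [[1.0, 0.0], [0.0, 1.0]]
    page_a = [[0.5, 0.5], [0.0, 0.0]]  # scores 0.5 + 0.5 = 1.0
    page_b = [[1.0, 0.0], [0.0, 1.0]]  # scores 1.0 + 1.0 = 2.0
    print(rank_pages(query, [page_a, page_b]))  # page_b (index 1) ranks first
```

Because page embeddings can be computed once and stored offline, only the query has to be encoded at search time, which is why query-encoding speed matters so much for retrieval latency.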

posted an update 5 months ago

Post

📢 Exciting News! Our latest paper "ChartGemma" is out! 📊

🧵1/3: ChartGemma overcomes a key limitation of existing chart models: they rely too heavily on underlying data tables. Instead, ChartGemma is trained on data generated directly from chart images, capturing crucial visual trends 📸🔍

🧵2/3: ChartGemma builds upon PaliGemma from Google Research and is fine-tuned on a high-quality visual instruction-tuning dataset generated with Gemini 1.5 Flash. 🌟📊

🧵3/3: ChartGemma achieves state-of-the-art results in chart summarization, question answering, and fact-checking tasks. 🏅📊 It can also generate more accurate and realistic chart summaries. 📝🔍

Our model and data are publicly available. We also have a cool web demo. Check it out! 🚀
Demo: ahmed-masry/ChartGemma
Code: https://github.com/vis-nlp/ChartGemma
Paper: ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild (2407.04172)

Reacted to fdaudens's post with 🔥 5 months ago

Post

Just updated the Journalists on 🤗 community with two new AI tools! 🚀📊

Check them out:
- Free video transcription & smart summary tool artificialguybr/Video-Transcription-Smart-Summary
- ChartGemma - next-level chart analysis focusing on visual trends ahmed-masry/ChartGemma

Media folks: Join our community for more tools! https://huggingface.co/JournalistsonHF

posted an update 5 months ago

Post

Hey everyone!

I'm excited to share a new demo for my ChartInstruct model from our ACL 2024 paper. It excels at various chart understanding tasks like QA, captioning, open-ended QA, fact checking and more!
Thanks to Hugging Face's ZeroGPU program, the demo runs smoothly even with the model's 7B parameters!

Check it out and enjoy!

Demo: ahmed-masry/ChartInstruct-LLama2
Model: ahmed-masry/ChartInstruct-LLama2
Paper: https://arxiv.org/abs/2403.09028