Umitcan Sahin

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Organizations

None yet

ucsahin's activity

upvoted a collection 1 day ago

Nov 15 Releases 🍂

15 items • Updated 1 day ago • 5

upvoted a collection 2 months ago

Turkish Vision-Language Datasets

Collection of Turkish vision-language datasets. • 19 items • Updated about 10 hours ago • 4

upvoted 5 papers 3 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 108

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

upvoted a collection 4 months ago

Vision Language Leaderboards

This collection has all the vision language leaderboards. • 7 items • Updated Aug 24 • 10

upvoted 2 articles 4 months ago

Article

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Jul 31 • 59

Article

The Rise of Agentic Data Generation

Jul 15 • 75

upvoted 2 papers 4 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

upvoted a collection 4 months ago

🪐 SmolLM

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 197

upvoted 2 articles 4 months ago

Article

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Jul 18 • 48

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18 • 66

upvoted 4 papers 4 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11 • 19

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 48

Multi-Object Hallucination in Vision-Language Models

Paper • 2407.06192 • Published Jul 8 • 9

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41

upvoted an article 5 months ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24 • 177