Hugo Laurençon

HugoLaurencon

Articles

Docmatix - a huge dataset for Document Visual Question Answering

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Putting ethical principles at the core of the research lifecycle

HugoLaurencon's activity

upvoted a paper 9 days ago

Watermark Anything with Localized Messages

Paper • 2411.07231 • Published 10 days ago • 19

upvoted a paper 13 days ago

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published 14 days ago • 48

upvoted 2 papers 27 days ago

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published 28 days ago • 40

WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published 29 days ago • 11

upvoted 4 papers about 1 month ago

MoH: Multi-Head Attention as Mixture-of-Head Attention

Paper • 2410.11842 • Published Oct 15 • 20

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 88

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15 • 16

Diversity-Rewarded CFG Distillation

Paper • 2410.06084 • Published Oct 8 • 10

upvoted 3 papers about 2 months ago

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2 • 25

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Paper • 2402.19474 • Published Feb 29 • 2

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published Sep 20 • 67

upvoted 7 papers 2 months ago

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19 • 47

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19 • 36

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 136

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Paper • 2409.11564 • Published Sep 17 • 19

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published Sep 13 • 47

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

upvoted 2 papers 3 months ago

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

Paper • 2409.02326 • Published Sep 3 • 18

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29 • 52