Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 26
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World Paper • 2402.19474 • Published Feb 29 • 2
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 10 days ago • 64
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 11 days ago • 45
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published 10 days ago • 33
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 12 days ago • 17
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 16 days ago • 44
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 25 days ago • 71
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining Paper • 2409.02326 • Published 26 days ago • 16
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 50
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published about 1 month ago • 35
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 110
Scaling Laws and Interpretability of Learning from Repeated Data Paper • 2205.10487 • Published May 21, 2022 • 1
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 47
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24 • 22
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 168
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 28
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 161
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 103
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
From screenshots to HTML Collection WebSight is a dataset of 823,000 HTML/CSS codes representing synthetically generated English websites, each accompanied by a corresponding screenshot. • 4 items • Updated Apr 15 • 17
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
MagiCapture: High-Resolution Multi-Concept Portrait Customization Paper • 2309.06895 • Published Sep 13, 2023 • 27
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding Paper • 2306.17107 • Published Jun 29, 2023 • 12
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47