foundation model - a weleen Collection

foundation model

updated Aug 26

DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 58
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14 • 12
KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 40
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Paper • 2407.11895 • Published Jul 16 • 7
VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 38
Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Paper • 2407.20229 • Published Jul 29 • 7
POA: Pre-training Once for Models of All Sizes

Paper • 2408.01031 • Published Aug 2 • 26
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Paper • 2408.00690 • Published Aug 1 • 21
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22 • 50
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21 • 54