- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 85
Collections including paper arxiv:2407.14358
- Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
  Paper • 2305.06131 • Published • 2
- Perpetual Humanoid Control for Real-time Simulated Avatars
  Paper • 2305.06456 • Published • 1
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
  Paper • 2305.10973 • Published • 32
- LDM3D: Latent Diffusion Model for 3D
  Paper • 2305.10853 • Published • 10

- Taming Data and Transformers for Audio Generation
  Paper • 2406.19388 • Published
- GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
  Paper • 2406.11768 • Published • 20
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
  Paper • 2407.02869 • Published • 18
- Stable Audio Open
  Paper • 2407.14358 • Published • 23

- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
  Paper • 2407.15841 • Published • 39
- Stable Audio Open
  Paper • 2407.14358 • Published • 23
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation
  Paper • 2407.13976 • Published • 5
- Efficient Audio Captioning with Encoder-Level Knowledge Distillation
  Paper • 2407.14329 • Published • 4