-
AppAgent: Multimodal Agents as Smartphone Users
Paper • 2312.13771 • Published • 51 -
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
Paper • 2401.01173 • Published • 11 -
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Paper • 2401.00246 • Published • 10 -
Image Sculpting: Precise Object Editing with 3D Geometry Control
Paper • 2401.01702 • Published • 18
ruibing hou
flow2023
AI & ML interests
None yet
Organizations
None yet
Collections
9
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 26 -
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Paper • 2401.01974 • Published • 5 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 27
models
None public yet
datasets
None public yet