TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder Paper • 2409.08248 • Published 22 days ago • 12
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 32
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896 • Published Jun 14 • 18
RVT-2: Learning Precise Manipulation from Few Demonstrations Paper • 2406.08545 • Published Jun 12 • 7
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4 • 9
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published May 16 • 43
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published May 11 • 16
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 29
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 30
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 43
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 47
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 25
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 25
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 23
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29 • 14
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion Paper • 2403.17237 • Published Mar 25 • 8
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27 • 20
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 14
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 47
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches Paper • 2403.02709 • Published Mar 5 • 7
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 592
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation Paper • 2402.14795 • Published Feb 22 • 5
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation Paper • 2402.11929 • Published Feb 19 • 9
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18 • 20
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots Paper • 2402.10329 • Published Feb 15 • 13
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 22
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15 • 12
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15 • 29
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models Paper • 2402.08714 • Published Feb 13 • 10
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects Paper • 2402.09052 • Published Feb 14 • 16
Animated Stickers: Bringing Stickers to Life with Video Diffusion Paper • 2402.06088 • Published Feb 8 • 9
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8 • 11
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8 • 39
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5 • 17