Collections
Discover the best community collections!
Collections including paper arxiv:2404.11614
-
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3 -
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 44 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17
-
Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 18 -
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 15 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 27 -
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published • 13
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 9 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 16 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 59 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 73
-
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 69 -
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper • 2401.11708 • Published • 30 -
Training-Free Consistent Text-to-Image Generation
Paper • 2402.03286 • Published • 65 -
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper • 2401.06105 • Published • 47
-
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper • 2312.00777 • Published • 21 -
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Paper • 2312.03641 • Published • 20 -
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper • 2312.04557 • Published • 12 -
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Paper • 2312.04433 • Published • 9
-
Random Field Augmentations for Self-Supervised Representation Learning
Paper • 2311.03629 • Published • 6 -
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 18 -
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Paper • 2311.04901 • Published • 7 -
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 26