aigc and 3d - a kame062 Collection

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Paper • 2306.16928 • Published Jun 29, 2023 • 38

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

Paper • 2306.12422 • Published Jun 21, 2023 • 12

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Paper • 2306.14435 • Published Jun 26, 2023 • 20

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

Paper • 2306.16934 • Published Jun 29, 2023 • 31

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Paper • 2306.17843 • Published Jun 30, 2023 • 43

Generate Anything Anywhere in Any Scene

Paper • 2306.17154 • Published Jun 29, 2023 • 22

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Paper • 2307.00040 • Published Jun 30, 2023 • 25

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Paper • 2307.00522 • Published Jul 2, 2023 • 32

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Paper • 2307.01952 • Published Jul 4, 2023 • 82

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

Paper • 2307.02421 • Published Jul 5, 2023 • 34

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Paper • 2307.06942 • Published Jul 13, 2023 • 22

Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation

Paper • 2307.03869 • Published Jul 8, 2023 • 22

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Paper • 2307.04725 • Published Jul 10, 2023 • 64

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Paper • 2307.06949 • Published Jul 13, 2023 • 50

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

Paper • 2307.07487 • Published Jul 14, 2023 • 19

Text2Layer: Layered Image Generation using Latent Diffusion Model

Paper • 2307.09781 • Published Jul 19, 2023 • 14

FABRIC: Personalizing Diffusion Models with Iterative Feedback

Paper • 2307.10159 • Published Jul 19, 2023 • 30

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

Paper • 2307.10373 • Published Jul 19, 2023 • 56

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning

Paper • 2307.11410 • Published Jul 21, 2023 • 15

Interpolating between Images with Diffusion Models

Paper • 2307.12560 • Published Jul 24, 2023 • 19

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

Paper • 2308.00906 • Published Aug 2, 2023 • 13

ConceptLab: Creative Generation using Diffusion Prior Constraints

Paper • 2308.02669 • Published Aug 3, 2023 • 23

AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose

Paper • 2308.03610 • Published Aug 7, 2023 • 23

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Paper • 2308.04079 • Published Aug 8, 2023 • 170

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

Paper • 2308.06721 • Published Aug 13, 2023 • 29

Dual-Stream Diffusion Net for Text-to-Video Generation

Paper • 2308.08316 • Published Aug 16, 2023 • 23

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Paper • 2308.08545 • Published Aug 16, 2023 • 33

MVDream: Multi-view Diffusion for 3D Generation

Paper • 2308.16512 • Published Aug 31, 2023 • 102

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Paper • 2309.00398 • Published Sep 1, 2023 • 20

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Paper • 2309.00610 • Published Sep 1, 2023 • 18

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

Paper • 2309.05793 • Published Sep 11, 2023 • 50

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Paper • 2309.06380 • Published Sep 12, 2023 • 32

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Paper • 2309.15103 • Published Sep 26, 2023 • 42

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Paper • 2309.15807 • Published Sep 27, 2023 • 32

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Paper • 2309.15818 • Published Sep 27, 2023 • 19

Text-to-3D using Gaussian Splatting

Paper • 2309.16585 • Published Sep 28, 2023 • 31

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Paper • 2309.16653 • Published Sep 28, 2023 • 46

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Paper • 2310.00426 • Published Sep 30, 2023 • 61

Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 77

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

Paper • 2310.03739 • Published Oct 5, 2023 • 21

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

Paper • 2310.08465 • Published Oct 12, 2023 • 14

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

Paper • 2310.08529 • Published Oct 12, 2023 • 17

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

Paper • 2310.08579 • Published Oct 12, 2023 • 14

4K4D: Real-Time 4D View Synthesis at 4K Resolution

Paper • 2310.11448 • Published Oct 17, 2023 • 36

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Paper • 2310.15008 • Published Oct 23, 2023 • 21

Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 40

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 13

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 40

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 30

CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

Paper • 2310.16825 • Published Oct 25, 2023 • 31

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 15

Beyond U: Making Diffusion Models Faster & Lighter

Paper • 2310.20092 • Published Oct 31, 2023 • 11

De-Diffusion Makes Text a Strong Cross-Modal Interface

Paper • 2311.00618 • Published Nov 1, 2023 • 21

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

Paper • 2311.04145 • Published Nov 7, 2023 • 32

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

Paper • 2311.05556 • Published Nov 9, 2023 • 80

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Paper • 2311.06214 • Published Nov 10, 2023 • 29

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Paper • 2311.07885 • Published Nov 14, 2023 • 39

Instant3D: Instant Text-to-3D Generation

Paper • 2311.08403 • Published Nov 14, 2023 • 44

Drivable 3D Gaussian Avatars

Paper • 2311.08581 • Published Nov 14, 2023 • 46

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Paper • 2311.09217 • Published Nov 15, 2023 • 21

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Paper • 2311.09257 • Published Nov 14, 2023 • 45

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Paper • 2311.10093 • Published Nov 16, 2023 • 57

MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture

Paper • 2311.10123 • Published Nov 16, 2023 • 15

SelfEval: Leveraging the discriminative nature of generative models for evaluation

Paper • 2311.10708 • Published Nov 17, 2023 • 14

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Paper • 2311.10709 • Published Nov 17, 2023 • 24

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Paper • 2311.10794 • Published Nov 17, 2023 • 24

Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 68

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

Paper • 2311.11243 • Published Nov 19, 2023 • 14

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Paper • 2311.11284 • Published Nov 19, 2023 • 16

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Paper • 2311.12024 • Published Nov 20, 2023 • 18

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Paper • 2311.12052 • Published Nov 18, 2023 • 32

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Paper • 2311.12092 • Published Nov 20, 2023 • 21

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 26

Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 47

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Paper • 2311.13073 • Published Nov 22, 2023 • 56

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Paper • 2311.13231 • Published Nov 22, 2023 • 26

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 50

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Paper • 2311.13600 • Published Nov 22, 2023 • 42

VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 20

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Paper • 2312.02087 • Published Dec 4, 2023 • 20

ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

Paper • 2312.02201 • Published Dec 2, 2023 • 31

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Paper • 2312.02238 • Published Dec 4, 2023 • 25

FaceStudio: Put Your Face Everywhere in Seconds

Paper • 2312.02663 • Published Dec 5, 2023 • 30

DiffiT: Diffusion Vision Transformers for Image Generation

Paper • 2312.02139 • Published Dec 4, 2023 • 13

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 36

DeepCache: Accelerating Diffusion Models for Free

Paper • 2312.00858 • Published Dec 1, 2023 • 21

Analyzing and Improving the Training Dynamics of Diffusion Models

Paper • 2312.02696 • Published Dec 5, 2023 • 31

Orthogonal Adaptation for Modular Customization of Diffusion Models

Paper • 2312.02432 • Published Dec 5, 2023 • 12

LivePhoto: Real Image Animation with Text-guided Motion Control

Paper • 2312.02928 • Published Dec 5, 2023 • 16

Fine-grained Controllable Video Generation via Object Appearance and Context

Paper • 2312.02919 • Published Dec 5, 2023 • 10

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Paper • 2312.03641 • Published Dec 6, 2023 • 20

Controllable Human-Object Interaction Synthesis

Paper • 2312.03913 • Published Dec 6, 2023 • 22

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 17

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 56

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image

Paper • 2312.04543 • Published Dec 7, 2023 • 21

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper • 2312.04410 • Published Dec 7, 2023 • 14

DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models

Paper • 2312.05107 • Published Dec 8, 2023 • 38

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 12

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Paper • 2312.04963 • Published Dec 7, 2023 • 16

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper • 2312.06655 • Published Dec 11, 2023 • 23

Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 23

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 26

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Paper • 2312.07536 • Published Dec 12, 2023 • 16

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Paper • 2312.07409 • Published Dec 12, 2023 • 22

Clockwork Diffusion: Efficient Generation With Model-Step Distillation

Paper • 2312.08128 • Published Dec 13, 2023 • 12

VideoLCM: Video Latent Consistency Model

Paper • 2312.09109 • Published Dec 14, 2023 • 22

Mosaic-SDF for 3D Generative Models

Paper • 2312.09222 • Published Dec 14, 2023 • 15

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Paper • 2312.09767 • Published Dec 15, 2023 • 25

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Paper • 2312.09608 • Published Dec 15, 2023 • 13

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

Paper • 2312.09252 • Published Dec 14, 2023 • 9

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 19

Rich Human Feedback for Text-to-Image Generation

Paper • 2312.10240 • Published Dec 15, 2023 • 19

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 69

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Paper • 2312.12490 • Published Dec 19, 2023 • 17

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

Paper • 2312.13834 • Published Dec 20, 2023 • 26

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

Paper • 2312.13578 • Published Dec 21, 2023 • 26

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Paper • 2312.13913 • Published Dec 21, 2023 • 22

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Paper • 2312.14091 • Published Dec 21, 2023 • 15

DreamTuner: Single Image is Enough for Subject-Driven Generation

Paper • 2312.13691 • Published Dec 21, 2023 • 26

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Paper • 2312.13763 • Published Dec 21, 2023 • 9

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper • 2312.13964 • Published Dec 21, 2023 • 18

Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

Paper • 2312.15430 • Published Dec 24, 2023 • 28

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Paper • 2312.15770 • Published Dec 25, 2023 • 12

Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 19

DreamGaussian4D: Generative 4D Gaussian Splatting

Paper • 2312.17142 • Published Dec 28, 2023 • 18

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Paper • 2312.17681 • Published Dec 29, 2023 • 18

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

Paper • 2401.01256 • Published Jan 2 • 19

Image Sculpting: Precise Object Editing with 3D Geometry Control

Paper • 2401.01702 • Published Jan 2 • 18

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Paper • 2401.04468 • Published Jan 9 • 47

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

Paper • 2401.05252 • Published Jan 10 • 45

InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

Paper • 2401.05335 • Published Jan 10 • 26

PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11 • 46

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Paper • 2401.05675 • Published Jan 11 • 20

TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

Paper • 2401.06003 • Published Jan 11 • 20

InstantID: Zero-shot Identity-Preserving Generation in Seconds

Paper • 2401.07519 • Published Jan 15 • 51

Towards A Better Metric for Text-to-Video Generation

Paper • 2401.07781 • Published Jan 15 • 14

UniVG: Towards UNIfied-modal Video Generation

Paper • 2401.09084 • Published Jan 17 • 15

GARField: Group Anything with Radiance Fields

Paper • 2401.09419 • Published Jan 17 • 17

Quantum Denoising Diffusion Models

Paper • 2401.07049 • Published Jan 13 • 12

DiffusionGPT: LLM-Driven Text-to-Image Generation System

Paper • 2401.10061 • Published Jan 18 • 27

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 14

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 58

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 29

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

Paper • 2401.11739 • Published Jan 22 • 16

Synthesizing Moving People with 3D Control

Paper • 2401.10889 • Published Jan 19 • 12

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 21

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86

Large-scale Reinforcement Learning for Diffusion Models

Paper • 2401.12244 • Published Jan 20 • 28

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Paper • 2401.14404 • Published Jan 25 • 16

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Paper • 2401.13795 • Published Jan 24 • 65

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29 • 36

StableIdentity: Inserting Anybody into Anywhere at First Sight

Paper • 2401.15975 • Published Jan 29 • 16

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Paper • 2401.17053 • Published Jan 30 • 30

Advances in 3D Generation: A Survey

Paper • 2401.17807 • Published Jan 31 • 17

Anything in Any Scene: Photorealistic Video Object Insertion

Paper • 2401.17509 • Published Jan 30 • 16

ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields

Paper • 2401.17895 • Published Jan 31 • 15

AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

Paper • 2402.00769 • Published Feb 1 • 20

Boximator: Generating Rich and Controllable Motions for Video Synthesis

Paper • 2402.01566 • Published Feb 2 • 26

Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 64

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Paper • 2402.05054 • Published Feb 7 • 25

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Paper • 2402.04324 • Published Feb 6 • 23

Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14 • 26

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Paper • 2402.10210 • Published Feb 15 • 29

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Paper • 2402.09812 • Published Feb 15 • 12

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

Paper • 2402.10259 • Published Feb 15 • 13

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Paper • 2402.12712 • Published Feb 20 • 15

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23 • 21

Multi-LoRA Composition for Image Generation

Paper • 2402.16843 • Published Feb 26 • 28

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

Paper • 2402.17412 • Published Feb 27 • 21

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Paper • 2402.18842 • Published Feb 29 • 13

AtomoVideo: High Fidelity Image-to-Video Generation

Paper • 2403.01800 • Published Mar 4 • 20

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4 • 27

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5 • 56

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Paper • 2403.02084 • Published Mar 4 • 14

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5 • 16

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 40

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8 • 18

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8 • 22

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8 • 42

V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11 • 28

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13 • 34

Video Editing via Factorized Diffusion Distillation

Paper • 2403.09334 • Published Mar 14 • 21

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14 • 24

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Paper • 2403.12008 • Published Mar 18 • 19

Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

Paper • 2403.12032 • Published Mar 18 • 14

LightIt: Illumination Modeling and Control for Diffusion Models

Paper • 2403.10615 • Published Mar 15 • 16

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18 • 64

GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

Paper • 2403.12365 • Published Mar 19 • 10

AnimateDiff-Lightning: Cross-Model Diffusion Distillation

Paper • 2403.12706 • Published Mar 19 • 17

RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS

Paper • 2403.13806 • Published Mar 20 • 18

DreamReward: Text-to-3D Generation with Human Preference

Paper • 2403.14613 • Published Mar 21 • 35

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Paper • 2403.14468 • Published Mar 21 • 22

ReNoise: Real Image Inversion Through Iterative Noising

Paper • 2403.14602 • Published Mar 21 • 19

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

Paper • 2403.14148 • Published Mar 21 • 18

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Paper • 2403.14621 • Published Mar 21 • 14

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

Paper • 2403.17008 • Published Mar 25 • 19

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 25

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Paper • 2403.16627 • Published Mar 25 • 20

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Paper • 2403.18795 • Published Mar 27 • 18

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Paper • 2403.18818 • Published Mar 27 • 25

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Paper • 2403.18118 • Published Mar 26 • 10

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling

Paper • 2403.19655 • Published Mar 28 • 18

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 30

FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

Paper • 2404.00987 • Published Apr 1 • 21

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15

Segment Any 3D Object with Language

Paper • 2404.02157 • Published Apr 2 • 2

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Paper • 2404.02101 • Published Apr 2 • 22

3D Congealing: 3D-Aware Image Alignment in the Wild

Paper • 2404.02125 • Published Apr 2 • 7

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 64

On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3 • 20

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3 • 11

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33

PointInfinity: Resolution-Invariant Point Diffusion Models

Paper • 2404.03566 • Published Apr 4 • 13

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

Paper • 2404.02514 • Published Apr 3 • 9

Robust Gaussian Splatting

Paper • 2404.04211 • Published Apr 5 • 8

ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Paper • 2404.04860 • Published Apr 7 • 24

UniFL: Improve Stable Diffusion via Unified Feedback Learning

Paper • 2404.05595 • Published Apr 8 • 23

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Paper • 2404.05014 • Published Apr 7 • 53

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Paper • 2404.05717 • Published Apr 8 • 24

Aligning Diffusion Models by Optimizing Human Utility

Paper • 2404.04465 • Published Apr 6 • 13

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

Paper • 2404.04544 • Published Apr 6 • 20

DATENeRF: Depth-Aware Text-based Editing of NeRFs

Paper • 2404.04526 • Published Apr 6 • 9

Hash3D: Training-free Acceleration for 3D Generation

Paper • 2404.06091 • Published Apr 9 • 12

Revising Densification in Gaussian Splatting

Paper • 2404.06109 • Published Apr 9 • 8

Reconstructing Hand-Held Objects in 3D

Paper • 2404.06507 • Published Apr 9 • 5

Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

Paper • 2404.06429 • Published Apr 9 • 6

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Paper • 2404.06903 • Published Apr 10 • 17

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 25

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 12

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15 • 12

EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18 • 21

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Paper • 2404.13026 • Published Apr 19 • 23

Does Gaussian Splatting need SFM Initialization?

Paper • 2404.12547 • Published Apr 18 • 8

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21 • 27

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Paper • 2404.16022 • Published Apr 24 • 19

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published Apr 25 • 18

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published Apr 24 • 12

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Paper • 2404.16820 • Published Apr 25 • 15

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25 • 16

HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections

Paper • 2404.16845 • Published Feb 14 • 6

Stylus: Automatic Adapter Selection for Diffusion Models

Paper • 2404.18928 • Published Apr 29 • 14

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 71

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Paper • 2404.19759 • Published Apr 30 • 24

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Paper • 2404.19702 • Published Apr 30 • 18

SAGS: Structure-Aware 3D Gaussian Splatting

Paper • 2404.19149 • Published Apr 29 • 13

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28 • 27

Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1 • 8

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 51

Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published May 2 • 18

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published May 13 • 21

Compositional Text-to-Image Generation with Dense Blob Representations

Paper • 2405.08246 • Published May 14 • 12

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published May 16 • 43

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published May 16 • 19

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published May 16 • 16

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Paper • 2405.11252 • Published May 18 • 12

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published May 21 • 22

Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published May 20 • 27

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published May 22 • 22

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Paper • 2405.16537 • Published May 26 • 15

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published May 27 • 14

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published May 27 • 11

Part123: Part-aware 3D Reconstruction from a Single-view Image

Paper • 2405.16888 • Published May 27 • 10

Phased Consistency Model

Paper • 2405.18407 • Published May 28 • 46

GFlow: Recovering 4D World from Monocular Video

Paper • 2405.18426 • Published May 28 • 15

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Paper • 2405.18424 • Published May 28 • 7

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Paper • 2405.18750 • Published May 29 • 20

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Paper • 2405.20222 • Published May 30 • 10

Learning Temporally Consistent Video Depth from Video Diffusion Priors

Paper • 2406.01493 • Published Jun 3 • 17

I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 15

Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4 • 15

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Paper • 2406.03184 • Published Jun 5 • 18

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published Jun 6 • 26

SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23

VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 22

pOps: Photo-Inspired Diffusion Operators

Paper • 2406.01300 • Published Jun 3 • 16

GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published Jun 6 • 19

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 64

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

Paper • 2406.06216 • Published Jun 10 • 18

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Paper • 2406.05649 • Published Jun 9 • 7

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published Jun 11 • 30

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Paper • 2406.06523 • Published Jun 10 • 50

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Paper • 2406.05338 • Published Jun 8 • 39

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6 • 34

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Paper • 2406.08392 • Published Jun 12 • 18

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Paper • 2406.07792 • Published Jun 12 • 13

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Paper • 2406.07686 • Published Jun 11 • 14

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Paper • 2406.05132 • Published Jun 7 • 27

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 28

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Paper • 2406.08552 • Published Jun 12 • 22

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Paper • 2406.09162 • Published Jun 13 • 13

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14 • 76

Training-free Camera Control for Video Generation

Paper • 2406.10126 • Published Jun 14 • 12

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

Paper • 2406.12459 • Published Jun 18 • 11