kame062's Collections
aigc and 3d
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper
•
2306.07967
•
Published
•
24
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper
•
2306.07954
•
Published
•
113
TryOnDiffusion: A Tale of Two UNets
Paper
•
2306.08276
•
Published
•
72
Seeing the World through Your Eyes
Paper
•
2306.09348
•
Published
•
32
DreamHuman: Animatable 3D Avatars from Text
Paper
•
2306.09329
•
Published
•
15
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation
Paper
•
2306.09864
•
Published
•
14
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image
Editing
Paper
•
2306.10012
•
Published
•
35
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape
Optimization
Paper
•
2306.16928
•
Published
•
38
DreamTime: An Improved Optimization Strategy for Text-to-3D Content
Creation
Paper
•
2306.12422
•
Published
•
12
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based
Image Editing
Paper
•
2306.14435
•
Published
•
20
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals
Paper
•
2306.16934
•
Published
•
31
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D
and 3D Diffusion Priors
Paper
•
2306.17843
•
Published
•
43
Generate Anything Anywhere in Any Scene
Paper
•
2306.17154
•
Published
•
22
DisCo: Disentangled Control for Referring Human Dance Generation in Real
World
Paper
•
2307.00040
•
Published
•
25
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
Paper
•
2307.00522
•
Published
•
32
SDXL: Improving Latent Diffusion Models for High-Resolution Image
Synthesis
Paper
•
2307.01952
•
Published
•
82
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
Paper
•
2307.02421
•
Published
•
34
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding
and Generation
Paper
•
2307.06942
•
Published
•
22
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation
Paper
•
2307.03869
•
Published
•
22
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
Paper
•
2307.04725
•
Published
•
64
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image
Models
Paper
•
2307.06949
•
Published
•
50
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Paper
•
2307.07487
•
Published
•
19
Text2Layer: Layered Image Generation using Latent Diffusion Model
Paper
•
2307.09781
•
Published
•
14
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Paper
•
2307.10159
•
Published
•
30
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
Paper
•
2307.10373
•
Published
•
56
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation
without Test-time Fine-tuning
Paper
•
2307.11410
•
Published
•
15
Interpolating between Images with Diffusion Models
Paper
•
2307.12560
•
Published
•
19
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based
Image Manipulation
Paper
•
2308.00906
•
Published
•
13
ConceptLab: Creative Generation using Diffusion Prior Constraints
Paper
•
2308.02669
•
Published
•
23
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose
Paper
•
2308.03610
•
Published
•
23
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Paper
•
2308.04079
•
Published
•
170
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image
Diffusion Models
Paper
•
2308.06721
•
Published
•
29
Dual-Stream Diffusion Net for Text-to-Video Generation
Paper
•
2308.08316
•
Published
•
23
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Paper
•
2308.08545
•
Published
•
33
MVDream: Multi-view Diffusion for 3D Generation
Paper
•
2308.16512
•
Published
•
102
VideoGen: A Reference-Guided Latent Diffusion Approach for High
Definition Text-to-Video Generation
Paper
•
2309.00398
•
Published
•
20
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Paper
•
2309.00610
•
Published
•
18
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion
Models
Paper
•
2309.05793
•
Published
•
50
InstaFlow: One Step is Enough for High-Quality Diffusion-Based
Text-to-Image Generation
Paper
•
2309.06380
•
Published
•
32
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion
Models
Paper
•
2309.15103
•
Published
•
42
Emu: Enhancing Image Generation Models Using Photogenic Needles in a
Haystack
Paper
•
2309.15807
•
Published
•
32
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video
Generation
Paper
•
2309.15818
•
Published
•
19
Text-to-3D using Gaussian Splatting
Paper
•
2309.16585
•
Published
•
31
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content
Creation
Paper
•
2309.16653
•
Published
•
46
PixArt-α: Fast Training of Diffusion Transformer for
Photorealistic Text-to-Image Synthesis
Paper
•
2310.00426
•
Published
•
61
Conditional Diffusion Distillation
Paper
•
2310.01407
•
Published
•
20
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and
Latent Diffusion
Paper
•
2310.03502
•
Published
•
77
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Paper
•
2310.03739
•
Published
•
21
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Paper
•
2310.08465
•
Published
•
14
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with
Point Cloud Priors
Paper
•
2310.08529
•
Published
•
17
HyperHuman: Hyper-Realistic Human Generation with Latent Structural
Diffusion
Paper
•
2310.08579
•
Published
•
14
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Paper
•
2310.11448
•
Published
•
36
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Paper
•
2310.15008
•
Published
•
21
Matryoshka Diffusion Models
Paper
•
2310.15111
•
Published
•
40
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual
Design
Paper
•
2310.15144
•
Published
•
13
A Picture is Worth a Thousand Words: Principled Recaptioning Improves
Image Generation
Paper
•
2310.16656
•
Published
•
40
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion
Prior
Paper
•
2310.16818
•
Published
•
30
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons
Images
Paper
•
2310.16825
•
Published
•
31
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper
•
2310.19512
•
Published
•
15
Beyond U: Making Diffusion Models Faster & Lighter
Paper
•
2310.20092
•
Published
•
11
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper
•
2311.00618
•
Published
•
21
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion
Models
Paper
•
2311.04145
•
Published
•
32
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper
•
2311.05556
•
Published
•
80
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model
Paper
•
2311.06214
•
Published
•
29
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View
Generation and 3D Diffusion
Paper
•
2311.07885
•
Published
•
39
Instant3D: Instant Text-to-3D Generation
Paper
•
2311.08403
•
Published
•
44
Drivable 3D Gaussian Avatars
Paper
•
2311.08581
•
Published
•
46
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction
Model
Paper
•
2311.09217
•
Published
•
21
UFOGen: You Forward Once Large Scale Text-to-Image Generation via
Diffusion GANs
Paper
•
2311.09257
•
Published
•
45
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
•
2311.10093
•
Published
•
57
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry
and Texture
Paper
•
2311.10123
•
Published
•
15
SelfEval: Leveraging the discriminative nature of generative models for
evaluation
Paper
•
2311.10708
•
Published
•
14
Emu Video: Factorizing Text-to-Video Generation by Explicit Image
Conditioning
Paper
•
2311.10709
•
Published
•
24
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human
Expression
Paper
•
2311.10794
•
Published
•
24
Make Pixels Dance: High-Dynamic Video Generation
Paper
•
2311.10982
•
Published
•
68
AutoStory: Generating Diverse Storytelling Images with Minimal Human
Effort
Paper
•
2311.11243
•
Published
•
14
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval
Score Matching
Paper
•
2311.11284
•
Published
•
16
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape
Prediction
Paper
•
2311.12024
•
Published
•
18
MagicDance: Realistic Human Dance Video Generation with Motions & Facial
Expressions Transfer
Paper
•
2311.12052
•
Published
•
32
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Paper
•
2311.12092
•
Published
•
21
NeuroPrompts: An Adaptive Framework to Optimize Prompts for
Text-to-Image Generation
Paper
•
2311.12229
•
Published
•
26
Diffusion Model Alignment Using Direct Preference Optimization
Paper
•
2311.12908
•
Published
•
47
FusionFrames: Efficient Architectural Aspects for Text-to-Video
Generation Pipeline
Paper
•
2311.13073
•
Published
•
56
Using Human Feedback to Fine-tune Diffusion Models without Any Reward
Model
Paper
•
2311.13231
•
Published
•
26
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper
•
2311.13384
•
Published
•
50
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
•
2311.13600
•
Published
•
42
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper
•
2312.00777
•
Published
•
20
VideoSwap: Customized Video Subject Swapping with Interactive Semantic
Point Correspondence
Paper
•
2312.02087
•
Published
•
20
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
Paper
•
2312.02201
•
Published
•
31
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded
Diffusion Model
Paper
•
2312.02238
•
Published
•
25
FaceStudio: Put Your Face Everywhere in Seconds
Paper
•
2312.02663
•
Published
•
30
DiffiT: Diffusion Vision Transformers for Image Generation
Paper
•
2312.02139
•
Published
•
13
VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models
Paper
•
2312.00845
•
Published
•
36
DeepCache: Accelerating Diffusion Models for Free
Paper
•
2312.00858
•
Published
•
21
Analyzing and Improving the Training Dynamics of Diffusion Models
Paper
•
2312.02696
•
Published
•
31
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper
•
2312.02432
•
Published
•
12
LivePhoto: Real Image Animation with Text-guided Motion Control
Paper
•
2312.02928
•
Published
•
16
Fine-grained Controllable Video Generation via Object Appearance and
Context
Paper
•
2312.02919
•
Published
•
10
MotionCtrl: A Unified and Flexible Motion Controller for Video
Generation
Paper
•
2312.03641
•
Published
•
20
Controllable Human-Object Interaction Synthesis
Paper
•
2312.03913
•
Published
•
22
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
•
2312.03793
•
Published
•
17
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
•
2312.04461
•
Published
•
56
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a
Single Image
Paper
•
2312.04543
•
Published
•
21
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper
•
2312.04410
•
Published
•
14
DreaMoving: A Human Dance Video Generation Framework based on Diffusion
Models
Paper
•
2312.05107
•
Published
•
38
GenTron: Delving Deep into Diffusion Transformers for Image and Video
Generation
Paper
•
2312.04557
•
Published
•
12
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D
priors
Paper
•
2312.04963
•
Published
•
16
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D
Prior
Paper
•
2312.06655
•
Published
•
23
Photorealistic Video Generation with Diffusion Models
Paper
•
2312.06662
•
Published
•
23
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper
•
2312.07537
•
Published
•
26
FreeControl: Training-Free Spatial Control of Any Text-to-Image
Diffusion Model with Any Condition
Paper
•
2312.07536
•
Published
•
16
DiffMorpher: Unleashing the Capability of Diffusion Models for Image
Morphing
Paper
•
2312.07409
•
Published
•
22
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Paper
•
2312.08128
•
Published
•
12
VideoLCM: Video Latent Consistency Model
Paper
•
2312.09109
•
Published
•
22
Mosaic-SDF for 3D Generative Models
Paper
•
2312.09222
•
Published
•
15
DreamTalk: When Expressive Talking Head Generation Meets Diffusion
Probabilistic Models
Paper
•
2312.09767
•
Published
•
25
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion
Models
Paper
•
2312.09608
•
Published
•
13
FineControlNet: Fine-level Text Control for Image Generation with
Spatially Aligned Text Control Injection
Paper
•
2312.09252
•
Published
•
9
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip
Connection Editing
Paper
•
2312.11392
•
Published
•
19
Rich Human Feedback for Text-to-Image Generation
Paper
•
2312.10240
•
Published
•
19
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive
Generation
Paper
•
2312.12491
•
Published
•
69
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Paper
•
2312.12490
•
Published
•
17
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Paper
•
2312.13834
•
Published
•
26
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for
Single Image Talking Face Generation
Paper
•
2312.13578
•
Published
•
26
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Paper
•
2312.13913
•
Published
•
22
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image
Inpainting with Diffusion Models
Paper
•
2312.14091
•
Published
•
15
DreamTuner: Single Image is Enough for Subject-Driven Generation
Paper
•
2312.13691
•
Published
•
26
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed
Diffusion Models
Paper
•
2312.13763
•
Published
•
9
PIA: Your Personalized Image Animator via Plug-and-Play Modules in
Text-to-Image Models
Paper
•
2312.13964
•
Published
•
18
Make-A-Character: High Quality Text-to-3D Character Generation within
Minutes
Paper
•
2312.15430
•
Published
•
28
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Paper
•
2312.15770
•
Published
•
12
Unsupervised Universal Image Segmentation
Paper
•
2312.17243
•
Published
•
19
DreamGaussian4D: Generative 4D Gaussian Splatting
Paper
•
2312.17142
•
Published
•
18
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video
Synthesis
Paper
•
2312.17681
•
Published
•
18
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper
•
2401.01256
•
Published
•
19
Image Sculpting: Precise Object Editing with 3D Geometry Control
Paper
•
2401.01702
•
Published
•
18
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Paper
•
2401.04468
•
Published
•
47
PIXART-δ: Fast and Controllable Image Generation with Latent
Consistency Models
Paper
•
2401.05252
•
Published
•
45
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes
Paper
•
2401.05335
•
Published
•
26
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper
•
2401.06105
•
Published
•
46
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for
Text-to-Image Generation
Paper
•
2401.05675
•
Published
•
20
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
Paper
•
2401.06003
•
Published
•
20
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Paper
•
2401.07519
•
Published
•
51
Towards A Better Metric for Text-to-Video Generation
Paper
•
2401.07781
•
Published
•
14
UniVG: Towards UNIfied-modal Video Generation
Paper
•
2401.09084
•
Published
•
15
GARField: Group Anything with Radiance Fields
Paper
•
2401.09419
•
Published
•
17
Quantum Denoising Diffusion Models
Paper
•
2401.07049
•
Published
•
12
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Paper
•
2401.10061
•
Published
•
27
WorldDreamer: Towards General World Models for Video Generation via
Predicting Masked Tokens
Paper
•
2401.09985
•
Published
•
14
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper
•
2401.10891
•
Published
•
58
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
Generating with Multimodal LLMs
Paper
•
2401.11708
•
Published
•
29
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
Paper
•
2401.11739
•
Published
•
16
Synthesizing Moving People with 3D Control
Paper
•
2401.10889
•
Published
•
12
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass
Diffusion Transformers
Paper
•
2401.11605
•
Published
•
21
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
Large-scale Reinforcement Learning for Diffusion Models
Paper
•
2401.12244
•
Published
•
28
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper
•
2401.14404
•
Published
•
16
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent
Diffusion Models for Virtual Try-All
Paper
•
2401.13795
•
Published
•
65
Motion-I2V: Consistent and Controllable Image-to-Video Generation with
Explicit Motion Modeling
Paper
•
2401.15977
•
Published
•
36
StableIdentity: Inserting Anybody into Anywhere at First Sight
Paper
•
2401.15975
•
Published
•
16
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane
Extrapolation
Paper
•
2401.17053
•
Published
•
30
Advances in 3D Generation: A Survey
Paper
•
2401.17807
•
Published
•
17
Anything in Any Scene: Photorealistic Video Object Insertion
Paper
•
2401.17509
•
Published
•
16
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural
Radiance Fields
Paper
•
2401.17895
•
Published
•
15
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning
Paper
•
2402.00769
•
Published
•
20
Boximator: Generating Rich and Controllable Motions for Video Synthesis
Paper
•
2402.01566
•
Published
•
26
Training-Free Consistent Text-to-Image Generation
Paper
•
2402.03286
•
Published
•
64
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content
Creation
Paper
•
2402.05054
•
Published
•
25
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Paper
•
2402.04324
•
Published
•
23
Magic-Me: Identity-Specific Video Customized Diffusion
Paper
•
2402.09368
•
Published
•
26
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Paper
•
2402.10210
•
Published
•
29
DreamMatcher: Appearance Matching Self-Attention for
Semantically-Consistent Text-to-Image Personalization
Paper
•
2402.09812
•
Published
•
12
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object
with Gaussian Splatting
Paper
•
2402.10259
•
Published
•
13
Paper
•
2402.13144
•
Published
•
94
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for
Single or Sparse-view 3D Object Reconstruction
Paper
•
2402.12712
•
Published
•
15
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video
Synthesis
Paper
•
2402.14797
•
Published
•
19
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept
Composition
Paper
•
2402.15504
•
Published
•
21
Multi-LoRA Composition for Image Generation
Paper
•
2402.16843
•
Published
•
28
Sora: A Review on Background, Technology, Limitations, and Opportunities
of Large Vision Models
Paper
•
2402.17177
•
Published
•
88
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized
Diffusion Model
Paper
•
2402.17412
•
Published
•
21
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Paper
•
2402.18842
•
Published
•
13
AtomoVideo: High Fidelity Image-to-Video Generation
Paper
•
2403.01800
•
Published
•
20
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
Paper
•
2403.01779
•
Published
•
27
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper
•
2403.03206
•
Published
•
56
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Paper
•
2403.02084
•
Published
•
14
Finetuned Multimodal Language Models Are High-Quality Image-Text Data
Filters
Paper
•
2403.02677
•
Published
•
16
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K
Text-to-Image Generation
Paper
•
2403.04692
•
Published
•
40
VideoElevator: Elevating Video Generation Quality with Versatile
Text-to-Image Diffusion Models
Paper
•
2403.05438
•
Published
•
18
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Paper
•
2403.05121
•
Published
•
22
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper
•
2403.05135
•
Published
•
42
V3D: Video Diffusion Models are Effective 3D Generators
Paper
•
2403.06738
•
Published
•
28
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Paper
•
2403.08764
•
Published
•
34
Video Editing via Factorized Diffusion Distillation
Paper
•
2403.09334
•
Published
•
21
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based
Semantic Control
Paper
•
2403.09055
•
Published
•
24
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image
using Latent Video Diffusion
Paper
•
2403.12008
•
Published
•
19
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
Paper
•
2403.12032
•
Published
•
14
LightIt: Illumination Modeling and Control for Diffusion Models
Paper
•
2403.10615
•
Published
•
16
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion
Distillation
Paper
•
2403.12015
•
Published
•
64
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Paper
•
2403.12365
•
Published
•
10
AnimateDiff-Lightning: Cross-Model Diffusion Distillation
Paper
•
2403.12706
•
Published
•
17
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust
Real-Time Rendering with 900+ FPS
Paper
•
2403.13806
•
Published
•
18
DreamReward: Text-to-3D Generation with Human Preference
Paper
•
2403.14613
•
Published
•
35
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
Paper
•
2403.14468
•
Published
•
22
ReNoise: Real Image Inversion Through Iterative Noising
Paper
•
2403.14602
•
Published
•
19
Efficient Video Diffusion Models via Content-Frame Motion-Latent
Decomposition
Paper
•
2403.14148
•
Published
•
18
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction
and Generation
Paper
•
2403.14621
•
Published
•
14
FlashFace: Human Image Personalization with High-fidelity Identity
Preservation
Paper
•
2403.17008
•
Published
•
19
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image
Generation
Paper
•
2403.16990
•
Published
•
25
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
•
2403.16627
•
Published
•
20
Gamba: Marry Gaussian Splatting with Mamba for single view 3D
reconstruction
Paper
•
2403.18795
•
Published
•
18
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object
Removal and Insertion
Paper
•
2403.18818
•
Published
•
25
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper
•
2403.18118
•
Published
•
10
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for
3D Generative Modeling
Paper
•
2403.19655
•
Published
•
18
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper
•
2404.01197
•
Published
•
30
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper
•
2404.00987
•
Published
•
21
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper
•
2404.01294
•
Published
•
15
Segment Any 3D Object with Language
Paper
•
2404.02157
•
Published
•
2
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper
•
2404.02101
•
Published
•
22
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper
•
2404.02125
•
Published
•
7
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
Prediction
Paper
•
2404.02905
•
Published
•
64
On the Scalability of Diffusion-based Text-to-Image Generation
Paper
•
2404.02883
•
Published
•
17
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image
Generation
Paper
•
2404.02733
•
Published
•
20
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion
Models
Paper
•
2404.02747
•
Published
•
11
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept
Matching
Paper
•
2404.03653
•
Published
•
33
PointInfinity: Resolution-Invariant Point Diffusion Models
Paper
•
2404.03566
•
Published
•
13
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency
Decomposition
Paper
•
2404.02514
•
Published
•
9
Robust Gaussian Splatting
Paper
•
2404.04211
•
Published
•
8
ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Paper
•
2404.04860
•
Published
•
24
UniFL: Improve Stable Diffusion via Unified Feedback Learning
Paper
•
2404.05595
•
Published
•
23
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Paper
•
2404.05014
•
Published
•
53
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual
Editing
Paper
•
2404.05717
•
Published
•
24
Aligning Diffusion Models by Optimizing Human Utility
Paper
•
2404.04465
•
Published
•
13
BeyondScene: Higher-Resolution Human-Centric Scene Generation With
Pretrained Diffusion
Paper
•
2404.04544
•
Published
•
20
DATENeRF: Depth-Aware Text-based Editing of NeRFs
Paper
•
2404.04526
•
Published
•
9
Hash3D: Training-free Acceleration for 3D Generation
Paper
•
2404.06091
•
Published
•
12
Revising Densification in Gaussian Splatting
Paper
•
2404.06109
•
Published
•
8
Reconstructing Hand-Held Objects in 3D
Paper
•
2404.06507
•
Published
•
5
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
Paper
•
2404.06429
•
Published
•
6
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic
Gaussian Splatting
Paper
•
2404.06903
•
Published
•
17
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth
Diffusion
Paper
•
2404.07199
•
Published
•
25
ControlNet++: Improving Conditional Controls with Efficient Consistency
Feedback
Paper
•
2404.07987
•
Published
•
47
Applying Guidance in a Limited Interval Improves Sample and Distribution
Quality in Diffusion Models
Paper
•
2404.07724
•
Published
•
12
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse
Controls to Any Diffusion Model
Paper
•
2404.09967
•
Published
•
20
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Paper
•
2404.09990
•
Published
•
12
EdgeFusion: On-Device Text-to-Image Generation
Paper
•
2404.11925
•
Published
•
21
PhysDreamer: Physics-Based Interaction with 3D Objects via Video
Generation
Paper
•
2404.13026
•
Published
•
23
Does Gaussian Splatting need SFM Initialization?
Paper
•
2404.12547
•
Published
•
8
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image
Synthesis
Paper
•
2404.13686
•
Published
•
27
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
•
2404.14507
•
Published
•
21
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paper
•
2404.16022
•
Published
•
19
Interactive3D: Create What You Want by Interactive 3D Generation
Paper
•
2404.16510
•
Published
•
18
NeRF-XL: Scaling NeRFs with Multiple GPUs
Paper
•
2404.16221
•
Published
•
12
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and
Human Ratings
Paper
•
2404.16820
•
Published
•
15
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity
Preserving
Paper
•
2404.16771
•
Published
•
16
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring
Unconstrained Photo Collections
Paper
•
2404.16845
•
Published
•
6
Stylus: Automatic Adapter Selection for Diffusion Models
Paper
•
2404.18928
•
Published
•
14
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper
•
2404.19427
•
Published
•
71
MotionLCM: Real-time Controllable Motion Generation via Latent
Consistency Model
Paper
•
2404.19759
•
Published
•
24
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Paper
•
2404.19702
•
Published
•
18
SAGS: Structure-Aware 3D Gaussian Splatting
Paper
•
2404.19149
•
Published
•
13
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper
•
2404.18212
•
Published
•
27
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper
•
2405.00676
•
Published
•
8
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video
Generation
Paper
•
2405.01434
•
Published
•
51
Customizing Text-to-Image Models with a Single Image Pair
Paper
•
2405.01536
•
Published
•
18
Coin3D: Controllable and Interactive 3D Assets Generation with
Proxy-Guided Conditioning
Paper
•
2405.08054
•
Published
•
21
Compositional Text-to-Image Generation with Dense Blob Representations
Paper
•
2405.08246
•
Published
•
12
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Paper
•
2405.10314
•
Published
•
43
Toon3D: Seeing Cartoons from a New Perspective
Paper
•
2405.10320
•
Published
•
19
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode
Multi-view Latent Diffusion
Paper
•
2405.09874
•
Published
•
16
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Paper
•
2405.11473
•
Published
•
53
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory
Score Matching
Paper
•
2405.11252
•
Published
•
12
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and
Attribute Control
Paper
•
2405.12970
•
Published
•
22
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
•
2405.12399
•
Published
•
27
ReVideo: Remake a Video with Motion and Content Control
Paper
•
2405.13865
•
Published
•
22
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion
Models
Paper
•
2405.16537
•
Published
•
15
Human4DiT: Free-view Human Video Generation with 4D Diffusion
Transformer
Paper
•
2405.17405
•
Published
•
14
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with
Dynamic Gaussian Surfels
Paper
•
2405.16822
•
Published
•
11
Part123: Part-aware 3D Reconstruction from a Single-view Image
Paper
•
2405.16888
•
Published
•
10
Paper
•
2405.18407
•
Published
•
46
GFlow: Recovering 4D World from Monocular Video
Paper
•
2405.18426
•
Published
•
15
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian
Splatting
Paper
•
2405.18424
•
Published
•
7
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model
with Mixed Reward Feedback
Paper
•
2405.18750
•
Published
•
20
MOFA-Video: Controllable Image Animation via Generative Motion Field
Adaptions in Frozen Image-to-Video Diffusion Model
Paper
•
2405.20222
•
Published
•
10
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Paper
•
2406.01493
•
Published
•
17
I4VGen: Image as Stepping Stone for Text-to-Video Generation
Paper
•
2406.02230
•
Published
•
15
Guiding a Diffusion Model with a Bad Version of Itself
Paper
•
2406.02507
•
Published
•
15
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Paper
•
2406.03184
•
Published
•
18
Step-aware Preference Optimization: Aligning Preference with Denoising
Performance at Each Step
Paper
•
2406.04314
•
Published
•
26
SF-V: Single Forward Video Generation Model
Paper
•
2406.04324
•
Published
•
23
VideoTetris: Towards Compositional Text-to-Video Generation
Paper
•
2406.04277
•
Published
•
22
pOps: Photo-Inspired Diffusion Operators
Paper
•
2406.01300
•
Published
•
16
GenAI Arena: An Open Evaluation Platform for Generative Models
Paper
•
2406.04485
•
Published
•
19
Autoregressive Model Beats Diffusion: Llama for Scalable Image
Generation
Paper
•
2406.06525
•
Published
•
64
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering
for HDR View Synthesis
Paper
•
2406.06216
•
Published
•
18
GTR: Improving Large 3D Reconstruction Models through Geometry and
Texture Refinement
Paper
•
2406.05649
•
Published
•
7
Zero-shot Image Editing with Reference Imitation
Paper
•
2406.07547
•
Published
•
30
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper
•
2406.07550
•
Published
•
55
NaRCan: Natural Refined Canonical Image with Integration of Diffusion
Prior for Video Editing
Paper
•
2406.06523
•
Published
•
50
MotionClone: Training-Free Motion Cloning for Controllable Video
Generation
Paper
•
2406.05338
•
Published
•
39
Physics3D: Learning Physical Properties of 3D Gaussians via Video
Diffusion
Paper
•
2406.04338
•
Published
•
34
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent
Font Effect Generation
Paper
•
2406.08392
•
Published
•
18
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Paper
•
2406.07792
•
Published
•
13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and
Video Generation
Paper
•
2406.07686
•
Published
•
14
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and
Less Hallucination
Paper
•
2406.05132
•
Published
•
27
Alleviating Distortion in Image Generation via Multi-Resolution
Diffusion Models
Paper
•
2406.09416
•
Published
•
28
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Paper
•
2406.08552
•
Published
•
22
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal
Prompts
Paper
•
2406.09162
•
Published
•
13
Make It Count: Text-to-Image Generation with an Accurate Number of
Objects
Paper
•
2406.10210
•
Published
•
76
Training-free Camera Control for Video Generation
Paper
•
2406.10126
•
Published
•
12
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with
Structure Priors
Paper
•
2406.12459
•
Published
•
11