Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.11355

For Content Creator

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Paper • 2305.06131 • Published May 10, 2023 • 2
Perpetual Humanoid Control for Real-time Simulated Avatars

Paper • 2305.06456 • Published May 10, 2023 • 1
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Paper • 2305.10973 • Published May 18, 2023 • 32
LDM3D: Latent Diffusion Model for 3D

Paper • 2305.10853 • Published May 18, 2023 • 10

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 134
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Paper • 2409.14988 • Published Sep 23 • 21

Diffusion Models

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26 • 59
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2 • 12

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 62
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22 • 33
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22 • 14
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 36
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28

Generative Model

SelfEval: Leveraging the discriminative nature of generative models for evaluation

Paper • 2311.10708 • Published Nov 17, 2023 • 14
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 106
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 71
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 28
Wavelets Are All You Need for Autoregressive Image Generation

Paper • 2406.19997 • Published Jun 28 • 29
ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Paper • 2407.17365 • Published Jul 24 • 11
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Paper • 2408.11001 • Published Aug 20 • 11

Stable Diffusion

about 17 hours ago

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published Jun 11 • 30
Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22 • 23
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published 6 days ago • 22

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14 • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14 • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs