Collections

Collections including paper arxiv:2404.11614

Papers - Animation - Text

Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 44

Papers - University of Tel-Aviv

Analyzing Transformers in Embedding Space

Paper • 2209.02535 • Published Sep 6, 2022 • 3
Prompt-to-Prompt Image Editing with Cross Attention Control

Paper • 2208.01626 • Published Aug 2, 2022 • 2
Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 44
Transformer Language Models without Positional Encodings Still Learn Positional Information

Paper • 2203.16634 • Published Mar 30, 2022 • 5

Papers - University - Hong Kong University of Science and Te

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Paper • 2404.02731 • Published Apr 3 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 18
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10 • 17

Noise Dimension of GAN: An Image Compression Perspective

Paper • 2403.09196 • Published Mar 14 • 2
Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 44

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27 • 18
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 15
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 27
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Paper • 2401.09047 • Published Jan 17 • 13

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17 • 9
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18 • 16
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 59
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 73

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 69
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 30
Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 65
PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11 • 47

VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 21
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Paper • 2312.03641 • Published Dec 6, 2023 • 20
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 12
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Paper • 2312.04433 • Published Dec 7, 2023 • 9

Generative Multiple Modality

Random Field Augmentations for Self-Supervised Representation Learning

Paper • 2311.03629 • Published Nov 7, 2023 • 6
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 18
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Paper • 2311.04901 • Published Nov 8, 2023 • 7
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
