MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published 8 days ago • 28
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 69
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published 20 days ago • 10
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Paper • 2409.04196 • Published 26 days ago • 11
view article Article Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth By mlabonne • Jul 29 • 206
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 168
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published May 21 • 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
view article Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1 • 63
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published Apr 29 • 14
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 71
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22 • 43
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 25
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 16
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19 • 29
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 24
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation Paper • 2403.12019 • Published Mar 18 • 8
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models Paper • 2403.12034 • Published Mar 18 • 5
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 75
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11 • 28
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 53
Divide-or-Conquer? Which Part Should You Distill Your LLM? Paper • 2402.15000 • Published Feb 22 • 22
FlashTex: Fast Relightable Mesh Texturing with LightControlNet Paper • 2402.13251 • Published Feb 20 • 13
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2 • 41
Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion Paper • 2401.17583 • Published Jan 31 • 25
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Paper • 2401.15687 • Published Jan 28 • 21
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI Paper • 2401.14019 • Published Jan 25 • 19
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion Paper • 2401.09416 • Published Jan 17 • 9
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data Paper • 2401.01173 • Published Jan 2 • 11
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 61
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 26
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 178
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper • 2401.01256 • Published Jan 2 • 19
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 79
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis Paper • 2312.17681 • Published Dec 29, 2023 • 18
LARP: Language-Agent Role Play for Open-World Games Paper • 2312.17653 • Published Dec 24, 2023 • 29
Compact Neural Graphics Primitives with Learned Hash Probing Paper • 2312.17241 • Published Dec 28, 2023 • 6
DreamGaussian4D: Generative 4D Gaussian Splatting Paper • 2312.17142 • Published Dec 28, 2023 • 18
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 28
Human101: Training 100+FPS Human Gaussians in 100s from 1 View Paper • 2312.15258 • Published Dec 23, 2023 • 7
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising Paper • 2312.10899 • Published Dec 18, 2023 • 14
VCoder: Versatile Vision Encoders for Multimodal Large Language Models Paper • 2312.14233 • Published Dec 21, 2023 • 15
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation Paper • 2312.13469 • Published Dec 20, 2023 • 10
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors Paper • 2312.13324 • Published Dec 20, 2023 • 9
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 27
UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections Paper • 2312.13285 • Published Dec 20, 2023 • 5
Self-Evaluation Improves Selective Generation in Large Language Models Paper • 2312.09300 • Published Dec 14, 2023 • 14
Weight subcloning: direct initialization of transformers using larger pretrained ones Paper • 2312.09299 • Published Dec 14, 2023 • 17