Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.08459

Papers I want to read

Papers in my to-read list

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 38
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 19

Vision Language Models

about 1 month ago

Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • Updated Sep 21 • 1.16M • • 800
vidore/colpali_train_set

Viewer • Updated Sep 4 • 119k • 1.86k • 65
vidore/colpali

Updated Sep 27 • 35.6k • 368
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 39
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 23
PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Paper • 2407.13976 • Published Jul 19 • 5
Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Paper • 2407.14329 • Published Jul 19 • 4

Embodied useful

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 70
RT-H: Action Hierarchies Using Language

Paper • 2403.01823 • Published Mar 4 • 7
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17 • 8
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18 • 15
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 58
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 72

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Paper • 2401.06080 • Published Jan 11 • 25
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44
TurboEdit: Instant text-based image editing

Paper • 2408.08332 • Published Aug 14 • 18

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Paper • 2311.10709 • Published Nov 17, 2023 • 24
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published May 21 • 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53
stabilityai/stable-diffusion-3-medium

Text-to-Image • Updated Aug 12 • 60.3k • 4.56k

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 22
Neurons in Large Language Models: Dead, N-gram, Positional

Paper • 2309.04827 • Published Sep 9, 2023 • 16
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 9
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

Paper • 2309.03907 • Published May 18, 2023 • 8

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs