Dongxu Li

dxli1

dxli1's activity

upvoted a paper 3 days ago

OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published 6 days ago • 38

upvoted a paper 7 days ago

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Paper • 2411.13281 • Published 8 days ago • 17

liked 2 models about 1 month ago

rhymes-ai/Aria-torchao-int8wo

Updated Oct 25 • 310 • 11

thwin27/Aria-sequential_mlp-FP8-dynamic

Image-Text-to-Text • Updated Oct 23 • 83 • 5

upvoted an article about 1 month ago

Allegro: Advanced Video Generation Model

Article • Oct 22 • 56

liked a model about 1 month ago

rhymes-ai/Allegro

Text-to-Video • Updated 28 days ago • 2.1k • 237

upvoted a paper about 1 month ago

Allegro: Open the Black Box of Commercial-Level Video Generation Model

Paper • 2410.15458 • Published Oct 20 • 40

liked a model about 2 months ago

rhymes-ai/Aria

Image-Text-to-Text • Updated 16 days ago • 15.7k • 587

upvoted a paper about 2 months ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8 • 107

authored 9 papers about 2 months ago

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Paper • 1910.11006 • Published Oct 24, 2019

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Paper • 2311.18799 • Published Nov 30, 2023 • 1

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

Paper • 2401.01827 • Published Jan 3 • 15

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Paper • 2201.12086 • Published Jan 28, 2022 • 3

cosFormer: Rethinking Softmax in Attention

Paper • 2202.08791 • Published Feb 17, 2022 • 1

The Devil in Linear Transformer

Paper • 2210.10340 • Published Oct 19, 2022 • 1

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Paper • 2301.12597 • Published Jan 30, 2023 • 1

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8 • 107

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Paper • 2112.09583 • Published Dec 17, 2021

liked a dataset 4 months ago

longvideobench/LongVideoBench

Viewer • Updated Oct 14 • 6.68k • 6.14k • 13

liked a Space 4 months ago
