Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

authored a paper 9 days ago

Large Language Models Can Self-Improve in Long-context Reasoning

upvoted a paper 10 days ago

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

upvoted a paper 10 days ago

Large Language Models Can Self-Improve in Long-context Reasoning

ClownRat's activity

upvoted 2 papers 10 days ago

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

Paper • 2411.08380 • Published 11 days ago • 24

Large Language Models Can Self-Improve in Long-context Reasoning

Paper • 2411.08147 • Published 12 days ago • 59

upvoted 2 papers 28 days ago

Enhancing Training Efficiency Using Packing with Flash Attention

Paper • 2407.09105 • Published Jul 12 • 14

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Paper • 2410.13185 • Published Oct 17 • 6

upvoted a paper 29 days ago

Why Does the Effective Context Length of LLMs Fall Short?

Paper • 2410.18745 • Published Oct 24 • 16

upvoted a paper about 1 month ago

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Paper • 2410.12490 • Published Oct 16 • 8

upvoted 2 collections about 1 month ago

DiGIT

Collection

The corresponding checkpoints of [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective • 2 items • Updated about 1 month ago • 1

Inf-CL

Collection

The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated about 1 month ago • 2

upvoted 2 papers about 1 month ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22 • 88

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published Oct 16 • 30

upvoted a paper about 2 months ago

A Survey on the Honesty of Large Language Models

Paper • 2409.18786 • Published Sep 27 • 31

upvoted a paper 4 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 55

upvoted 3 papers 5 months ago

upvoted a collection 5 months ago

VideoLLaMA 2

Collection

Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 11 days ago • 21

upvoted a paper 6 months ago

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 32