Yuansheng Ni

yuanshengni

https://yuanshengni.github.io/

AI & ML interests

NLP

yuanshengni's activity

upvoted a paper 12 days ago

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Paper • 2411.07199 • Published 12 days ago • 43

updated a dataset 14 days ago

MMMU/MMMU_Pro

Viewer • Updated 14 days ago • 5.19k • 1.72k • 15

liked a Space 25 days ago

Running

🥇

MEGA-Bench

A leaderboard for multimodal models

liked a dataset about 1 month ago

TIGER-Lab/MEGA-Bench

Viewer • Updated 4 days ago • 12.6k • 1.02k • 14

authored 2 papers about 1 month ago

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Paper • 2406.05862 • Published Jun 9 • 4

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14 • 37

upvoted a paper about 1 month ago

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14 • 37

updated a dataset 2 months ago

MMMU/MMMU

Viewer • Updated Sep 19 • 11.6k • 13.1k • 194

New activity in MMMU/MMMU_Pro 3 months ago

Link dataset to paper

#3 opened 3 months ago by nielsr

[bot] Conversion to Parquet

#1 opened 3 months ago by parquet-converter

options of `validation_Accounting_29` are in the wrong format

#2 opened 3 months ago by boxin-wbx

upvoted a paper 3 months ago

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published Sep 4 • 28

authored a paper 3 months ago

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published Sep 4 • 28

liked a dataset 3 months ago

MMMU/MMMU_Pro

Viewer • Updated 14 days ago • 5.19k • 1.72k • 15

upvoted a paper 4 months ago

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Paper • 2407.15017 • Published Jul 22 • 33

authored a paper 5 months ago

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21 • 14

liked a dataset 5 months ago

m-a-p/II-Bench

Viewer • Updated Jun 29 • 1.43k • 574 • 8

authored 2 papers 6 months ago

GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published Jun 6 • 20

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 43

liked a Space 6 months ago

Running on CPU Upgrade

163

🥇

MMLU Pro

More advanced and challenging multi-task evaluation