Evaluation - a OliP Collection

updated Sep 25

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

Paper • 2409.12640 • Published Sep 19
openai/MMMLU

Dataset • Updated Oct 16
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24