Levels of AGI: Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 33
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022 • 5
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models Paper • 2306.13651 • Published Jun 23, 2023 • 15
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains Paper • 2406.12045 • Published Jun 17 • 6