Digital Socrates: Evaluating LLMs through explanation critiques Paper • 2311.09613 • Published Nov 16, 2023
PromptBench: A Unified Library for Evaluation of Large Language Models Paper • 2312.07910 • Published Dec 13, 2023