Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code • Paper • 2303.08033 • Published Mar 9, 2023
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models • Paper • 2309.01940 • Published Sep 5, 2023