OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 68
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6 • 1
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents Paper • 2311.11797 • Published Nov 20, 2023 • 2
DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data Paper • 2311.09805 • Published Nov 16, 2023 • 3
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning Paper • 2311.10537 • Published Nov 16, 2023 • 3
Investigating Data Contamination in Modern Benchmarks for Large Language Models Paper • 2311.09783 • Published Nov 16, 2023 • 2
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks Paper • 2311.09835 • Published Nov 16, 2023 • 9
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 9
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge Paper • 2308.16458 • Published Aug 31, 2023 • 10