SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning Paper • 2409.05556 • Published Sep 9 • 1
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6 • 43
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23 • 34
Learning Task Decomposition to Assist Humans in Competitive Programming Paper • 2406.04604 • Published Jun 7 • 4
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 34
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
BERTopic: Neural topic modeling with a class-based TF-IDF procedure Paper • 2203.05794 • Published Mar 11, 2022
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published 8 days ago • 28
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published 25 days ago • 39
JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published 20 days ago • 40
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published 22 days ago • 48
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published Sep 25 • 8
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines Paper • 2310.03714 • Published Oct 5, 2023 • 30
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published 29 days ago • 19
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published Sep 21 • 11
Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study Paper • 2409.17580 • Published Sep 26 • 6
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23 • 11
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published Sep 20 • 2
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs Paper • 2408.00114 • Published Jul 31
Planning In Natural Language Improves LLM Search For Code Generation Paper • 2409.03733 • Published Sep 5
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10 • 62
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation Paper • 2409.03271 • Published Sep 5 • 2
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges Paper • 2408.08946 • Published Aug 16 • 10
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Paper • 2410.24032 • Published 5 days ago • 8