OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Abstract
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data, and a public demo.
Community
OpenScholar is a new retrieval-augmented LM designed for scientific literature synthesis. Built on a datastore of 45 million open-access papers, a trained retriever and reranker, an 8B LM, and a self-feedback retrieval-augmented generation pipeline, it outperforms GPT-4o as well as production systems such as Perplexity at literature synthesis. A public demo is available at https://openscholar.allen.ai/.
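The pipeline described above (retrieve, generate a cited answer, then use self-feedback to decide whether to retrieve again) can be sketched in miniature. This is a toy illustration, not OpenScholar's implementation: the lexical retriever, the stand-in `generate` function, and the citation-count feedback check are all invented here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    paper_id: str
    text: str
    score: float = 0.0

def retrieve(query: str, datastore: list[Passage], k: int = 3) -> list[Passage]:
    """Toy lexical retriever: score passages by query-term overlap."""
    terms = set(query.lower().split())
    for p in datastore:
        p.score = len(terms & set(p.text.lower().split()))
    return sorted(datastore, key=lambda p: p.score, reverse=True)[:k]

def generate(query: str, passages: list[Passage]) -> str:
    """Stand-in for the LM: stitch retrieved passages into a cited answer."""
    return " ".join(f"{p.text} [{p.paper_id}]" for p in passages)

def needs_more_evidence(answer: str, min_citations: int = 2) -> bool:
    """Toy self-feedback check: request more retrieval if citations are sparse."""
    return answer.count("[") < min_citations

def answer_with_feedback(query: str, datastore: list[Passage],
                         max_rounds: int = 2) -> str:
    """Generate, self-critique, and re-retrieve until the feedback check passes."""
    k = 1
    answer = ""
    for _ in range(max_rounds):
        passages = retrieve(query, datastore, k=k)
        answer = generate(query, passages)
        if not needs_more_evidence(answer):
            break
        k += 1  # feedback says evidence is thin: widen retrieval, regenerate
    return answer
```

In the real system, the feedback step is itself model-generated critique rather than a citation count, and retrieval runs over the 45M-paper datastore with a trained retriever and reranker; the loop structure is the point of the sketch.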
This is an automated message from the Librarian Bot. I found the following papers similar to this paper, recommended by the Semantic Scholar API:
- Sufficient Context: A New Lens on Retrieval Augmented Generation Systems (2024)
- Assessing the Answerability of Queries in Retrieval-Augmented Code Generation (2024)
- Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models (2024)
- Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models (2024)
- SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers (2024)
- LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models (2024)
- Large Language Models Can Self-Improve in Long-context Reasoning (2024)