arXiv:2410.22349

Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses

Published on Oct 15, 2024

Abstract

Large Language Model (LLM)-based applications are graduating from research prototypes to products serving millions of users, influencing how people write and consume information. A prominent example is the emergence of Answer Engines: LLM-based generative search engines supplanting traditional search engines. Answer engines not only retrieve sources relevant to a user query but also synthesize answer summaries that cite those sources. To understand these systems' limitations, we first conducted a study with 21 participants, evaluating interactions with answer vs. traditional search engines and identifying 16 answer engine limitations. From these insights, we propose 16 answer engine design recommendations, linked to 8 metrics. An automated evaluation implementing our metrics on three popular engines (You.com, Perplexity.ai, BingChat) quantifies common limitations (e.g., frequent hallucination, inaccurate citation) and unique features (e.g., variation in answer confidence), with results mirroring user study insights. We release our Answer Engine Evaluation benchmark (AEE) to facilitate transparent evaluation of LLM-based applications.
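
The automated evaluation itself is not spelled out on this page, but the kind of citation-accuracy check the abstract alludes to is easy to sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' AEE metric: it assumes answers cite sources with bracketed markers such as `[1]`, and it approximates "does the cited source support this sentence?" with content-word overlap, where a real evaluation would use an entailment model. The function name, marker format, and threshold are all hypothetical.

```python
import re

def citation_support(answer: str, sources: dict[int, str],
                     threshold: float = 0.5) -> float:
    """Fraction of citation-bearing sentences whose cited source
    plausibly supports them. A lexical-overlap proxy only: this is
    not the paper's AEE benchmark, and a real evaluation would use
    an entailment model rather than word overlap."""
    stopwords = {"the", "a", "an", "is", "are", "was", "were", "of", "to"}
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    judged = []
    for sent in sentences:
        cited = [int(m) for m in re.findall(r"\[(\d+)\]", sent)]
        if not cited:
            continue  # uncited sentences fall under other metrics
        claim = re.sub(r"\[\d+\]", "", sent)  # drop the citation markers
        words = set(re.findall(r"\w+", claim.lower())) - stopwords
        if not words:
            continue
        # A sentence counts as supported if any cited source covers
        # enough of its content words.
        supported = any(
            len(words & set(re.findall(r"\w+", sources.get(c, "").lower())))
            / len(words) >= threshold
            for c in cited
        )
        judged.append(supported)
    return sum(judged) / len(judged) if judged else 0.0

# Hypothetical usage: one accurate and one unsupported citation.
sources = {1: "The Eiffel Tower is 330 metres tall and stands in Paris."}
answer = ("The Eiffel Tower is 330 metres tall [1]. "
          "It was painted green in 2024 [1].")
print(citation_support(answer, sources))  # 0.5: the second claim is unsupported
```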
