arxiv:2309.08210

Investigating Answerability of LLMs for Long-Form Question Answering

Published on Sep 15, 2023

Submitted by akhaliq on Sep 17, 2023

#1 Paper of the day


Authors:

Meghana Moorthy Bhat, Rui Meng, Yingbo Zhou, Semih Yavuz

Abstract

As we embark on a new era of LLMs, it becomes increasingly crucial to understand their capabilities, limitations, and differences. Toward making further progress in this direction, we strive to build a deeper understanding of the gaps between massive LLMs (e.g., ChatGPT) and smaller yet effective open-source LLMs and their distilled counterparts. To this end, we specifically focus on long-form question answering (LFQA) because it has several practical and impactful applications (e.g., troubleshooting, customer service) yet is still understudied and challenging for LLMs. We propose a question-generation method from abstractive summaries and show that generating follow-up questions from summaries of long documents can create a challenging setting for LLMs to reason and infer from long contexts. Our experimental results confirm that: (1) our proposed method of generating questions from abstractive summaries poses a challenging setup for LLMs and shows performance gaps between LLMs like ChatGPT and open-source LLMs (Alpaca, Llama); (2) open-source LLMs exhibit decreased reliance on context for generated questions from the original document, but their generation capabilities drop significantly on generated questions from summaries, especially for longer contexts (>1024 tokens).
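
The question-generation setup described in the abstract (summarize a long document, then generate follow-up questions from the summary and answer them against the full document) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the function names, and the prompt wording are all hypothetical stand-ins for whatever model and prompts are actually used.

```python
from typing import Callable, List


def generate_questions_from_summary(
    document: str,
    llm: Callable[[str], str],
    num_questions: int = 3,
) -> List[str]:
    """Summarize the document, then ask for follow-up questions on the summary."""
    # Step 1: abstractive summary of the long document.
    # (Prompt wording is an assumption, not taken from the paper.)
    summary = llm(f"Summarize the following document:\n\n{document}")

    # Step 2: follow-up questions generated from the summary,
    # rather than from the raw document.
    raw = llm(
        f"Write {num_questions} follow-up questions, one per line, "
        f"about this summary:\n\n{summary}"
    )

    # Keep non-empty lines only, capped at num_questions.
    questions = [line.strip() for line in raw.splitlines() if line.strip()]
    return questions[:num_questions]


def answer_with_context(
    question: str, document: str, llm: Callable[[str], str]
) -> str:
    """Answer a generated question using the full document as context."""
    return llm(f"Context:\n{document}\n\nQuestion: {question}\nAnswer:")
```

Passing the model in as a plain callable keeps the sketch independent of any particular API, so the same two functions could be run against different models to compare their answers on the same generated questions.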


Community

ljhwild

Sep 19, 2023

The problem with long context, assuming we're not extending the attention size, is that the data needs to be batched and a similarity search query needs to be created.
I think the solution lies somewhere in building structure into the vector-stored data. Metadata doesn't cut it.