Papers
arxiv:2309.13075

SCREWS: A Modular Framework for Reasoning with Revisions

Published on Sep 20, 2023
· Submitted by akhaliq on Sep 26, 2023
Abstract

Large language models (LLMs) can improve their accuracy on various tasks by iteratively refining and revising their output based on feedback. We observe that these revisions can introduce errors, in which case it is better to roll back to a previous result. Further, revisions are typically homogeneous: they use the same reasoning method that produced the initial answer, which may not correct errors. To enable exploration in this space, we present SCREWS, a modular framework for reasoning with revisions. It comprises three main modules: Sampling, Conditional Resampling, and Selection, each consisting of sub-modules that can be hand-selected per task. We show that SCREWS not only unifies several previous approaches under a common framework, but also reveals several novel strategies for identifying improved reasoning chains. We evaluate our framework with state-of-the-art LLMs (ChatGPT and GPT-4) on a diverse set of reasoning tasks and uncover useful new reasoning strategies for each: arithmetic word problems, multi-hop question answering, and code debugging. Heterogeneous revision strategies prove to be important, as does selection between original and revised candidates.

Community

Here is an ML-generated summary

Objective
The paper proposes SCREWS, a modular framework for reasoning with revisions to improve the accuracy of large language models on reasoning tasks.

Insights

  • Selection is important to avoid errors introduced by resampling.
  • Heterogeneous resampling (revising with a different reasoning method than the one used for the initial sample) improves accuracy.
  • External knowledge is needed for resampling to correct incorrect facts.
  • No single strategy works best across all modules and tasks. Simple methods can outperform more complex ones.
  • Resampling with tools is effective but costly, so can be applied selectively.
  • Larger models like GPT-4 boost performance, and selection helps choose their best reasoning chain.

Implementation

  • The framework consists of three main modules: Sampling, Conditional Resampling, and Selection.
  • Sampling generates initial outputs using methods like chain of thought, subquestion decomposition, or answer only.
  • Conditional Resampling decides whether to generate a revised output conditioned on the initial sample, using methods such as self-ask, tool-based checks, or switching to a different reasoning method.
  • Selection chooses between the initial and revised candidates using model-based or rule-based strategies like self-select, most recent, or majority vote.
  • The modules and strategies are combined in different ways for different tasks like arithmetic, QA, and code debugging.
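The three-module flow above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the stub functions (answer_only, self_ask, chain_of_thought, most_recent) are hypothetical stand-ins for the paper's LLM-backed prompting strategies, not its actual implementation.

```python
from typing import Callable

# Hypothetical type for a sampling/resampling method: question -> candidate answer.
Sampler = Callable[[str], str]

def screws_pipeline(question: str,
                    sample: Sampler,
                    should_resample: Callable[[str, str], bool],
                    resample: Sampler,
                    select: Callable[[str, list], str]) -> str:
    """Run the three SCREWS modules: Sampling -> Conditional Resampling -> Selection."""
    candidates = [sample(question)]                # 1. Sampling: initial output
    if should_resample(question, candidates[0]):   # 2. Conditional Resampling:
        candidates.append(resample(question))      #    revise with a (possibly different) method
    return select(question, candidates)            # 3. Selection: pick among candidates

# Toy stand-ins (assumptions, not the paper's prompts):
def answer_only(q): return "7"                 # initial "answer only" sample
def self_ask(q, a): return True                # always request a revision
def chain_of_thought(q): return "8"            # heterogeneous resampling method
def most_recent(q, cands): return cands[-1]    # rule-based selection strategy

print(screws_pipeline("3 + 5 = ?", answer_only, self_ask,
                      chain_of_thought, most_recent))  # prints 8
```

Because each module takes an interchangeable callable, swapping a selection strategy (e.g., majority vote for most recent) or a resampling method is a one-argument change, which mirrors the per-task sub-module selection the summary describes.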

Results
The paper shows that the SCREWS framework gives 10-15% accuracy improvements over vanilla sampling and resampling methods on arithmetic, QA, and code debugging tasks.
