arxiv:2408.06195

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Published on Aug 12 · Submitted by akhaliq on Aug 13 · #2 Paper of the day

Abstract

This paper introduces rStar, a self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or reliance on superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM with capabilities similar to the target SLM acts as a discriminator to verify each trajectory generated by the target SLM. Mutually agreed reasoning trajectories are considered mutually consistent and are thus more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct. Code will be available at https://github.com/zhentingqi/rStar.
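
For readers who want the shape of the algorithm, here is a minimal Python sketch of the mutual generation-discrimination step. Everything below is hypothetical scaffolding: `generate_trajectories` stands in for the target SLM's MCTS rollouts over human-like reasoning actions, and `discriminator_completes` stands in for the second SLM that re-completes a masked trajectory. Only the mutual-consistency selection logic follows the abstract's description.

```python
import random

# Hypothetical stand-in for the target SLM: in rStar this is an MCTS
# rollout over human-like reasoning actions; here we just fabricate a
# few candidate reasoning trajectories (list of steps plus an answer).
def generate_trajectories(question, n=4):
    return [
        {"steps": [f"step {i}.{j} for {question!r}" for j in range(3)],
         "answer": random.choice(["63.91", "42"])}
        for i in range(n)
    ]

# Hypothetical stand-in for the discriminator SLM: it sees a prefix of
# a trajectory and completes it independently. Agreement between its
# completion and the original marks the trajectory as mutually consistent.
def discriminator_completes(question, partial_steps):
    return random.choice(["63.91", "42"])

def rstar_select(question):
    candidates = generate_trajectories(question)
    consistent = []
    for traj in candidates:
        # Mask the tail of the trajectory and let the second SLM finish it.
        prefix = traj["steps"][: len(traj["steps"]) // 2]
        if discriminator_completes(question, prefix) == traj["answer"]:
            consistent.append(traj)
    # Mutually agreed trajectories are more likely to be correct;
    # fall back to all candidates if none agree.
    pool = consistent or candidates
    answers = [t["answer"] for t in pool]
    return max(set(answers), key=answers.count)

print(rstar_select("A math word problem"))
```

The majority vote at the end is just a simple stand-in for the paper's final selection step; the real system compares whole reasoning trajectories, not bare answers.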

Community

In simple language, the process is very sophisticated prompting: a single LLM generates multiple candidate solutions, which are then refined through feedback.

The candidate solutions are fed back into the LLM and scored by a reward function. The reward function guides the search, providing a score that steers the model toward more promising candidates.

Once the candidate solutions are refined, they are fed back into the LLM one more time, and it uses the accumulated feedback to generate a final solution. Iterating this generate-score-refine loop lets the LLM progressively improve its output, as sketched below.
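
The loop described above fits in a few lines. The sketch below is a toy, not the paper's actual method: `llm_propose` and `reward` are hypothetical placeholders for the model call and its reward function, and only the generate-score-refine control flow is the point.

```python
import random

# Hypothetical model call: a real system would prompt the LLM with the
# question plus any feedback; here we fake a candidate with a random
# quality that improves slightly once feedback is available.
def llm_propose(prompt, feedback=None):
    quality = random.random() + (0.2 if feedback else 0.0)
    return {"text": f"candidate for {prompt!r}", "quality": quality}

# Hypothetical reward function: score a candidate, higher is better.
def reward(candidate):
    return candidate["quality"]

def refine(prompt, rounds=3, n_candidates=4):
    best, feedback = None, None
    for _ in range(rounds):
        # Generate several candidates and keep the highest-reward one.
        candidates = [llm_propose(prompt, feedback) for _ in range(n_candidates)]
        top = max(candidates, key=reward)
        if best is None or reward(top) > reward(best):
            best = top
        # Feed the current best score back in to guide the next round.
        feedback = f"best so far scored {reward(best):.2f}"
    return best["text"]

print(refine("A math word problem"))
```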

Can you resend the link to your GitHub code? The above link doesn't work.

I have done an open-source implementation of the technique in the optillm repo, which you can see here: https://github.com/codelion/optillm/blob/main/rstar.py
