arXiv:2409.17169

REAL: Response Embedding-based Alignment for LLMs

Published on Sep 17, 2024

Abstract

Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, and it usually involves training on supervised datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely on pairs of AI-generated responses ranked according to human feedback. Labeling these pairs is the most labor-intensive and costly part of the alignment pipeline, so improving its efficiency would have a meaningful impact on AI development. We propose a strategy for sampling a high-quality training dataset that focuses on acquiring the most informative response pairs for labeling out of a set of AI-generated responses. Experimental results on synthetic HH-RLHF benchmarks indicate that choosing dissimilar response pairs enhances the direct alignment of LLMs while reducing inherited labeling errors. We also apply our method to the real-world SHP2 dataset, selecting optimal pairs from multiple responses; the model aligned on dissimilar response pairs achieved the best win rate on the dialogue task. Our findings suggest that focusing on less similar pairs can improve the efficiency of LLM alignment, saving up to 65% of annotators' work.
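
The abstract does not spell out the selection rule, but the title suggests dissimilarity is measured between response embeddings. Below is a minimal sketch of that idea, assuming pair selection reduces to picking the two candidate responses whose embeddings have the lowest cosine similarity; the `most_dissimilar_pair` helper, the encoder choice, and the random 384-dimensional embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def most_dissimilar_pair(embeddings: np.ndarray) -> tuple[int, int]:
    """Return indices of the two responses whose embeddings have the
    lowest cosine similarity, i.e. the most informative pair to label."""
    # Normalize rows so dot products become cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    # Mask self-similarity so the argmin picks a genuine pair.
    np.fill_diagonal(sim, np.inf)
    i, j = np.unravel_index(np.argmin(sim), sim.shape)
    return int(i), int(j)

# Toy usage: four candidate responses to one prompt, embedded by any
# sentence encoder; random vectors stand in for real embeddings here.
rng = np.random.default_rng(0)
candidate_embeddings = rng.normal(size=(4, 384))
a, b = most_dissimilar_pair(candidate_embeddings)
print(f"Send responses {a} and {b} to annotators for preference labeling")
```

Under this reading, annotators rank only one selected pair per prompt instead of all candidate pairs, which would be consistent with the annotation savings the abstract reports.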
