arxiv:2309.11235

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Published on Sep 20, 2023

Authors:

Sijie Cheng ,

Abstract

Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data equally regardless of its quality, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13B open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat.
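
The idea in the abstract (data sources as coarse-grained reward labels, a class-conditioned policy, and a single-stage supervised objective) can be illustrated with a rough sketch. This is not the authors' implementation: the reward values, the tag format, the exponential weighting, and the names `SOURCE_REWARD`, `condition_prompt`, `sample_weight`, and `weighted_nll` are all assumptions made here for illustration.

```python
import math

# Hypothetical coarse-grained rewards, one per data source (illustrative values only).
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}


def condition_prompt(prompt, source):
    """Prefix the prompt with a source tag so the policy is conditioned on the class."""
    return f"<{source}> {prompt}"


def sample_weight(source, beta=1.0):
    """Map a source-level reward to a positive per-example loss weight (assumed form)."""
    return math.exp(SOURCE_REWARD[source] / beta)


def weighted_nll(token_log_probs, sources, beta=1.0):
    """Reward-weighted negative log-likelihood over a batch.

    token_log_probs holds, per example, the log-probabilities the model
    assigned to the target tokens; sources holds each example's data source.
    """
    total = 0.0
    weight_sum = 0.0
    for log_probs, source in zip(token_log_probs, sources):
        w = sample_weight(source, beta)
        total += w * -sum(log_probs)  # per-example NLL scaled by its source weight
        weight_sum += w
    return total / weight_sum  # normalize so the scale is comparable to plain SFT
```

Under these assumptions, training stays ordinary supervised learning: examples from the higher-reward source simply contribute more to the loss, and at inference time the prompt would carry the tag of the higher-quality source.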

Community

parulduggar

Nov 3, 2023

ffff

datkai

Nov 7, 2023

edited Nov 7, 2023

datkai

Nov 7, 2023

What do you see?

Aspie96

Nov 22, 2023

Great work, guys!

A quick note, however.

While OpenChat is genuinely open source (Apache 2.0 license), LLaMA is not and should not be referred to as such, because of the restrictions in its license.
It's important not to reinforce Meta's misinformation, especially in actual open source projects such as OpenChat.