Asynchronous RLHF
Collection
Models and datasets for the Asynchronous RLHF paper; see the code at https://github.com/mnoukhov/async_rlhf
This model is a fine-tuned version of mnoukhov/pythia1b-sft-tldr on an unknown dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.5365 | 0.2006 | 291 | 0.5420 | 0.7271 |
| 0.4521 | 0.4011 | 582 | 0.5034 | 0.7485 |
| 0.3994 | 0.6017 | 873 | 0.4893 | 0.7577 |
| 0.3596 | 0.8022 | 1164 | 0.4685 | 0.7693 |
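
As a quick sanity check on the logged numbers, the following is a minimal Python sketch that derives a few quantities from the table rows. It assumes the Epoch column is the fraction of one pass over the training data completed at that step. The names `ROWS`, `steps_per_epoch`, `eval_intervals`, and `summarize` are illustrative and do not come from the training code.

```python
# Rows copied from the table above:
# (training_loss, epoch, step, validation_loss, accuracy)
ROWS = [
    (0.5365, 0.2006, 291, 0.5420, 0.7271),
    (0.4521, 0.4011, 582, 0.5034, 0.7485),
    (0.3994, 0.6017, 873, 0.4893, 0.7577),
    (0.3596, 0.8022, 1164, 0.4685, 0.7693),
]


def steps_per_epoch(rows):
    """Average the step/epoch ratio of each row to estimate steps in one epoch."""
    estimates = [step / epoch for _, epoch, step, _, _ in rows]
    return sum(estimates) / len(estimates)


def eval_intervals(rows):
    """Return the number of steps between consecutive evaluations."""
    steps = [row[2] for row in rows]
    return [later - earlier for earlier, later in zip(steps, steps[1:])]


def summarize(rows):
    """Collect the derived quantities, comparing the first and last logged rows."""
    first, last = rows[0], rows[-1]
    return {
        "steps_per_epoch": round(steps_per_epoch(rows)),
        "eval_intervals": eval_intervals(rows),
        "validation_loss_drop": round(first[3] - last[3], 4),
        "accuracy_gain": round(last[4] - first[4], 4),
    }


if __name__ == "__main__":
    for name, value in summarize(ROWS).items():
        print(f"{name}: {value}")
```

Under that assumption, the rows are consistent with an evaluation every 291 steps and roughly 1451 steps per epoch. The last logged row therefore sits at about 80% of one epoch, not at the end of training.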
Base model: EleutherAI/pythia-1b-deduped