Asynchronous RLHF
Collection
Models and datasets for the Asynchronous RLHF paper; see the code at https://github.com/mnoukhov/async_rlhf
This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on an unknown dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.6789 | 0.2007 | 183 | 2.5844 |
2.5737 | 0.4013 | 366 | 2.5528 |
2.5499 | 0.6020 | 549 | 2.5367 |
2.5298 | 0.8026 | 732 | 2.5290 |
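
To make the table easier to read, the sketch below converts each validation loss to a perplexity and estimates the number of optimizer steps per epoch. It assumes the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language model fine-tuning, but not stated in this card), so perplexity is `exp(loss)`. The names `rows`, `perplexity`, and `steps_per_epoch` are illustrative and do not come from the training code.

```python
import math

# (training loss, epoch, step, validation loss), copied from the table above
rows = [
    (2.6789, 0.2007, 183, 2.5844),
    (2.5737, 0.4013, 366, 2.5528),
    (2.5499, 0.6020, 549, 2.5367),
    (2.5298, 0.8026, 732, 2.5290),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss, assumed to be in nats."""
    return math.exp(loss)


# Validation perplexity at each logged evaluation step
val_perplexities = [perplexity(val) for _, _, _, val in rows]

# Step / epoch fraction gives an estimate of optimizer steps in one full epoch
steps_per_epoch = [step / epoch for _, epoch, step, _ in rows]

for (train, epoch, step, val), ppl in zip(rows, val_perplexities):
    print(f"step {step:4d}  epoch {epoch:.4f}  val loss {val:.4f}  val ppl {ppl:.2f}")

print(f"estimated steps per epoch: {sum(steps_per_epoch) / len(steps_per_epoch):.0f}")
```

If the losses were instead logged in bits, the conversion would be `2 ** loss`, so the perplexities here hold only under the stated assumption.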
Base model: EleutherAI/pythia-410m-deduped