PKU-Alignment/beaver-7b-v1.0-cost

Reinforcement Learning

reinforcement-learning-from-human-feedback


RuiyangSun committed on Jul 10, 2023

Commit 32e35c1 • 1 Parent(s): 588a9a4

docs: update readme

Files changed (1)

README.md +1 -1


@@ -20,7 +20,7 @@ library_name: safe-rlhf
 ## Model Details
-The Beaver Cost model is a preference model trained using the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF dataset).
+The Beaver Cost model is a preference model trained using the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) dataset.
 It can play a role in the safe RLHF algorithm, helping the Beaver model become more safe and harmless.
 - **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.