line-corporation/sacpo

Reinforcement Learning

text-generation

reinforcement-learning-from-human-feedback

text-generation-inference

Inference Endpoints

akifumiwachi committed on Jun 21

Commit b596248 (1 parent: 15ddeb1)

Update README.md

Files changed (1)

README.md +1 -0


@@ -31,6 +31,7 @@ tags:
 - **Fine-tuned from model:** [Alpaca (reprod.)](https://huggingface.co/PKU-Alignment/alpaca-7b-reproduced) (reproduced version of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca))
 - **Dataset:** [PKU-SafeRLHF-30K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-30K)
 - **SACPO Paper:** <https://arxiv.org/abs/2404.11049>
+- **GitHub:** <https://github.com/line/sacpo>
 - **Model Alias:** SACPO: DPO (H) -> DPO (S) 0.025
 ## Usage: How to Talk with the Model