hendrydong committed
Commit • 87cf99e
1 Parent(s): 2e78e24
Update README.md
README.md CHANGED
@@ -178,10 +178,28 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
-##
-
-
+## References
+
+If you found this helpful, please cite the following papers.
+
+```bibtex
+@article{dong2023raft,
+  title={Raft: Reward ranked finetuning for generative foundation model alignment},
+  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},
+  journal={arXiv preprint arXiv:2304.06767},
+  year={2023}
+}
+
+@misc{xiong2024iterative,
+  title={Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint},
+  author={Wei Xiong and Hanze Dong and Chenlu Ye and Ziqi Wang and Han Zhong and Heng Ji and Nan Jiang and Tong Zhang},
+  year={2024},
+  eprint={2312.11456},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
 
-##
+## Contact
 
-
+If you have any questions, please contact hanze dot dong AT salesforce dot com.