hendrydong
committed on
Commit
340e8d2
1
Parent(s):
eb0adc3
Update README.md
README.md
CHANGED
@@ -1,7 +1,7 @@
 
 This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
 
-The external
+We perform GSHF algorithm on SFT baseline. The external signals include (1) Reward model; (2) AI-generated Prompts.
 
 
 **We obtain 35.95% win-rate on Alpaca Eval v2.** The win-rate of the base model is only 4.63%.