hendrydong committed on
Commit
612dc8f
1 Parent(s): 3053553

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,12 +1,14 @@
 
-This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
+This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
+
+The external signal includes (1) Reward model; (2) AI-generated Prompts.
 
 
 **We obtain 35.95% win-rate on Alpaca Eval v2.**
 
 ## Model Details
 
-We perform 3 iterations of GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta`, where prompts are generated by ChatGPT with self-instruct type prompt augmentation.
+We perform 3 iterations of GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta` labeled by reward model, where prompts are generated by ChatGPT with self-instruct type prompt augmentation.
 
 We use AI-generated 60K prompts in the training process.
 