theblackcat102/roberta-base-webgpt-rm

theblackcat102 committed on Dec 25, 2022

Commit b1e1683 (1 parent: 57a3e97)

Create README.md

Files changed (1)

README.md +56 -0

README.md ADDED

	@@ -0,0 +1,56 @@

+---
+language:
+- en
+tags:
+- webgpt
+- regression
+- reward-model
+license: "apache-2.0"
+datasets:
+- openai/webgpt_comparisons
+metrics:
+- accuracy
+---
+# Reward Model fine-tuned on openai/webgpt_comparisons
+Reward model fine-tuned from an existing pretrained model (roberta-base).
+
+Observations consistent with the original papers:
+* The model overfits easily with the rank loss
+* Training requires a small learning rate
+
+Differences from the papers:
+* Small models perform poorly due to a lack of world knowledge: validation accuracy does not even reach 60%, whereas OpenAI's reward model had 6B parameters.
+* Trained with an 80-20 train-validation split using PyTorch automatic mixed precision (AMP).
+
+Other models I tried:
+* bloomz-560m: its embedding size isn't worth the training cost, since this dataset contains only English prompts
+* gpt2-large: training was unstable
+* gpt2-base: training was unstable
+# Performance on validation split
+| Model | Val. accuracy (%) | Val. loss (rank loss) |
+|---|---|---|
+| [roberta-base](https://huggingface.co/theblackcat102/roberta-base-webgpt-rm)  | 56.21  |  0.71 |
+| [roberta-large](https://huggingface.co/theblackcat102/roberta-large-webgpt-rm)  | 57.89  |  0.67 |
+| [electra-base](https://huggingface.co/theblackcat102/electra-base-webgpt-rm)  | 57.02  | 0.70  |
+| [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm)  | 58.75  | 0.69  |
+
+TensorBoard logs are located under `runs/`.
+# Note
+* You will have to offset this model's output so that the mean reward equals 0.
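The README names a "rank loss" but does not give its formula. The sketch below assumes it is the standard pairwise reward-model loss, -log(sigmoid(r_chosen - r_rejected)), where r_chosen is the score of the preferred answer and r_rejected is the score of the other answer. The function names are hypothetical, and this is not the author's training code.

```python
import math
from typing import Iterable, Tuple


def pairwise_rank_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss -log(sigmoid(r_chosen - r_rejected)).

    Assumption: this is the standard reward-model rank loss; the README
    does not state the exact formula it used.
    """
    diff = r_chosen - r_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); both branches avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_rank_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average rank loss over (r_chosen, r_rejected) score pairs."""
    pair_list = list(pairs)
    return sum(pairwise_rank_loss(c, r) for c, r in pair_list) / len(pair_list)
```

Under this assumed form, a model that gives both answers the same score has a loss of ln 2 ≈ 0.693. The 0.67-0.71 validation losses in the README's table are therefore close to what a constant scorer would get.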
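The README only says that an 80-20 train-validation split was used. Below is a minimal sketch of a seeded, shuffled split over whole comparison records. The shuffling, the seed, and the function name `train_val_split` are all assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(records: Sequence[T], val_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle records with a fixed seed, then hold out val_fraction for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)
```

The split is done over whole comparison records, so both answers of a pair always land in the same split.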
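The table in the README reports validation accuracy but does not define it. The sketch below assumes accuracy is the percentage of comparison pairs in which the preferred answer receives the strictly higher reward. Counting ties as incorrect is also an assumption, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (r_chosen, r_rejected) pairs ranked correctly.

    Assumption: ties count as incorrect; the README does not say how
    its accuracy handles them.
    """
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("need at least one pair")
    correct = sum(1 for chosen, rejected in pair_list if chosen > rejected)
    return 100.0 * correct / len(pair_list)
```

Under this definition, a scorer that outputs random continuous scores averages about 50%. The 56-59% figures in the table are therefore only modestly above chance.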
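The note in the README asks for the model's output to be shifted so that the mean reward equals 0. One way to do this is sketched below. It assumes a constant offset equal to the mean raw score over some reference set of responses; the README does not specify which set to use. `make_centered_reward` is a hypothetical helper.

```python
from statistics import fmean
from typing import Callable, Sequence


def make_centered_reward(raw_scores: Sequence[float]) -> Callable[[float], float]:
    """Return a function that subtracts the reference-set mean from a raw score."""
    offset = fmean(raw_scores)  # mean raw reward over the reference set

    def centered(score: float) -> float:
        return score - offset

    return centered
```

Subtracting a constant leaves every score difference unchanged. Pairwise rank losses and accuracies are therefore unaffected. The shift matters only when the reward is used as an absolute signal, for example during RL fine-tuning.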