

language:
  - en
tags:
  - webgpt
  - regression
  - reward-model
license: apache-2.0
datasets:
  - openai/webgpt_comparisons
  - openai/summarize_from_feedback
metrics:
  - accuracy

Reward model trained on openai/webgpt_comparisons and openai/summarize_from_feedback. Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.
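For readers unfamiliar with rank loss, here is a minimal, framework-free sketch of one common pairwise formulation: the loss is `-log(sigmoid(score_chosen - score_rejected))`, averaged over pairs. This card does not state the exact loss used in training, so the formula and the name `pairwise_rank_loss` are assumptions for illustration only.

```python
import math


def pairwise_rank_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over all pairs.

    Hypothetical helper: an assumed formulation of a pairwise rank loss,
    not necessarily the one used to train this model.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("at least one pair is required")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


# The loss shrinks as the preferred answer is scored further above the other.
print(pairwise_rank_loss([2.0, 1.5], [0.0, 1.0]))
```

With this formulation only the difference between the two scores matters, so the absolute scale of the reward is not constrained by the loss.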

On the validation dataset, the results are much more stable than usual.
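The metadata lists accuracy as the metric. For comparison datasets, one plausible reading is pairwise accuracy: the fraction of validation pairs in which the human-preferred answer gets the higher score. The sketch below assumes that definition; the function name `pairwise_accuracy` and the sample scores are hypothetical and not taken from this model's evaluation.

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the preferred answer scores strictly higher.

    Assumed metric definition for illustration; ties count as incorrect.
    """
    pairs = list(zip(chosen_scores, rejected_scores))
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Made-up scores: two of the three pairs are ranked correctly.
print(pairwise_accuracy([1.0, 0.2, 3.0], [0.5, 0.9, 1.0]))
```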

Refer to this wandb run for more details.

This model performs slightly better than the previous WebGPT-only model: electra-large