Clarity on the reward model

#5
by RishabBairi12 - opened

Could you kindly share how this reward model was trained? A little code insight would help. I am trying to use a custom-trained reward model, but it is giving erratic outputs.
