Clarity on the reward model
#5 opened by RishabBairi12
Could you kindly share how this reward model was trained? A little insight into the training code would help. I am trying to use a custom-trained reward model, but it is giving erratic outputs.