license: apache-2.0

This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, encode the concatenated text (`text = question + solution`) and pass it to the model as input.

`rewards = model(text).mean(dim=-1).sigmoid()[index]`

Here `index` holds the token positions of the special end tokens that close each step, so `rewards` contains one score per step.
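To make the arithmetic in that expression easier to follow, here is a minimal sketch that mirrors it on plain Python lists instead of tensors. Everything in it is an assumption for illustration: the helper names `step_rewards` and `find_end_positions`, the end token id `99`, the toy token ids, and the assumed output layout of one row of raw values per token position. It does not load the model or its tokenizer.

```python
import math

def find_end_positions(token_ids, end_token_id):
    """Return the positions where the (hypothetical) end-of-step token occurs."""
    return [i for i, t in enumerate(token_ids) if t == end_token_id]

def step_rewards(outputs, end_positions):
    """List-based mirror of model(text).mean(dim=-1).sigmoid()[index].

    outputs: one row of raw values per token position (assumed layout).
    end_positions: token positions of the end-of-step tokens.
    """
    # mean(dim=-1): average the values within each position's row
    means = [sum(row) / len(row) for row in outputs]
    # sigmoid(): map each mean into the interval (0, 1)
    probs = [1.0 / (1.0 + math.exp(-m)) for m in means]
    # [index]: keep only the scores at the end-of-step positions
    return [probs[i] for i in end_positions]

# Toy data: two steps, each closed by the hypothetical end token id 99.
END_TOKEN_ID = 99
token_ids = [5, 7, END_TOKEN_ID, 8, 3, 4, END_TOKEN_ID]
outputs = [
    [0.0, 0.0],
    [1.0, -1.0],
    [2.0, 0.0],    # end of step 1, row mean = 1.0
    [0.5, 0.5],
    [-3.0, 1.0],
    [0.0, 0.0],
    [-1.0, -1.0],  # end of step 2, row mean = -1.0
]

index = find_end_positions(token_ids, END_TOKEN_ID)
rewards = step_rewards(outputs, index)
print(index, rewards)
```

With the real model, `outputs` would come from `model(text)` and the end token id from the tokenizer used to encode the text. The order of operations matters: the mean is taken over the last dimension first, the sigmoid is applied second, and the per-step selection comes last.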