---
license: apache-2.0
---
This is a reward model fine-tuned from Llemma-34b.
To score the steps of a solution, pass the encoded text (the question concatenated with the solution) as input.

    rewards = model(text).mean(dim=-1).sigmoid()[index]

Here `index` holds the positions of the special end tokens that close each step.
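
To make the shape of that computation concrete, here is a minimal pure-Python sketch of the same three operations (mean over the last dimension, sigmoid, then selection at the step-end positions). It does not load the model. The function name `step_rewards`, the toy `token_outputs` values, and the positions in `step_end_positions` are all hypothetical and chosen only for illustration.

```python
import math


def step_rewards(token_outputs, step_end_positions):
    """Mirror `model(text).mean(dim=-1).sigmoid()[index]` on plain lists.

    token_outputs: one list of floats per token (stand-in for the model output).
    step_end_positions: token positions of the step-end tokens (the `index`).
    """
    # Average over the last dimension: one scalar per token.
    per_token = [sum(vec) / len(vec) for vec in token_outputs]
    # Squash each scalar into the open interval (0, 1).
    squashed = [1.0 / (1.0 + math.exp(-x)) for x in per_token]
    # Keep only the scores located at the step-end positions.
    return [squashed[i] for i in step_end_positions]


# Made-up outputs for a 5-token sequence; these are not real model values.
token_outputs = [
    [0.0, 0.0],
    [2.0, 4.0],
    [-1.0, 1.0],
    [-3.0, -1.0],
    [1.0, 1.0],
]
step_end_positions = [1, 3]  # hypothetical positions of two step-end tokens

rewards = step_rewards(token_outputs, step_end_positions)
print(rewards)  # one score per step
```

The result has one entry per step, so a solution with two step-end tokens yields two rewards, each between 0 and 1.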