---
library_name: transformers
tags: []
---

This is a Mistral-7B reward model trained on `reciprocate/tinygsm_dpo`. It scores Python code solutions to grade-school math problems, returning a raw scalar reward for each prompt/response pair.

Example usage:
````python
from transformers import pipeline

# Load the reward model as a text-classification pipeline.
# function_to_apply="none" returns the raw scalar reward instead of a probability.
reward_fn = pipeline(
    "text-classification",
    model="reciprocate/mistral-7b-gsm8k-code-rm",
    truncation=True,
    max_length=4096,
    function_to_apply="none",
)

prompt = """\
Consider the following grade-school math problem: Megan has read 32 books this year. Kelcie has read 1/4 the amount of books that Megan has read. Greg has read 9 more than twice the number of books that Kelcie has read. How many books total have Megan, Kelcie, and Greg read?
Solve this problem using code.
- Give the complete solution to solve the problem written in Python.
- The program should contain multiple lines of code and end with 'result = XXX'.
- Use markdown to format your response starting with '```python' and ending with '```'.
"""

completion = """\
Let's solve this problem using Python code.
```python
books_megan = 32
books_kelcie = books_megan / 4
books_kelcie = int(books_kelcie)
books_greg = 2 * books_kelcie + 9
total_books = books_megan + books_kelcie + books_greg
result = total_books
```
"""

# Format the (prompt, completion) pair with the model's chat template.
chats = [[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": completion},
]]
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]

# Score the formatted conversations.
outputs = reward_fn(inputs)
scores = [x["score"] for x in outputs]
print(scores)
````
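
A common use of a reward model like this is ranking several candidate solutions to the same problem and keeping the highest-scoring one (best-of-n selection). The following is a minimal sketch, reusing the `reward_fn` pipeline and `prompt` from the example above; the `candidates` list is hypothetical and would normally come from a generator model.

```python
# Hypothetical candidate completions for the same prompt, e.g. sampled from a
# generator model; replace with your own outputs.
candidates = [
    "```python\nbooks_megan = 32\nresult = books_megan\n```",
    "```python\nbooks_megan = 32\nbooks_kelcie = books_megan // 4\n"
    "books_greg = 2 * books_kelcie + 9\n"
    "result = books_megan + books_kelcie + books_greg\n```",
]

# Format each candidate as a chat and score it with the reward model.
chats = [
    [{"role": "user", "content": prompt}, {"role": "assistant", "content": candidate}]
    for candidate in candidates
]
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]
scores = [x["score"] for x in reward_fn(inputs)]

# Keep the candidate with the highest reward (best-of-n selection).
best = candidates[max(range(len(candidates)), key=lambda i: scores[i])]
print(best)
```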