---
library_name: transformers
tags: []
---

This is a Mistral-7B reward model trained on `reciprocate/tinygsm_dpo`. The following example scores a Python solution to a grade-school math problem:

````python
from transformers import pipeline

# Load the reward model as a text-classification pipeline.
# function_to_apply="none" returns the raw score with no sigmoid/softmax applied.
reward_fn = pipeline(
    "text-classification",
    model="reciprocate/mistral-7b-gsm8k-code-rm",
    truncation=True,
    max_length=4096,
    function_to_apply="none",
)

prompt = """\
Consider the following grade-school math problem: Megan has read 32 books this year. Kelcie has read 1/4 the amount of books that Megan has read. Greg has read 9 more than twice the number of books that Kelcie has read. How many books total have Megan, Kelcie, and Greg read?
Solve this problem using code.
- Give the complete solution to solve the problem written in Python.
- The program should contain multiple lines of code and end with 'result = XXX'.
- Use markdown to format your response starting with '```python' and ending with '```'.
"""

output = """\
Let's solve this problem using Python code.
```python
books_megan = 32
books_kelcie = books_megan / 4
books_kelcie = int(books_kelcie)
books_greg = 2 * books_kelcie + 9
total_books = books_megan + books_kelcie + books_greg
result = total_books```
"""

# One chat per (prompt, response) pair to score.
chats = [[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": output},
]]

# Render each chat to a single string with the model's chat template, then score it.
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]
results = reward_fn(inputs)  # separate name so the response text in `output` is not overwritten
scores = [x["score"] for x in results]
print(scores)
````
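
A common use of a reward model is picking the best of several candidate responses to the same prompt. The following is a minimal sketch of that idea, not part of the model's API: `build_chat`, `rank_candidates`, and `dummy_score_chats` are hypothetical helper names, and the sketch assumes that a higher score marks a more preferred response. The scorer is passed in as a function so the example runs without downloading the model.

```python
from typing import Callable, Sequence


def build_chat(prompt: str, response: str) -> list[dict[str, str]]:
    """Pair a prompt with one candidate response in chat-message form."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]


def rank_candidates(
    prompt: str,
    candidates: Sequence[str],
    score_chats: Callable[[list[list[dict[str, str]]]], list[float]],
) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs sorted from highest to lowest score."""
    chats = [build_chat(prompt, candidate) for candidate in candidates]
    scores = score_chats(chats)
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


def dummy_score_chats(chats):
    """Stand-in scorer (an assumption for illustration): rewards longer responses.

    Replace it with a function that renders each chat with the chat template
    and calls the reward pipeline.
    """
    return [float(len(chat[-1]["content"])) for chat in chats]


ranked = rank_candidates(
    "What is 2 + 3?",
    ["result = 5", "a = 2\nb = 3\nresult = a + b"],
    dummy_score_chats,
)
best_response, best_score = ranked[0]
print(best_response)
print(best_score)
```

With the real model, `score_chats` could render each chat with `reward_fn.tokenizer.apply_chat_template(chat, tokenize=False)`, pass the resulting strings to `reward_fn`, and return the `score` field of each result, as in the snippet above.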