---
library_name: transformers
tags: []
---

This is a Mistral-7B reward model trained on [reciprocate/tinygsm_dpo](https://huggingface.co/datasets/reciprocate/tinygsm_dpo).

```python
from transformers import pipeline

# Load the reward model as a text-classification pipeline.
# `function_to_apply="none"` returns the raw scalar reward rather than a probability.
reward_fn = pipeline(
    "text-classification",
    model="reciprocate/mistral-7b-gsm8k-code-rm",
    truncation=True,
    max_length=4096,
    function_to_apply="none",
)

prompt = """\
Consider the following grade-school math problem: Megan has read 32 books this year. Kelcie has read 1/4 the amount of books that Megan has read. Greg has read 9 more than twice the number of books that Kelcie has read. How many books total have Megan, Kelcie, and Greg read?

Solve this problem using code.
- Give the complete solution to solve the problem written in Python.
- The program should contain multiple lines of code and end with 'result = XXX'.
- Use markdown to format your response starting with '```python' and ending with '```'.
"""

output = """\
Let's solve this problem using Python code.

```python
books_megan = 32
books_kelcie = books_megan / 4
books_kelcie = int(books_kelcie)
books_greg = 2 * books_kelcie + 9
total_books = books_megan + books_kelcie + books_greg
result = total_books```
"""

# Each chat pairs the problem statement with a candidate solution to be scored.
chats = [[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": output}
]]

# Render each chat with the model's chat template, then score the rendered strings.
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]
results = reward_fn(inputs)
scores = [x["score"] for x in results]
print(scores)
```
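
Because `function_to_apply="none"` is set, each score is the raw output of the model's classification head, so it can be used directly as a scalar reward. The snippet below is a minimal sketch of the same scoring done without the pipeline; it reuses the `chats` list built above and assumes the checkpoint exposes a standard single-logit sequence-classification head, which the pipeline usage above implies.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "reciprocate/mistral-7b-gsm8k-code-rm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Render the chat with the model's chat template, as the pipeline does internally.
text = tokenizer.apply_chat_template(chats[0], tokenize=False)
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    # Assumes a single-logit head: the logit itself is the scalar reward.
    reward = model(**enc).logits.squeeze().item()

print(reward)
```

Higher scores indicate solutions the reward model prefers, so the same scoring loop can be used to rank several candidate completions for one prompt and keep the highest-scoring one.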