These files have been modified to work with Quiet-STaR. I had to adjust the attention mask slightly, but it seems to work as intended. I plan to fine-tune the model on a more general dataset to reduce its reliance on math. Any feedback on the inference script would be appreciated.

