Repetition from tuning via https://huggingface.co/blog/mlabonne/orpo-llama-3
I followed the tuning process from the guide at https://huggingface.co/blog/mlabonne/orpo-llama-3, but in the end I get terrible repetition. I am using the generation configuration from this model card, and when I downloaded and ran this model the same way it worked fine. Could it be that I tuned from the Nous Research weights and there is an issue there? The tokenizer configs from my tune have the same md5 as this model's, and the prompts look the same after being constructed into messages. Any hints about the issue would be appreciated.
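For anyone wanting to run the same tokenizer-config comparison, here is a minimal sketch; the file names and directory paths are placeholders for wherever your tuned checkpoint and the reference model live:

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """Return the md5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

# Hypothetical paths: compare the tuned checkpoint's tokenizer files
# against the reference model's copies.
# for name in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
#     print(name, md5_of(f"my-tune/{name}") == md5_of(f"reference/{name}"))
```

If all three digests match, the repetition is unlikely to come from a tokenizer mismatch and the generation config or chat template becomes the next suspect.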
Edit: My buffer didn't paste correctly, I tuned from Nous Research weights. Fixed.
Same here: I ran the script with the same parameters and my model generates garbage.
Can't find the issue yet :(
I'm also getting garbage repetition from a GGUF conversion of the base model at https://huggingface.co/NousResearch/Meta-Llama-3-8B. This is with the new chat template and conversion merges in llama.cpp:
Support Llama 3 conversion #6745
Added llama-3 chat template #6751
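A quick way to check whether the converted model's template is the culprit: the Llama 3 chat format has a fixed shape, so you can render it by hand and compare against what your frontend actually sends. This is an illustrative hand-rolled sketch, not the tokenizer's own `apply_chat_template`:

```python
def format_llama3(messages, add_generation_prompt=True):
    """Render a list of {role, content} messages into the Llama 3 chat format."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = format_llama3([{"role": "user", "content": "Hi"}])
# If the GGUF's baked-in template renders something different
# (e.g. missing <|eot_id|> after each turn), looping output is a
# common symptom.
```

Comparing this string against the prompt your runtime constructs should quickly show whether the template merge is being picked up.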
So, another indicator of wonkiness from Nous Research!