shisa-ai/shisa-v1-llama3-70b now beats gpt-3.5-turbo-0125's JA performance, which is worth noting, and it is tuned *exclusively* with the old shisa-v1 dataset (augmxnt/ultra-orca-boros-en-ja-v1), so its chart position will be very short-lived.
I'll just add a note on the sampler parameters I used for testing, which I found improved performance for virtually every model I tested: temperature 0.2, min_p 0.1, frequency_penalty 0.5 (a frequency/repetition penalty is required to minimize the looping errors that otherwise creep into most of these models).
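For anyone who wants to reproduce these settings, here's a minimal sketch of how I'd pass them through an OpenAI-compatible endpoint such as a local vLLM server. The `base_url` and model name are placeholders, and `min_p` goes through `extra_body` since it's a vLLM sampling extension rather than part of the standard OpenAI API:

```python
# Minimal sketch: applying the sampler settings above via an
# OpenAI-compatible endpoint (e.g. a local vLLM server).
from openai import OpenAI

# Placeholder endpoint/key for a locally hosted server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="DataPilot/ArrowPro-7B-KUJIRA",  # placeholder; any served model
    messages=[{"role": "user", "content": "日本の首都はどこですか？"}],
    temperature=0.2,           # low temperature for stable JA outputs
    frequency_penalty=0.5,     # suppresses the looping errors noted above
    extra_body={"min_p": 0.1}, # min_p is a vLLM-specific sampling param
)
print(resp.choices[0].message.content)
```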
Also, I tested the new https://huggingface.co/DataPilot/ArrowPro-7B-KUJIRA model and it appears to be the real deal: very impressive performance, trained by a 15-year-old (!) @Holy-fox. Note that using the sampler settings detailed above improved its score as well, since it otherwise suffered from the same looping errors.
I'll be aiming to beat that with the Llama 3 8B, and to beat Command R Plus with the 70B, in the coming days.