@mattshumer, CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?
Even better: inside of that, you could nest other sections, to reflect critically on previous output. Letโs name this part <reflection>. Planning is also put in a separate step.
He named the method โReflection tuningโ and set out to fine-tune a Llama-3.1-70B with it.
Well it turns out, it works mind-boggingly well!
๐คฏ Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!
๐ง๐;๐๐ฅ ๐ฅ This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al. โฐ 405B in training, coming soon ๐ Report coming next week โ๏ธ Uses GlaiveAI synthetic data ๐ค Available on HF!
Iโm starting an Inference Endpoint right now for this model to give it a spin!