Post
1912
๐คฏ ๐ ๐ป๐ฒ๐ ๐ณ๐ฌ๐ ๐ผ๐ฝ๐ฒ๐ป-๐๐ฒ๐ถ๐ด๐ต๐๐ ๐๐๐ ๐ฏ๐ฒ๐ฎ๐๐ ๐๐น๐ฎ๐๐ฑ๐ฒ-๐ฏ.๐ฑ-๐ฆ๐ผ๐ป๐ป๐ฒ๐ ๐ฎ๐ป๐ฑ ๐๐ฃ๐ง-๐ฐ๐ผ!
@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?
Even better: inside of that, you could nest other sections, to reflect critically on previous output. Letโs name this part <reflection>. Planning is also put in a separate step.
He named the method โReflection tuningโ and set out to fine-tune a Llama-3.1-70B with it.
Well it turns out, it works mind-boggingly well!
๐คฏ Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!
๐ง๐;๐๐ฅ
๐ฅ This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
โฐ 405B in training, coming soon
๐ Report coming next week
โ๏ธ Uses GlaiveAI synthetic data
๐ค Available on HF!
Iโm starting an Inference Endpoint right now for this model to give it a spin!
Check it out ๐ mattshumer/Reflection-Llama-3.1-70B
@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?
Even better: inside of that, you could nest other sections, to reflect critically on previous output. Letโs name this part <reflection>. Planning is also put in a separate step.
He named the method โReflection tuningโ and set out to fine-tune a Llama-3.1-70B with it.
Well it turns out, it works mind-boggingly well!
๐คฏ Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!
๐ง๐;๐๐ฅ
๐ฅ This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
โฐ 405B in training, coming soon
๐ Report coming next week
โ๏ธ Uses GlaiveAI synthetic data
๐ค Available on HF!
Iโm starting an Inference Endpoint right now for this model to give it a spin!
Check it out ๐ mattshumer/Reflection-Llama-3.1-70B