Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
m-ricย 
posted an update Sep 6
Post
1912
๐Ÿคฏ ๐—” ๐—ป๐—ฒ๐˜„ ๐Ÿณ๐Ÿฌ๐—• ๐—ผ๐—ฝ๐—ฒ๐—ป-๐˜„๐—ฒ๐—ถ๐—ด๐—ต๐˜๐˜€ ๐—Ÿ๐—Ÿ๐—  ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐Ÿฏ.๐Ÿฑ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—ฎ๐—ป๐—ฑ ๐—š๐—ฃ๐—ง-๐Ÿฐ๐—ผ!

@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?

Even better: inside of that, you could nest other sections, to reflect critically on previous output. Letโ€™s name this part <reflection>. Planning is also put in a separate step.

He named the method โ€œReflection tuningโ€ and set out to fine-tune a Llama-3.1-70B with it.

Well it turns out, it works mind-boggingly well!

๐Ÿคฏ Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!

๐—ง๐—Ÿ;๐——๐—ฅ
๐ŸฅŠ This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
โฐ 405B in training, coming soon
๐Ÿ“š Report coming next week
โš™๏ธ Uses GlaiveAI synthetic data
๐Ÿค— Available on HF!

Iโ€™m starting an Inference Endpoint right now for this model to give it a spin!

Check it out ๐Ÿ‘‰ mattshumer/Reflection-Llama-3.1-70B

I would love to see models with Apache or MIT licences be able to reflect. Imagine just simulating QuietStar and still get a performance lift. LETS GO! Creating training data for "thinking" can be done using existing datasets and prompting. I have a prompt template and accompanying dataset for just such an occasion.

Yes, it's been tested, and it's false. It's even worse than the regular LLAMA 3.1 70b. It's even funny to compare it to Claude.
https://www.reddit.com/r/LocalLLaMA/s/BH5A2ngyui