@m-ric on Hugging Face: "🤯 𝗔 𝗻𝗲𝘄 𝟳𝟬𝗕 𝗼𝗽𝗲𝗻-𝘄𝗲𝗶𝗴𝗵𝘁𝘀 𝗟𝗟𝗠 𝗯𝗲𝗮𝘁𝘀…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

m-ric

posted an update Sep 6

Post

1912

🤯 𝗔 𝗻𝗲𝘄 𝟳𝟬𝗕 𝗼𝗽𝗲𝗻-𝘄𝗲𝗶𝗴𝗵𝘁𝘀 𝗟𝗟𝗠 𝗯𝗲𝗮𝘁𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝟯.𝟱-𝗦𝗼𝗻𝗻𝗲𝘁 𝗮𝗻𝗱 𝗚𝗣𝗧-𝟰𝗼!

@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?

Even better: inside of that, you could nest other sections, to reflect critically on previous output. Let’s name this part <reflection>. Planning is also put in a separate step.

He named the method “Reflection tuning” and set out to fine-tune a Llama-3.1-70B with it.

Well it turns out, it works mind-boggingly well!

🤯 Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!

𝗧𝗟;𝗗𝗥
🥊 This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
⏰ 405B in training, coming soon
📚 Report coming next week
⚙️ Uses GlaiveAI synthetic data
🤗 Available on HF!

I’m starting an Inference Endpoint right now for this model to give it a spin!

Check it out 👉 mattshumer/Reflection-Llama-3.1-70B

gr0010

Sep 7

Hi, I made a similar 8B version:
https://huggingface.co/AGI-0/Artificium-llama3.1-8B-001

trollek

Sep 7

I would love to see models with Apache or MIT licences be able to reflect. Imagine just simulating QuietStar and still get a performance lift. LETS GO! Creating training data for "thinking" can be done using existing datasets and prompting. I have a prompt template and accompanying dataset for just such an occasion.

raidhon

Sep 8

Yes, it's been tested, and it's false. It's even worse than the regular LLAMA 3.1 70b. It's even funny to compare it to Claude.
https://www.reddit.com/r/LocalLLaMA/s/BH5A2ngyui

In this post