Congrats!

#3
by mlabonne - opened

Hi, I'm the author of the article you mentioned in the model card. Thank you for this credit and congrats to you! It's amazing, I'll share it :)

I've written a LinkedIn post already discussing the hilarity of the situation (someone follows a high-quality Medium tutorial and accidentally makes the SOTA 7B-parameter model). Have to echo the congrats, because this is the kind of situation any decent technical reporter would write a story about.

Thanks to Allen for writing the post - I made a video on this - hoping I didn't butcher the names too badly or overhype the story - https://www.instagram.com/p/C1usp0bAXXF/

It's an 8.99B model, but yes, this is the highest-performing model under 10B params, and that makes it a fantastic choice for MoE. Hoping to see more Mistral models get tuned up to 8.99B params for this exact reason: you can only make MoEs out of models with the same parameter count, so I need more 8.99B models.

Haha I am reading along and must indeed admit that it's kinda funny 😁

My mindset basically was: "Hey let's take the current highest ranked model and then see what happens if I apply the tricks used in this Medium article to it." To be honest I never even expected them to benchmark the model in the first place, let alone that it would actually perform well haha.

I appreciate the kind words though! Never imagined I'd be featured in a video like this thanks to me hobbying one night! Hope that this turns out actually helping people who are actively pushing the boundaries of these models to improve them even more. That would be a great thing to know, to have played a small part in that progress :)!

I also once again would like to say thank you to the main author for the great writeup. As stated all credits go to him. (For all those that asked me very technical questions about this model, I'm kindly referring those to the guy as well. You know, since he actually knows what he is talking about!)

Also a huge thanks to the guy that made the video, I very much enjoyed it. I'm a doctor in my professional life and will soon start a PhD on implementing AI in Oncology treatment. I can't wait to show my future supervisors that I am a "well known entity in the field, look there is even videos about me" ;)

Cheers!
-CultriX-

congrats @CultriX ! (and @mlabonne )

I just wanted to drop in to tell you how thrilled I am to see someone who describes themself as a non-expert produce this really nice result.

And who cares whether you’re a hobbyist or you’re a professional with multiple degrees and a couple of decades of experience? Really, everybody knows that none of that matters. Don’t stop what you’re doing!

Thank you, really appreciate the kind words! :)

Amazing !!. Congratulations to both @CultriX and @mlabonne . A shoutout to @Hellisotherpeople and @rajistics for bringing it to the attention of the community. I tried the GGUF version of the model using CTransformers and it works well.

Not sure if I should post this to be honest, because it's very likely that it won't work out the way I want it to...

But as fate would have it, I conjured up a "hacky" and probably very inefficient way to automatically generate datasets: it generates questions on a wide variety of topics, together with two automatically generated answers per question, which then get evaluated by GPT-4 to determine which of the two answers is best. I spent quite a bit of time tweaking the evaluation prompt to the point where I felt its judgment very closely resembled my own, and I feel it evaluates answers pretty accurately now.

It's not a fast process, but I will be building this epic "private", "never-before-seen", "proprietary" (and probably pretty terrible) dataset over the coming days. Then, when the time is right, I shall give it another shot and try to train an even better model using that new and unique dataset (and the Google Colab so generously provided to me by @mlabonne)! At least we'll know for sure that it won't contaminate the model, so that's a major win already as far as I'm concerned!
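(Not my actual code, but for anyone curious, the loop I'm describing could be sketched roughly like this - the judge here is a deterministic stub standing in for the GPT-4 call with the tuned evaluation prompt, and every name in it is hypothetical:)

```python
import json

def judge(question, answer_a, answer_b):
    """Placeholder judge: prefers the longer answer.
    In the real pipeline this would be a GPT-4 call using the
    hand-tuned evaluation prompt described above."""
    return "A" if len(answer_a) >= len(answer_b) else "B"

def build_preference_record(topic, question, answer_a, answer_b):
    """Ask the judge which answer is better and store the pair as a
    chosen/rejected preference record."""
    verdict = judge(question, answer_a, answer_b)
    chosen, rejected = (answer_a, answer_b) if verdict == "A" else (answer_b, answer_a)
    return {"topic": topic, "prompt": question, "chosen": chosen, "rejected": rejected}

# Toy input: one topic with a question and two candidate answers.
samples = [
    ("oncology", "What is adjuvant therapy?",
     "Treatment given after the primary treatment to lower the risk of recurrence.",
     "Extra treatment."),
]

dataset = [build_preference_record(*s) for s in samples]
print(json.dumps(dataset[0], indent=2))
```

The chosen/rejected format is what preference-tuning methods like DPO typically expect, which is presumably why pairwise judging is useful here.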

Let me be clear: it probably took a lot of luck to get the current model as "good" as it turned out to be, I realise that.
Therefore I also accept that my chances of success are slim at best and non-existent at worst.
I'm a doctor, not a computer scientist, after all... (although I do like the few Raspberry Pis and old laptops that make up the impressive backbone of my, ehem, "state-of-the-art" homelab!).

That said: "those who don't try never succeed", "try and fail but never fail to try", "when you try you risk failure but when you don't you ensure it" (I could go on here but you get the point, I hope!).
Figured I might as well give it a shot. I never claimed to know what I'm doing, so I'll be the first to laugh should it turn out a complete failure :)

I'll leave a post when it's done! (unless it gets crushed in the benchmark, then I'll probably opt for the silent retreat instead).

I'm working on a dataset myself. It'll be in ChatML format, made up of books, movies, and video-game mechanics, in order to train a text-based virtual-reality simulation of the best characters and environments. Almost finished adding Lord of the Rings to it now... feel free to comment on my profile which titles you'd like me to add to the knowledge base.
