Very interesting, as this time they DID update the codebase, so it is a new model!
Forget the training!!!
Most important are the codebase changes: the context extensions and sliding-window implementations, as well as the rotary and scaled embeddings. They have not added the ring embeddings yet!
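To make the rotary/scaled-embedding point concrete, here is a minimal sketch of rotary position embeddings with a simple linear position-scaling factor. It assumes the usual RoPE formulation; the function and argument names are illustrative only, not the exact API of the transformers library.

```python
# Minimal sketch of rotary position embeddings with a linear scaling factor.
# Names and the exact channel interleaving are illustrative, not the
# transformers library's implementation.
import torch

def rotary_embed(x, base=10000.0, scale=1.0):
    # x: (batch, seq_len, num_heads, head_dim) with an even head_dim
    seq_len, head_dim = x.shape[1], x.shape[-1]
    # one inverse frequency per pair of channels
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # positions are divided by `scale` to stretch the usable context window
    pos = torch.arange(seq_len).float() / scale
    freqs = torch.outer(pos, inv_freq)        # (seq_len, head_dim / 2)
    cos, sin = freqs.cos(), freqs.sin()
    cos = cos[None, :, None, :]               # broadcast over batch and heads
    sin = sin[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]       # split channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin      # rotate each channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 16, 8, 64)                 # toy query tensor
q_rot = rotary_embed(q, scale=2.0)            # 2x linear position scaling
```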
Interesting again is that ALL of these models are essentially clones of the Llama codebase!!
So they all enjoy the same increased capabilities:
Mistral actually copied the Llama codebase 100% with no changes!!!
Obviously, check out the codebases in the transformers library!
But in general, Mistral 7B will still outperform them, as its NUMBERS (the config values) are correct!
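If you want to compare the numbers yourself, one quick way is to pull the configs from the Hugging Face Hub and print them side by side. This is only a sketch: it assumes network access, the transformers package, and example model ids (the Llama repo is gated and needs an access token).

```python
# Sketch: fetch two model configs and compare their hyperparameters.
# Model ids are examples; gated repos require an access token.
from transformers import AutoConfig

ids = ["mistralai/Mistral-7B-v0.1", "meta-llama/Meta-Llama-3-8B"]
configs = {i: AutoConfig.from_pretrained(i) for i in ids}

keys = ["hidden_size", "intermediate_size", "num_hidden_layers",
        "num_attention_heads", "num_key_value_heads",
        "max_position_embeddings", "rope_theta"]
for key in keys:
    print(key, {i: getattr(configs[i], key, None) for i in ids})
```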
Llama 3 and all these models are released with BAD numbers, with pure mismatches! (This is the trick when you want to release open-source models and NOT share the capabilities with the public: they know the right numbers and generate a model with them for themselves, so you would have to pretrain your own!)
Otherwise they would be releasing a commercially READY model!
The commercially ready models (guarded) are kept on the company hosts!!!
So go and generate a model with the correct values and you will have a good model! (Mistral also realized this and then released Nemo with a 5120 hidden size, which is a bomb to the model: 5120 does not follow any convention and is not a power of two, so it does not halve down cleanly to a standard bit or byte size!)
Hence all mathematical operations (training and tensor calcs) will be intensive and unnatural, breeding unnatural numbers for the model (hence bad performance).
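To make the hidden-size arithmetic concrete, here is a tiny check. 4096 is included only as a familiar power-of-two reference value; 5120 is the Nemo figure mentioned above. Both split evenly into 128-dim heads, but only 4096 is a power of two.

```python
# Quick arithmetic check of the hidden-size argument above.
# 4096 is a reference power of two; 5120 = 5 * 1024 is not a power of two,
# even though both divide evenly into 128-dim attention heads.
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

for hidden in (4096, 5120):
    print(hidden,
          "| power of two:", is_power_of_two(hidden),
          "| heads of 128:", hidden // 128,
          "| remainder:", hidden % 128)
# 4096 | power of two: True | heads of 128: 32 | remainder: 0
# 5120 | power of two: False | heads of 128: 40 | remainder: 0
```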
So pretraining is a waste!