Chopping a model:
Similar to starting a model from scratch! (3B)
The weights are not aligned and need to be fine-tuned, so this model needs a lot of tuning before it becomes a usable base model. On some datasets there is already a fit, so the amount of data matters and as much as possible will be trained in. For example, the Orca and Dolphin datasets are good datasets to begin the alignment of a fresh model, as is the OpenWebMath dataset.
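To give a feel for what chopping does to model size, here is a rough sketch that counts parameters for a Mistral-style decoder when only some of the layers are kept. It assumes that "chopping" means dropping whole decoder layers, and it uses the published Mistral-7B-v0.1 dimensions as default values. The 12-layer figure is purely illustrative: how many layers this model actually kept is not stated here.

```python
# Rough parameter count for a Mistral-style decoder with a reduced layer count.
# ASSUMPTIONS: "chopping" = dropping whole decoder layers; the default
# dimensions are those of Mistral-7B-v0.1; 12 kept layers is a hypothetical
# example, not the real layer count of this model.

def decoder_layer_params(hidden, intermediate, n_heads, n_kv_heads):
    """Parameters in one decoder layer (grouped-query attention, gated MLP, no biases)."""
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
    mlp = 3 * hidden * intermediate                        # gate, up and down projections
    norms = 2 * hidden                                     # two RMSNorm weight vectors
    return attention + mlp + norms


def model_params(n_layers, hidden=4096, intermediate=14336,
                 n_heads=32, n_kv_heads=8, vocab=32000):
    """Total parameters, assuming untied input embeddings and output head."""
    embeddings = 2 * vocab * hidden
    final_norm = hidden
    per_layer = decoder_layer_params(hidden, intermediate, n_heads, n_kv_heads)
    return n_layers * per_layer + embeddings + final_norm


full = model_params(32)     # all 32 layers
chopped = model_params(12)  # hypothetical chopped model
print(f"full: {full / 1e9:.2f}B, chopped: {chopped / 1e9:.2f}B")
```

Under these assumptions, keeping 12 of 32 layers lands close to the 3B range. The kept layers were trained to work inside the deeper stack, which is one way to see why the chopped weights need so much re-tuning.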
STEP 1: Next Word Prediction
Text corpora are a good way to add generalized speech and text generation capabilities. The model may not generate what you expect at this stage, but it should be trained heavily enough to produce some form of intelligent speech. This is the first goal!
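As a minimal sketch of what the next word prediction objective looks like: every prefix of a token sequence is used to predict the token that follows it, and the loss is the negative log-probability the model gives to the correct next token. The token ids and the uniform "model" below are made up for illustration and are not taken from the actual training run.

```python
import math

def next_token_pairs(token_ids):
    """Return (context, target) pairs: each prefix predicts the token that follows it."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct next token."""
    return -math.log(probs[target])


# Hypothetical token ids for one short piece of corpus text.
tokens = [5, 2, 9, 3]
pairs = next_token_pairs(tokens)

# Stand-in for an untrained model: uniform probability over a 10-token vocabulary.
vocab_size = 10
uniform = [1.0 / vocab_size] * vocab_size

# Average loss over all positions; training tries to push this number down.
loss = sum(cross_entropy(uniform, target) for _, target in pairs) / len(pairs)
print(pairs, loss)
```

A model that knows nothing scores a loss of log(vocab_size) here. Training on text corpora lowers it by shifting probability toward the tokens that really come next.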
Configuration
Perhaps I should have instantiated a full Mistral model instead? (I did before, but the training took hours and hours, so smaller is better! Later we can make experts for merging from the same 3B models, as that strategy, i.e. merge / fine-tune / re-merge, has actually worked.)
The following YAML configuration was used to produce this model: