Training run to compare Mixture-of-Depths and BitNet

Four models trained for 100k steps on Dolma:

OLMo-50M - 50M parameter model
OLMo-50M-bitlinear - 50M parameter BitNet model
OLMo-50M-mod - 50M parameter Mixture-of-Depths model
OLMo-50M-mod-bitlinear - 50M parameter Mixture-of-Depths BitNet model

The repo contains zip files that include the training state and other files for each model. I am not the author of the Mixture-of-Depths implementation; it can be found here. This is the first run, so a few things might be broken. It is still a work in progress.
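Since the files ship as zip archives, something like the following might help unpack them. This is only a sketch: the helper names extract_archives and download_repo are made up for illustration, the archive names and their internal layout are not documented here, and the download step assumes the huggingface_hub package is installed.

```python
import zipfile
from pathlib import Path


def extract_archives(src_dir, dest_dir):
    """Extract every .zip found under src_dir into dest_dir/<archive name>/.

    Hypothetical helper: the actual archive names in the repo are not
    assumed, so this simply walks the directory and unpacks what it finds.
    """
    dest_dir = Path(dest_dir)
    extracted = []
    for archive in sorted(Path(src_dir).rglob("*.zip")):
        target = dest_dir / archive.stem  # one folder per archive
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


def download_repo(repo_id="0-hero/OLMo-50M-Mixture-of-Depths-Bitnet"):
    """Download a snapshot of the repo and return its local path.

    Hypothetical helper: huggingface_hub is imported lazily so that
    extract_archives stays usable without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

A typical call would be extract_archives(download_repo(), "checkpoints"), which should leave one folder per archive under checkpoints/. Since this is a first run, it is worth inspecting the extracted contents before relying on any particular file being present.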
