---
license: llama2
datasets:
- EleutherAI/proof-pile-2
---

This is a one-layer base model with the Llama 2 architecture, trained on 6B tokens of the AlgebraicStack subset of the Proof-Pile-2 dataset. Its output distribution is thus mostly concerned with code. The tokenizer is the Llama 2 tokenizer.

I used the following hyperparameters:

- d_model = 512
- d_ff = 2048
- n_heads = 4
- n_ctx = 1024

For training I used AdamW with weight decay 0.05 and cosine annealing with 5000 warmup steps and a maximum learning rate of 1e-4, in BF16 precision.

- Train loss: 2.6228
- Test loss: 2.7490
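
The architecture above can be expressed as a Hugging Face `LlamaConfig`. This is a minimal sketch, not the exact training code; it assumes the standard Llama 2 vocabulary size of 32000.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of a config matching the hyperparameters above (assumption:
# the standard Llama 2 tokenizer vocabulary of 32000 tokens).
config = LlamaConfig(
    vocab_size=32000,              # Llama 2 tokenizer
    hidden_size=512,               # d_model
    intermediate_size=2048,        # d_ff
    num_hidden_layers=1,           # one-layer model
    num_attention_heads=4,         # n_heads
    max_position_embeddings=1024,  # n_ctx
)
model = LlamaForCausalLM(config)
```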
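
The optimizer and schedule could be set up as below. This is a sketch under assumptions: the card does not state the total number of training steps (`total_steps` here is hypothetical), nor how BF16 was applied; `torch.autocast` is one common way.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Hypothetical: the total step count is not stated in the card.
total_steps = 100_000

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=5000,
    num_training_steps=total_steps,
)

# BF16 precision; the exact mixed-precision setup used is an assumption.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    ...  # forward/backward pass over AlgebraicStack batches goes here
```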