---
license: llama2
datasets:
- EleutherAI/proof-pile-2
---

This is a one-layer base model with the Llama 2 architecture, trained on 6B tokens of the AlgebraicStack part of the Proof-Pile-2 dataset. \
Its output distribution is therefore mostly concerned with code. \
The tokenizer is the Llama 2 tokenizer. I used the following hyperparameters: \
d<sub>model</sub> = 512 \
d<sub>ff</sub> = 2048 \
n<sub>heads</sub> = 4 \
n<sub>ctx</sub> = 1024
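
As a sketch, this corresponds to the following `transformers.LlamaConfig`. The vocabulary size of 32,000 follows from the Llama 2 tokenizer; all fields not listed above keep the Llama 2 defaults and are assumptions, not values stated in this card.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hyperparameters from this card; unlisted fields keep the Llama 2 defaults.
config = LlamaConfig(
    vocab_size=32000,              # Llama 2 tokenizer vocabulary
    hidden_size=512,               # d_model
    intermediate_size=2048,        # d_ff
    num_hidden_layers=1,           # one-layer model
    num_attention_heads=4,         # n_heads
    max_position_embeddings=1024,  # n_ctx
)
model = LlamaForCausalLM(config)
```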

For training I used AdamW with weight decay 0.05 and cosine annealing with 5000 warmup steps and a maximum learning rate of 1e-4, in BF16 precision.
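
A minimal sketch of that optimizer and schedule in PyTorch, reusing `model` and `config` from the snippet above. The total step count is hypothetical, since it is not stated in this card.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

num_training_steps = 100_000  # hypothetical; not stated in this card

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=5000, num_training_steps=num_training_steps
)

# One illustrative training step on a dummy batch, under BF16 autocast.
input_ids = torch.randint(0, config.vocab_size, (1, 1024))
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```
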
Train loss: 2.6228 \
Test loss: 2.7490
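
Assuming the weights in this repository load through `transformers` in the usual way, generation looks like the sketch below; the repository ID is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<user>/<this-repo>"  # placeholder: replace with this repository's ID
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```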