ArkaAbacus committed
Commit 95d8600
1 Parent(s): 20ae9b0

Update README.md

Files changed (1): README.md +6 -1
README.md CHANGED
@@ -16,6 +16,7 @@ Abacus.AI presents our longer-necked variant of Llama 3 70B!
 
 This model has an effective context length of approximately 128k.
 
+We have currently trained on ~1B tokens.
 This is an initial release and we are hoping to improve the heatmap below further as we continue training.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_NVEuQ2ZT-sBtDBNjgmbt.png)
@@ -33,7 +34,11 @@ The scale factor for NTK is 4. Note that we also tried theta-scaling but this di
 We utilise Positional Skip-wise Training (PoSE) with the following parameters:
 
 - **Number of Chunks**: 5
-- **Max position ID **: 32768
+- **Max position ID**: 32768
+
+### Data
+
+We use on average ~8K long samples from [RedPajama](https://github.com/togethercomputer/RedPajama-Data).
 
 ### Hardware
 
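As context for the PoSE parameters added above, here is a minimal sketch of skip-wise position-ID assignment, assuming the usual PoSE formulation (the function name and the skip-sampling scheme are illustrative, not this repo's training code): the training sequence is split into contiguous chunks, and random skips are inserted between chunks so that the position IDs span the full range up to the max position ID.

```python
import random

def pose_position_ids(seq_len: int, num_chunks: int = 5, max_pos: int = 32768) -> list[int]:
    """Illustrative PoSE-style position IDs: num_chunks contiguous chunks,
    consecutive positions inside each chunk, random skips between chunks so
    the last position lands at max_pos - 1 (one simple way to cover the full
    range). Assumes seq_len % num_chunks == 0."""
    chunk_len = seq_len // num_chunks
    total_skip = max_pos - seq_len  # skip budget to distribute between chunks
    # Split the budget at num_chunks - 1 random cut points.
    cuts = sorted(random.randint(0, total_skip) for _ in range(num_chunks - 1))
    skips = [b - a for a, b in zip([0] + cuts, cuts + [total_skip])]

    position_ids, offset = [], 0
    for i in range(num_chunks):
        offset += skips[i]  # jump ahead before starting this chunk
        start = i * chunk_len + offset
        position_ids.extend(range(start, start + chunk_len))
    return position_ids

ids = pose_position_ids(seq_len=8190)  # ~8K tokens, as in the Data section
assert ids[-1] == 32767 and len(ids) == 8190
```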
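The NTK scale factor of 4 mentioned in the second hunk header is what connects the 32768 max position ID to the stated ~128k effective context (32768 × 4 = 131072). Below is a sketch of one common "NTK-aware" RoPE adjustment, assuming Llama 3's rope_theta of 500000 and head dimension of 128 (neither value is stated in this commit, and the exact scaling variant used here may differ):

```python
def ntk_scaled_rope_base(base: float = 500000.0, scale: float = 4.0, dim: int = 128) -> float:
    # NTK-aware RoPE scaling: rather than linearly compressing position
    # indices, enlarge the rotary base so the low-frequency components
    # stretch to cover `scale` times the original context window.
    return base * scale ** (dim / (dim - 2))

print(ntk_scaled_rope_base())  # ~2.04e6, the adjusted rope_theta
```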