ArkaAbacus committed · Commit 95d8600 · Parent(s): 20ae9b0

Update README.md
README.md CHANGED
@@ -16,6 +16,7 @@ Abacus.AI presents our longer-necked variant of Llama 3 70B!
 
 This model has an effective context length of approximately 128k.
 
+We have currently trained on ~1B tokens.
 This is an initial release and we are hoping to improve the heatmap below further as we continue training.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_NVEuQ2ZT-sBtDBNjgmbt.png)
@@ -33,7 +34,11 @@ The scale factor for NTK is 4. Note that we also tried theta-scaling but this di
 We utilise Positional Skip-wise Training (PoSE) with the following parameters:
 
 - **Number of Chunks**: 5
-- **Max position ID
+- **Max position ID**: 32768
+
+### Data
+
+We use on average ~8K long samples from [RedPajama](https://github.com/togethercomputer/RedPajama-Data).
 
 ### Hardware
 
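For context on the hunk header above: a minimal sketch of NTK-aware RoPE scaling with the scale factor of 4 stated in the README, assuming the common base-rescaling formulation. `dim=128` and `base=500000.0` (Llama 3 70B's head dimension and rope_theta) are illustrative assumptions, not values taken from this commit.

```python
# Illustrative sketch of NTK-aware RoPE scaling (not the repository's actual code).
# Assumptions: dim=128 and base=500000.0 match Llama 3 70B's head dimension and
# rope_theta; scale=4.0 is the NTK scale factor stated in the README.
import torch

def ntk_scaled_inv_freq(dim: int = 128, base: float = 500000.0,
                        scale: float = 4.0) -> torch.Tensor:
    # NTK-aware scaling raises the RoPE base so that low frequencies are
    # stretched by roughly `scale` while the highest frequencies stay
    # nearly unchanged.
    ntk_base = base * scale ** (dim / (dim - 2))
    return 1.0 / ntk_base ** (torch.arange(0, dim, 2).float() / dim)

# Plain theta-scaling (which the README notes worked less well in their
# experiments) would instead rescale the base uniformly, e.g. base * scale,
# stretching all frequencies alike.
print(ntk_scaled_inv_freq()[:4])
```

These inverse frequencies would stand in for the standard ones inside the rotary embedding; the rest of the attention computation is unchanged.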
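To make the PoSE parameters in the diff concrete: a minimal sketch of positional skip-wise position-ID sampling, assuming the chunk-and-skip scheme from the PoSE paper (https://arxiv.org/abs/2309.10400). The 5 chunks and max position ID of 32768 come from the README; the 8192-token training length and the uniform sampling details are illustrative assumptions.

```python
# Illustrative PoSE-style position-id sampling (a sketch, not the training code).
# num_chunks=5 and max_pos_id=32768 come from the README; everything else is assumed.
import random

def pose_position_ids(seq_len: int, num_chunks: int = 5,
                      max_pos_id: int = 32768) -> list[int]:
    # Split the short training sequence into `num_chunks` contiguous chunks.
    bounds = sorted(random.sample(range(1, seq_len), num_chunks - 1))
    starts, ends = [0] + bounds, bounds + [seq_len]

    # Sample one sorted offset per chunk from the spare positional budget,
    # so ids remain increasing and max(ids) < max_pos_id.
    budget = max_pos_id - seq_len
    offsets = sorted(random.randint(0, budget) for _ in range(num_chunks))

    ids: list[int] = []
    for start, end, off in zip(starts, ends, offsets):
        ids.extend(range(start + off, end + off))
    return ids

ids = pose_position_ids(seq_len=8192)  # e.g. one ~8K RedPajama sample
assert len(ids) == 8192 and max(ids) < 32768 and ids == sorted(ids)
```

The model thus sees position ids spanning the full 32K range while attending over only ~8K real tokens per sample. Note also that 32768 times the NTK scale factor of 4 is 131072, which is consistent with the stated ~128k effective context length.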
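Finally, a minimal sketch of sourcing the ~8K-token samples mentioned in the new **Data** section. The `togethercomputer/RedPajama-Data-1T-Sample` dataset ID and the whitespace-split length proxy are assumptions; the README states only that samples average ~8K tokens and come from RedPajama.

```python
# Illustrative sketch of streaming roughly-8K-token samples from RedPajama.
# The dataset id and the whitespace length proxy are assumptions; the README
# states only the average sample length and the RedPajama source.
from datasets import load_dataset

stream = load_dataset("togethercomputer/RedPajama-Data-1T-Sample",
                      split="train", streaming=True)

def roughly_8k(example: dict, lo: int = 6_000, hi: int = 10_000) -> bool:
    # Cheap length proxy; a real pipeline would count model-tokenizer tokens.
    return lo <= len(example["text"].split()) <= hi

long_samples = filter(roughly_8k, stream)
print(next(long_samples)["text"][:200])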