ArkaAbacus committed · Commit 95d8600 · Parent(s): 20ae9b0

Update README.md
README.md CHANGED
@@ -16,6 +16,7 @@ Abacus.AI presents our longer-necked variant of Llama 3 70B!
 
 This model has an effective context length of approximately 128k.
 
+We have currently trained on ~1B tokens.
 This is an initial release and we are hoping to improve the heatmap below further as we continue training.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_NVEuQ2ZT-sBtDBNjgmbt.png)
@@ -33,7 +34,11 @@ The scale factor for NTK is 4. Note that we also tried theta-scaling but this di
 We utilise Positional Skip-wise Training (PoSE) with the following parameters:
 
 - **Number of Chunks**: 5
-- **Max position ID
+- **Max position ID**: 32768
+
+### Data
+
+We use on average ~8K long samples from [RedPajama](https://github.com/togethercomputer/RedPajama-Data).
 
 ### Hardware
 
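For context on the hunk header above: a minimal sketch of NTK-aware RoPE scaling with the scale factor of 4 stated in the README, assuming the common base-rescaling formulation. `dim=128` and `base=500000.0` (Llama 3 70B's head dimension and rope_theta) are illustrative assumptions, not values taken from this commit.

```python
# Illustrative sketch of NTK-aware RoPE scaling (not the repository's actual code).
# Assumptions: dim=128 and base=500000.0 match Llama 3 70B's head dimension and
# rope_theta; scale=4.0 is the NTK scale factor stated in the README.
import torch

def ntk_scaled_inv_freq(dim: int = 128, base: float = 500000.0,
                        scale: float = 4.0) -> torch.Tensor:
    # NTK-aware scaling raises the RoPE base so that low frequencies are
    # stretched by roughly `scale` while the highest frequencies stay
    # nearly unchanged.
    ntk_base = base * scale ** (dim / (dim - 2))
    return 1.0 / ntk_base ** (torch.arange(0, dim, 2).float() / dim)

# Plain theta-scaling (which the README notes worked less well in their
# experiments) would instead rescale the base uniformly, e.g. base * scale,
# stretching all frequencies alike.
print(ntk_scaled_inv_freq()[:4])
```

These inverse frequencies would stand in for the standard ones inside the rotary embedding; the rest of the attention computation is unchanged.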
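To make the PoSE parameters in the diff concrete: a minimal sketch of positional skip-wise position-ID sampling, assuming the chunk-and-skip scheme from the PoSE paper (https://arxiv.org/abs/2309.10400). The 5 chunks and max position ID of 32768 come from the README; the 8192-token training length and the uniform sampling details are illustrative assumptions.

```python
# Illustrative PoSE-style position-id sampling (a sketch, not the training code).
# num_chunks=5 and max_pos_id=32768 come from the README; everything else is assumed.
import random

def pose_position_ids(seq_len: int, num_chunks: int = 5,
                      max_pos_id: int = 32768) -> list[int]:
    # Split the short training sequence into `num_chunks` contiguous chunks.
    bounds = sorted(random.sample(range(1, seq_len), num_chunks - 1))
    starts, ends = [0] + bounds, bounds + [seq_len]

    # Sample one sorted offset per chunk from the spare positional budget,
    # so ids remain increasing and max(ids) < max_pos_id.
    budget = max_pos_id - seq_len
    offsets = sorted(random.randint(0, budget) for _ in range(num_chunks))

    ids: list[int] = []
    for start, end, off in zip(starts, ends, offsets):
        ids.extend(range(start + off, end + off))
    return ids

ids = pose_position_ids(seq_len=8192)  # e.g. one ~8K RedPajama sample
assert len(ids) == 8192 and max(ids) < 32768 and ids == sorted(ids)
```

The model thus sees position ids spanning the full 32K range while attending over only ~8K real tokens per sample. Note also that 32768 times the NTK scale factor of 4 is 131072, which is consistent with the stated ~128k effective context length.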
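Finally, a minimal sketch of sourcing the ~8K-token samples mentioned in the new **Data** section. The `togethercomputer/RedPajama-Data-1T-Sample` dataset ID and the whitespace-split length proxy are assumptions; the README states only that samples average ~8K tokens and come from RedPajama.

```python
# Illustrative sketch of streaming roughly-8K-token samples from RedPajama.
# The dataset id and the whitespace length proxy are assumptions; the README
# states only the average sample length and the RedPajama source.
from datasets import load_dataset

stream = load_dataset("togethercomputer/RedPajama-Data-1T-Sample",
                      split="train", streaming=True)

def roughly_8k(example: dict, lo: int = 6_000, hi: int = 10_000) -> bool:
    # Cheap length proxy; a real pipeline would count model-tokenizer tokens.
    return lo <= len(example["text"].split()) <= hi

long_samples = filter(roughly_8k, stream)
print(next(long_samples)["text"][:200])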