princeton-nlp committed on
Commit
5d37d2c
1 Parent(s): cb604ca

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ Contact: `{tianyug, awettig}@princeton.edu`
 
 - Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (original max length: 8K), we produce a long-context instruction-tuned model that can stably handle up to 64K tokens. We also have a version that can process up to 512K tokens.
 - This model is trained on
-  - 20B carefully curated data mixture of short and long data (max length 64K).
+  - 20B carefully curated data mixture of short and long data (max length 64K). You can find our base model [here](princeton_nlp/Llama-3-8B-ProLong-64k-Base).
   - For the 512K version, we continue training the base model for 5B more tokens, with a mixture of short, long (64K), and ultra long (512K) data.
   - Then we fine-tuned them on [UltraChat](https://huggingface.co/datasets/stingning/ultrachat) to regain chat ability.
 - On a range of long-context tasks, our ProLong model achieves the top performance among models of similar sizes.
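For context on using the model this README describes, below is a minimal sketch of loading and prompting it with Hugging Face `transformers`. The repo id is an assumption inferred from the base-model link added in this commit (`princeton_nlp/Llama-3-8B-ProLong-64k-Base`); substitute the actual instruct checkpoint name if it differs.

```python
# Minimal sketch: load the long-context instruct model with transformers.
# NOTE: the repo id below is assumed, not taken from the commit; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-8B-ProLong-64k-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint config
    device_map="auto",    # place weights across available devices (requires `accelerate`)
)

# Chat-style prompting; chat ability comes from the UltraChat fine-tuning step.
messages = [{"role": "user", "content": "Summarize the following long report: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```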