princeton-nlp committed
Commit 5d37d2c • 1 Parent(s): cb604ca
Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ Contact: `{tianyug, awettig}@princeton.edu`
 
 - Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (original max length: 8K), we produce a long-context instruction-tuned model that can stably handle up to 64K tokens. We also have a version that can process up to 512K tokens.
 - This model is trained on
-- 20B carefully curated data mixture of short and long data (max length 64K).
+- 20B carefully curated data mixture of short and long data (max length 64K). You can find our base model [here](princeton_nlp/Llama-3-8B-ProLong-64k-Base).
 - For the 512K version, we continue training the base model for 5B more tokens, with a mixture of short, long (64K), and ultra long (512K) data.
 - Then we fine-tuned them on [UltraChat](https://huggingface.co/datasets/stingning/ultrachat) to regain chat ability.
 - On a range of long-context tasks, our ProLong model achieves the top performance among models of similar sizes.
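For readers of the updated README, a minimal sketch of loading the long-context instruct model with Hugging Face `transformers` is shown below. The repo id `princeton-nlp/Llama-3-8B-ProLong-64k-Instruct` and the use of the Llama-3 chat template are assumptions (only the base-model link appears in the diff above); this is not part of the commit itself.

```python
# Sketch: loading the long-context instruct model with Hugging Face transformers.
# The repo id below is an assumption; check the princeton-nlp organization page
# for the exact model name. The Llama-3 chat template is assumed to carry over.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-8B-ProLong-64k-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory manageable at long context
    device_map="auto",
)

# Build a Llama-3-style chat prompt; a long document (up to ~64K tokens) goes in the user turn.
messages = [{"role": "user", "content": "Summarize the following report:\n\n<long document here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```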