princeton-nlp committed
Commit 5d37d2c • 1 Parent(s): cb604ca
Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ Contact: `{tianyug, awettig}@princeton.edu`
 
 - Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (original max length: 8K), we produce a long-context instruction-tuned model that can stably handle up to 64K tokens. We also have a version that can process up to 512K tokens.
 - This model is trained on
-- 20B carefully curated data mixture of short and long data (max length 64K).
+- 20B carefully curated data mixture of short and long data (max length 64K). You can find our base model [here](princeton_nlp/Llama-3-8B-ProLong-64k-Base).
 - For the 512K version, we continue training the base model for 5B more tokens, with a mixture of short, long (64K), and ultra long (512K) data.
 - Then we fine-tuned them on [UltraChat](https://huggingface.co/datasets/stingning/ultrachat) to regain chat ability.
 - On a range of long-context tasks, our ProLong model achieves the top performance among models of similar sizes.
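For readers of the updated README, a minimal sketch of loading the long-context instruct model with Hugging Face `transformers` is shown below. The repo id `princeton-nlp/Llama-3-8B-ProLong-64k-Instruct` and the use of the Llama-3 chat template are assumptions (only the base-model link appears in the diff above); this is not part of the commit itself.

```python
# Sketch: loading the long-context instruct model with Hugging Face transformers.
# The repo id below is an assumption; check the princeton-nlp organization page
# for the exact model name. The Llama-3 chat template is assumed to carry over.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-8B-ProLong-64k-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory manageable at long context
    device_map="auto",
)

# Build a Llama-3-style chat prompt; a long document (up to ~64K tokens) goes in the user turn.
messages = [{"role": "user", "content": "Summarize the following report:\n\n<long document here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```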