Commit 935d41e (1 parent: cd3fa9c), committed by instruction-pretrain

Update README.md

README.md CHANGED
@@ -26,6 +26,8 @@ We explore supervised multitask pre-training by proposing ***Instruction Pre-Tra
 - Domain-Specific Models Pre-Trained from Llama3-8B:
 - [Finance-Llama3-8B](https://huggingface.co/instruction-pretrain/finance-Llama3-8B)
 - [Biomedicine-Llama3-8B](https://huggingface.co/instruction-pretrain/medicine-Llama3-8B)
+- General Instruction-Augmented Corpora: [general-instruction-augmented-corpora](https://huggingface.co/datasets/instruction-pretrain/general-instruction-augmented-corpora)
+- Domain-Specific Instruction-Augmented Corpora (no finance data to avoid ethical issues): [medicine-instruction-augmented-corpora](https://huggingface.co/datasets/instruction-pretrain/medicine-instruction-augmented-corpora)
 
 ## General Pre-Training From Scratch
 We augment the [RefinedWeb corpora](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) with instruction-response pairs generated by our [context-based instruction synthesizer](https://huggingface.co/instruction-pretrain/instruction-synthesizer) to pre-train general language models from scratch.
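
The README's last context line describes augmenting raw text with instruction-response pairs. The idea can be pictured with a small example. This is only a minimal sketch: the function name `augment_text`, the `Question:`/`Answer:` layout, and the sample strings are hypothetical, and the template actually used for the released corpora is not shown in this commit.

```python
from typing import List, Tuple


def augment_text(raw_text: str, pairs: List[Tuple[str, str]]) -> str:
    """Append instruction-response pairs to a raw text to form one training example.

    The "Question:/Answer:" layout is an illustrative assumption, not the
    template used by the authors.
    """
    parts = [raw_text.strip()]
    for instruction, response in pairs:
        # Each synthesized pair becomes one block placed after the raw text.
        parts.append(f"Question: {instruction.strip()}\nAnswer: {response.strip()}")
    # Blank lines separate the raw text and each pair.
    return "\n\n".join(parts)


example = augment_text(
    "The mitochondrion produces most of the cell's ATP.",
    [("Which organelle produces most of the cell's ATP?", "The mitochondrion.")],
)
print(example)
```

In practice the pairs would come from the instruction synthesizer rather than being written by hand, and the published instruction-augmented corpora listed in the diff already contain the augmented texts.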