We have used our own [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) to fine-tune LLaMA-13B.
This dataset is our attempt to reproduce the dataset generated for Microsoft Research's [Orca Paper](https://arxiv.org/abs/2306.02707).
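
If you want to inspect the data yourself, here is a minimal sketch using the Hugging Face `datasets` library (streaming is assumed so the full dump isn't downloaded; field names follow the dataset card):

```python
from datasets import load_dataset

# Stream the dataset so the full dump isn't downloaded up front.
ds = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True)

# Each record carries a system prompt, a question, and the augmented response.
for example in ds.take(3):
    print(example["system_prompt"])
    print(example["question"])
    print(example["response"][:200], "...")
```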

We have trained on less than 6% of our data, just to give a preview of what is possible while we further refine our dataset!
We trained on a refined selection of 200k GPT-4 entries from OpenOrca.
We have filtered our GPT-4 augmentations to remove statements like "As an AI language model..." and other responses which have been shown to harm model reasoning capabilities. Further details on our dataset curation practices will be forthcoming with our full model releases.
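
For illustration, here is a rough sketch of this kind of filter. The marker list below is hypothetical, and our actual curation pipeline is more involved than a substring check:

```python
from datasets import load_dataset

# Hypothetical marker list for illustration; the real filter set is broader.
REFUSAL_MARKERS = (
    "as an ai language model",
    "i cannot provide",
    "i'm sorry, but",
)

def keep_example(example: dict) -> bool:
    """Keep only responses free of boilerplate disclaimers."""
    response = example["response"].lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)

ds = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True)
filtered = ds.filter(keep_example)
```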

This release highlights that even a small portion of our training data can produce state-of-the-art results in this model class, with total training costs under $200.

Want to visualize our dataset? Check out our [Nomic Atlas Map](https://atlas.nomic.ai/map/c1b88b47-2d9b-47e0-9002-b80766792582/2560fd25-52fe-42f1-a58f-ff5eccc890d2).

[<img src="https://i.ibb.co/vdd1XQg/image.png" alt="Atlas Nomic Dataset Map" width="400" height="400" />](https://atlas.nomic.ai/map/c1b88b47-2d9b-47e0-9002-b80766792582/2560fd25-52fe-42f1-a58f-ff5eccc890d2)

We are in the process of training more models, so keep a lookout on our org for releases coming soon with exciting partners.

We will also give sneak-peek announcements on our Discord, which you can find here: