Update README.md
---
license: llama3
language:
- pl
- en
- es
- de
---

### Intro
We have released a collection of radlab/pLLama3 models fine-tuned for Polish. The fine-tuned versions communicate with the user more precisely than the base meta-llama/Meta-Llama-3 models. The collection includes models in the 8B and 70B architectures.
We make the 8B models available in two configurations (a minimal usage sketch follows the list):
- radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;
- radlab/pLLama3-8B-chat, a more conversational version that mirrors the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model.

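The models can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch rather than an official recipe: it assumes the repositories ship a tokenizer with a Llama-3-style chat template, and the generation parameters are illustrative placeholders.

```python
# Minimal usage sketch (assumes a recent `transformers` with chat-template support).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3-8B-chat"  # or "radlab/pLLama3-8B-creator"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "Write a short poem about spring."
messages = [{"role": "user", "content": "Napisz krótki wiersz o wiośnie."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generation settings are placeholders, not tuned values.
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With the `-creator` variant you should expect shorter, more direct completions; the `-chat` variant gives longer, more conversational answers.
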
### Dataset
In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions. This data was generated semi-automatically from other publicly available datasets.
We also developed a training dataset for the DPO stage, containing 100k examples in which the model learns to prefer correctly written versions of texts over versions containing language errors.

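The DPO data itself is not reproduced here; purely as an illustration of the task described above, a preference record might be organized as below. The field names (`prompt`, `chosen`, `rejected`) and the sentences are assumptions for illustration, not excerpts from our dataset.

```python
# Hypothetical preference record for the correct-writing DPO task (illustrative only).
dpo_example = {
    "prompt": "Popraw błędy językowe w poniższym zdaniu.",   # "Fix the language errors in the sentence below."
    "chosen": "Wczoraj poszliśmy z przyjaciółmi do kina.",    # correctly written variant (preferred)
    "rejected": "Wczoraj poszlismy z przyjaciulmi do kina.",  # variant with spelling errors (dispreferred)
}
```
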
### Learning
The learning process was divided into two stages:
- Fine-tuning (FT) on the set of 650k Polish instructions; this stage was run for 5 epochs.
- After the FT stage, further training with DPO on the 100k correct-writing examples in Polish; this stage was run for 15k steps (the objective is sketched below).

The released models are the checkpoints obtained after both the FT and DPO stages.

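The README does not state which implementation was used for the DPO stage. As a reference for what the metrics below measure, here is a sketch of the standard DPO objective in PyTorch, with per-pair rewards defined the way `eval/rewards/*` figures are usually computed; the `beta` value is a hypothetical placeholder.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective on summed log-probs of chosen/rejected completions.

    All inputs are 1-D tensors of per-example sequence log-probabilities.
    `beta` is a placeholder; the coefficient used for pLLama3 is not stated here.
    """
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # cf. eval/rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # cf. eval/rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # cf. eval/rewards/margins

    # -log sigmoid of the reward margin; minimized when the chosen answer is preferred.
    loss = -F.logsigmoid(margins).mean()                                    # cf. eval/loss
    accuracy = (margins > 0).float().mean()                                 # cf. eval/rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```
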
Post-FT learning metrics:
- `eval/loss`: `0.8690009713172913`
- `eval/runtime`: `464.5158`
- `eval/samples_per_second`: `8.611`
- `eval/steps_per_second`: `8.611`

Post-DPO learning metrics:
- `eval/logits/chosen`: `0.1370937079191208`
- `eval/logits/rejected`: `0.07430506497621536`
- `eval/logps/chosen`: `-454.11962890625`
- `eval/logps/rejected`: `-764.1261596679688`
- `eval/loss`: `0.05717926099896431`
- `eval/rewards/accuracies`: `0.9372459053993224`
- `eval/rewards/chosen`: `-26.75682830810547`
- `eval/rewards/margins`: `32.37759780883789`
- `eval/rewards/rejected`: `-59.134429931640625`
- `eval/runtime`: `1386.3177`
- `eval/samples_per_second`: `2.838`
- `eval/steps_per_second`: `1.42`

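As a quick sanity check on how these figures relate (under the standard DPO reward definitions sketched in the Learning section), the reported margin is simply the chosen reward minus the rejected reward:

```python
# Reported values from the list above; the margin is chosen minus rejected.
chosen, rejected = -26.75682830810547, -59.134429931640625
print(chosen - rejected)  # ~32.3776, matching eval/rewards/margins up to batching/rounding
```
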
### Outro

Enjoy!