pkedzia committed on
Commit
be3dbc3
1 Parent(s): f99068b

Update README.md

Files changed (1)
  1. README.md +50 -3
README.md CHANGED
---
license: llama3
language:
- pl
- en
- es
- de
---

### Intro
We have released a collection of radlab/pLLama3 models that we have fine-tuned for Polish. The fine-tuned versions communicate with the user more precisely than the base meta-llama/Meta-Llama-3 models. As part of the collection, we provide models in the 8B and 70B architectures.
We make the 8B models available in two configurations (a minimal usage sketch follows the list):
- radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;
- radlab/pLLama3-8B-chat, a more conversational version that reflects the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model.

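Below is a minimal usage sketch (our illustration, not taken from the model card): it loads the chat variant with the Hugging Face transformers library and generates a reply to a Polish prompt. The chat-template call and generation parameters are assumptions, not settings published by the authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID from this collection; the surrounding setup is illustrative.
model_id = "radlab/pLLama3-8B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
    device_map="auto",
)

# Llama-3-style chat prompt (Polish): "Write a short poem about the Vistula river."
messages = [{"role": "user", "content": "Napisz krótki wiersz o Wiśle."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
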
### Dataset
In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions. This data was semi-automatically generated using other publicly available datasets.
We also built a training dataset for the DPO stage containing 100k examples, in which the model learns to prefer correctly written versions of a text over versions containing language errors (an illustrative pair is sketched below).

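The card does not specify the exact data schema; DPO preference data is conventionally stored as prompt/chosen/rejected triples, so a single pair might look like the following (our assumption, with a made-up example sentence):

```python
# Illustrative (assumed) shape of one DPO preference pair:
# "chosen" is the correctly written version, "rejected" contains language errors.
pair = {
    "prompt": "Popraw błędy językowe w poniższym zdaniu: 'Ala miała kota, które lubił mleko.'",
    "chosen": "Ala miała kota, który lubił mleko.",
    "rejected": "Ala miała kota, które lubił mleko.",
}
```
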
### Learning
The learning process was divided into two stages:
- Post-training (supervised fine-tuning, FT) on the set of 650k Polish instructions; fine-tuning ran for 5 epochs.
- After the FT stage, we further trained the model with DPO on the 100k instructions for correct writing in Polish; this stage ran for 15k steps.

The released models are the ones obtained after both the FT and the DPO stage. A sketch of this two-stage setup is shown below.

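The following is an assumed sketch of the two-stage recipe using the trl library, not the authors' training code: dataset files, output paths, and every hyperparameter other than the 5 epochs and 15k steps mentioned above are placeholders, and the API shown is that of recent trl versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Stage 1: supervised fine-tuning (FT) for 5 epochs on the Polish instruction set.
sft_data = load_dataset("json", data_files="polish_instructions_650k.jsonl", split="train")  # placeholder file
SFTTrainer(
    model=base_model,
    args=SFTConfig(output_dir="pllama3-8b-ft", num_train_epochs=5),
    train_dataset=sft_data,
).train()

# Stage 2: DPO for 15k steps on prompt/chosen/rejected pairs (correct vs. erroneous text).
dpo_data = load_dataset("json", data_files="polish_dpo_pairs_100k.jsonl", split="train")  # placeholder file
DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained("pllama3-8b-ft"),
    args=DPOConfig(output_dir="pllama3-8b-dpo", max_steps=15_000),
    train_dataset=dpo_data,
    processing_class=tokenizer,
).train()
```
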
Post-FT learning metrics:
- `eval/loss`: `0.8690009713172913`
- `eval/runtime`: `464.5158`
- `eval/samples_per_second`: `8.611`
- `eval/steps_per_second`: `8.611`

Post-DPO learning metrics:
- `eval/logits/chosen`: `0.1370937079191208`
- `eval/logits/rejected`: `0.07430506497621536`
- `eval/logps/chosen`: `-454.11962890625`
- `eval/logps/rejected`: `-764.1261596679688`
- `eval/loss`: `0.05717926099896431`
- `eval/rewards/accuracies`: `0.9372459053993224`
- `eval/rewards/chosen`: `-26.75682830810547`
- `eval/rewards/margins`: `32.37759780883789`
- `eval/rewards/rejected`: `-59.134429931640625`
- `eval/runtime`: `1,386.3177`
- `eval/samples_per_second`: `2.838`
- `eval/steps_per_second`: `1.42`

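As a quick consistency check on these numbers (our note, not part of the original card): `eval/rewards/margins` is the average gap between the chosen and rejected rewards, and indeed −26.7568 − (−59.1344) ≈ 32.3776, matching the reported margin up to averaging.
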
### Outro

Enjoy!