Update README.md
---
license: llama3
language:
- pl
- en
- es
- de
---

### Intro
We have released a collection of radlab/pLLama3 models fine-tuned for Polish. The fine-tuned versions communicate with the user more precisely than the base meta-llama/Meta-Llama-3 models. The collection includes models in the 8B and 70B architectures.
We make the 8B models available in two configurations (a minimal usage sketch follows the list):
- radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;
- radlab/pLLama3-8B-chat, a more conversational version that mirrors the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model.

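The models can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch rather than an official recipe: it assumes the repositories ship a tokenizer with a Llama-3-style chat template, and the generation parameters are illustrative placeholders.

```python
# Minimal usage sketch (assumes a recent `transformers` with chat-template support).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3-8B-chat"  # or "radlab/pLLama3-8B-creator"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "Write a short poem about spring."
messages = [{"role": "user", "content": "Napisz krótki wiersz o wiośnie."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generation settings are placeholders, not tuned values.
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With the `-creator` variant you should expect shorter, more direct completions; the `-chat` variant gives longer, more conversational answers.
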
### Dataset
In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions. This data was generated semi-automatically from other publicly available datasets.
We also developed a training dataset for the DPO stage, containing 100k examples in which the model learns to prefer correctly written versions of texts over versions containing language errors.

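The DPO data itself is not reproduced here; purely as an illustration of the task described above, a preference record might be organized as below. The field names (`prompt`, `chosen`, `rejected`) and the sentences are assumptions for illustration, not excerpts from our dataset.

```python
# Hypothetical preference record for the correct-writing DPO task (illustrative only).
dpo_example = {
    "prompt": "Popraw błędy językowe w poniższym zdaniu.",   # "Fix the language errors in the sentence below."
    "chosen": "Wczoraj poszliśmy z przyjaciółmi do kina.",    # correctly written variant (preferred)
    "rejected": "Wczoraj poszlismy z przyjaciulmi do kina.",  # variant with spelling errors (dispreferred)
}
```
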
### Learning
The learning process was divided into two stages:
- Fine-tuning (FT) on the set of 650k Polish instructions; this stage was run for 5 epochs.
- After the FT stage, further training with DPO on the 100k correct-writing examples in Polish; this stage was run for 15k steps (the objective is sketched below).

The released models are the checkpoints obtained after both the FT and DPO stages.

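The README does not state which implementation was used for the DPO stage. As a reference for what the metrics below measure, here is a sketch of the standard DPO objective in PyTorch, with per-pair rewards defined the way `eval/rewards/*` figures are usually computed; the `beta` value is a hypothetical placeholder.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective on summed log-probs of chosen/rejected completions.

    All inputs are 1-D tensors of per-example sequence log-probabilities.
    `beta` is a placeholder; the coefficient used for pLLama3 is not stated here.
    """
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # cf. eval/rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # cf. eval/rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # cf. eval/rewards/margins

    # -log sigmoid of the reward margin; minimized when the chosen answer is preferred.
    loss = -F.logsigmoid(margins).mean()                                    # cf. eval/loss
    accuracy = (margins > 0).float().mean()                                 # cf. eval/rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```
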
Post-FT learning metrics:
- `eval/loss`: `0.8690009713172913`
- `eval/runtime`: `464.5158`
- `eval/samples_per_second`: `8.611`
- `eval/steps_per_second`: `8.611`

Post-DPO learning metrics:
- `eval/logits/chosen`: `0.1370937079191208`
- `eval/logits/rejected`: `0.07430506497621536`
- `eval/logps/chosen`: `-454.11962890625`
- `eval/logps/rejected`: `-764.1261596679688`
- `eval/loss`: `0.05717926099896431`
- `eval/rewards/accuracies`: `0.9372459053993224`
- `eval/rewards/chosen`: `-26.75682830810547`
- `eval/rewards/margins`: `32.37759780883789`
- `eval/rewards/rejected`: `-59.134429931640625`
- `eval/runtime`: `1386.3177`
- `eval/samples_per_second`: `2.838`
- `eval/steps_per_second`: `1.42`

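As a quick sanity check on how these figures relate (under the standard DPO reward definitions sketched in the Learning section), the reported margin is simply the chosen reward minus the rejected reward:

```python
# Reported values from the list above; the margin is chosen minus rejected.
chosen, rejected = -26.75682830810547, -59.134429931640625
print(chosen - rejected)  # ~32.3776, matching eval/rewards/margins up to batching/rounding
```
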
### Outro

Enjoy!