|
--- |
|
license: llama3.2 |
|
language: |
|
- pl |
|
- en |
|
- es |
|
- de |
|
base_model: |
|
- radlab/pLLama3.2-1B |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/By2Lf91tpMfxJGLH80BAa.png) |
|
|
|
### Intro |
|
We have released a collection of radlab/pLLama3.2 models, which we have trained into Polish. The trained version is able to communicate more precisely with the user than the base version of meta-llama/Meta-Llama-3.2 models. As part of the collection, we provide models in 1B and 3B architecture. |
|
Each model is available in two configurations: |
|
- radlab/pLLama3-1B, a model in architecture 1B only after fine-tuning |
|
- radlab/pLLama3-1B-DPO, a model in architecture 1B after fine-tuning and DPO process |
|
- radlab/pLLama3-3B, a model in architecture 3B only after fine-tuning |
|
- radlab/pLLama3-3B-DPO, a model in architecture 3B after fine-tuning and DPO process |
|
|
|
### Dataset |
|
In addition to the instruction datasets publicly available for Polish, we developed our own dataset, which contains about 650,000 instructions. This data was semi-automatically generated using other publicly available datasets. |
|
In addition, we developed a learning dataset for the DPO process, which contained 100k examples in which we taught the model to select correctly written versions of texts from those with language errors. |
|
|
|
### Learning |
|
The learning process was divided into two stages: |
|
- Post-training on a set of 650k instructions in Polish, the fine-tuning time was set to 5 epochs. |
|
- After the FT stage, we retrained the model using DPO on 100k instructions of correct writing in Polish, in this case we set the learning time to 15k steps. |
|
|
|
### Proposed parameters: |
|
* temperature: 0.6 |
|
* repetition_penalty: 1.0 |
|
|
|
### Outro |
|
Enjoy! |