---
language:
- ko
datasets:
- DopeorNope/DPO-Ko-Dataset
- DopeorNope/Orca_Near_Dedup-v2
library_name: transformers
pipeline_tag: text-generation
license: cc-by-nc-sa-4.0
---
**This model was developed by the LLM research consortium of (주)미디어그룹사람과숲 and (주)마커.**
**The model was trained and uploaded by the developer DopeorNope.**
**Developer rights belong to DopeorNope (Seungyoo Lee); please contact the developer with any questions about the model.**
**The license is `cc-by-nc-sa-4.0`.**
# **🐻‍❄️COKAL-DPO_13b-v2🐻‍❄️**
![img](https://drive.google.com/uc?export=view&id=1YGBxz-UhQGHZ2K6cTXmTnB13fRgaQilX)
## Model Details
**Model Developers** Seungyoo Lee (DopeorNope)
**Input** The model accepts text input only.
**Output** The model generates text only.
**Model Architecture**
COKAL-DPO_13b-v2 is an auto-regressive 13B language model based on the LLaMA2 transformer architecture.
**Base Model** [DopeorNope/COKAL_pre_DPO_Test_v2-13b](https://huggingface.co/DopeorNope/COKAL_pre_DPO_Test_v2-13b)
DopeorNope/COKAL_pre_DPO_Test_v2-13b is the SFT (supervised fine-tuned) model that served as the starting point for DPO training.
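
For readers unfamiliar with DPO, the sketch below shows the standard DPO objective for a single preference pair, computed from sequence log-probabilities. It is an illustration, not the training code used for this model: the function name `dpo_loss` and the value `beta=0.1` are assumptions, and the card does not state the hyperparameters that were used. In a typical setup the SFT model plays the role of the frozen reference.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    beta=0.1 is an illustrative value, not the setting used for this model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # How much more strongly the policy prefers the chosen response
    # than the reference does, scaled by beta.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy equals the reference, the margin is zero and the loss is `log(2)`; the loss falls as the policy raises the chosen response relative to the rejected one.
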
**Training Dataset**
- DPO training dataset: DopeorNope/DPO-Ko-Dataset (private)
  DopeorNope collected and reorganized this data directly, taking the paired format of ["lvwerra/stack-exchange-paired"](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) as inspiration for building a paired dataset. (stack-exchange-paired itself was not used; it only provided the idea.)
- SFT training dataset: DopeorNope/Orca_Near_Dedup-v2 (private)
  This dataset is based on ["kyujinpy/OpenOrca-KO"](https://huggingface.co/datasets/kyujinpy/OpenOrca-KO) and was processed with the Near Dedup algorithm to remove items whose Jaccard similarity is 0.8 or higher. In addition, inconsistent inputs were cleaned and corrected.
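
The near-deduplication step can be sketched as follows. This is a minimal, exact pairwise version for illustration, not the pipeline used to build the dataset: the word 3-gram shingling and the helper names (`shingles`, `jaccard`, `near_dedup`) are assumptions, and only the 0.8 threshold comes from the description above. Large-scale pipelines usually approximate the same comparison with MinHash rather than comparing every pair.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def shingles(text, n=3):
    """Set of word n-grams; the shingle size n=3 is an illustrative choice."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def near_dedup(texts, threshold=0.8):
    """Keep a text only if it is below the threshold against every kept text."""
    kept, kept_shingles = [], []
    for text in texts:
        s = shingles(text)
        if all(jaccard(s, k) < threshold for k in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept
```

The first occurrence of each cluster of near-duplicates is kept, so the result depends on input order.
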
**Training**
This model differs from "DopeorNope/COKAL-DPO_test-v2" in its hyperparameters, which were adjusted for this final version.
I developed the model in an environment with four RTX 3090 GPUs running Ubuntu 18.04.
When a model is uploaded to a repository directly from a Linux server, the repository may report more parameters than the model actually has. This model is nevertheless based on a 13B architecture.
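
One way to check the size independently of what the repository page reports is to count the parameters of the loaded model. The helper below is a small sketch; `count_parameters` is a hypothetical name, and it only assumes the object exposes `parameters()` whose items have `numel()`, as PyTorch modules do.

```python
def count_parameters(model):
    """Total number of elements across all parameter tensors of a model."""
    return sum(p.numel() for p in model.parameters())
```

For a loaded model, `count_parameters(model) / 1e9` gives the size in billions of parameters.
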
**Reference papers**
- Data Strategy:
  - [LIMA (Zhou et al., 2023)](https://arxiv.org/abs/2305.11206)
  - [Near Dedup algorithm (Lee et al., 2022)](https://arxiv.org/abs/2107.06499)
- Model Architecture:
  - [Llama2 (Touvron et al., 2023)](https://arxiv.org/abs/2307.09288)
# Implementation Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "HumanF-MarkrAI/COKAL-DPO-13b-v2"

# Load the weights in half precision; device_map='auto' spreads them across
# the available devices and requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    repo,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map='auto'
)
model_tokenizer = AutoTokenizer.from_pretrained(repo)
```
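
Once the model and tokenizer are loaded, text can be generated along these lines. This is a hedged sketch: `generate_text` is a hypothetical helper, `max_new_tokens=128` is an arbitrary default, and the card does not document a prompt template, so a plain-text prompt is assumed.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=128):
    """Tokenize a prompt, generate a continuation, and decode it to a string."""
    # Move the encoded prompt to the device holding the model's first weights.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects from the loading code, a call would look like `generate_text(model, model_tokenizer, "Your prompt here")`.
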
# Acknowledgement
This model is the result of research supported by the "Artificial intelligence industrial convergence cluster development project", jointly funded by the Ministry of Science and ICT (MSIT, Korea) and Gwangju Metropolitan City.
---