---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text2text-generation
datasets:
- aihub
metrics:
- bleu
- rouge
model-index:
- name: ko-barTNumText
  results:
  - task:
      type: text2text-generation
      name: text2text-generation
    metrics:
    - type: bleu
      value: 0.9313276940897475
      name: eval_bleu
      verified: false
    - type: rouge1
      value: 0.9607081256861959
      name: eval_rouge1
      verified: false
    - type: rouge2
      value: 0.9394649136169404
      name: eval_rouge2
      verified: false
    - type: rougeL
      value: 0.9605735834651536
      name: eval_rougeL
      verified: false
    - type: rougeLsum
      value: 0.9605993760190767
      name: eval_rougeLsum
      verified: false
---

# ko-barTNumText(TNT Model🧨): Try Number To Korean Reading

## Table of Contents
- [ko-barTNumText(TNT Model🧨): Try Number To Korean Reading](#ko-bartnumtexttnt-model-try-number-to-korean-reading)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details

- **Model Description:**
I searched around but could not find a model or algorithm that does this, so I built one myself. <br />
A BartForConditionalGeneration model fine-tuned for the task of converting numbers into their Korean (Hangul) readings. <br />

- Dataset: [Korea aihub](https://aihub.or.kr/aihubdata/data/list.do?currMenu=115&topMenu=100&srchDataRealmCode=REALM002&srchDataTy=DATA004) <br />
The data was obtained from Korea aihub, and for reasons on my side I cannot release the fine-tuning datasets themselves. <br />

- Korea aihub data is only available to Koreans, which is why these notes were originally written in Korean. <br />
More precisely, the model was trained to translate the orthographic transcription into the pronunciation transcription (per the ETRI transcription guidelines); a data-pair sketch follows this list. <br />

- Ten million, for example, can be written as 1000만 or as 10000000, so the results can vary depending on how the training datasets write numbers. <br />

- **Results can also differ noticeably depending on the spacing between numeral determiners and their bound nouns (쉰이, 쉰 이 -> 쉰이, 50이).** https://eretz2.tistory.com/34 <br />
Since I could not know in advance how the model would be used, I did not pick one convention and bias the training toward it; instead I left it to the distribution of the training data. (Which is more common in the data, 쉰 이 or 쉰이!?)

- **Developed by:** [Yoo SungHyun](https://github.com/YooSungHyun)
- **Language(s):** Korean
- **License:** apache-2.0
- **Parent Model:** See [kobart-base-v2](https://huggingface.co/gogamza/kobart-base-v2) for more information about the pre-trained base model.
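
To make the data format concrete, below is a minimal, hypothetical sketch of how parallel training pairs could be derived from ETRI-style dual transcriptions such as `(6시)/(여섯 시)`. The regular expression, the field order, and the digit-to-Hangul direction are assumptions for illustration only, not the exact preprocessing used for this model.

```python
import re

# ETRI-style dual transcription: "(orthographic)/(pronunciation)", e.g. "(6시)/(여섯 시)".
# Pattern and direction are assumptions for illustration, not this model's actual preprocessing.
DUAL = re.compile(r"\(([^)]+)\)/\(([^)]+)\)")

def to_pair(utterance: str):
    """Expand one dual-transcribed utterance into a (source, target) pair:
    the source keeps the orthographic (digit) form, the target keeps the Hangul reading."""
    source = DUAL.sub(lambda m: m.group(1), utterance)
    target = DUAL.sub(lambda m: m.group(2), utterance)
    return source, target

print(to_pair("그러게 누가 (6시)/(여섯 시)까지 술을 마시래?"))
# ('그러게 누가 6시까지 술을 마시래?', '그러게 누가 여섯 시까지 술을 마시래?')
```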

## Uses
If you want more detail, see [KoGPT_num_converter](https://github.com/ddobokki/KoGPT_num_converter), in particular `bart_inference.py` and `bart_train.py`.

## Evaluation
Evaluation simply uses `evaluate-metric/bleu` and `evaluate-metric/rouge` from the Hugging Face `evaluate` library. <br />
[Training WandB URL](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/326xgytt?workspace=user-bart_tadev)
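
As a rough sketch of how such scores can be computed with the `evaluate` library: the prediction/reference strings below are placeholders, and the whitespace tokenizer passed to ROUGE is an assumption, since the exact evaluation setup is not published.

```python
import evaluate

# Placeholder predictions/references; the real evaluation split comes from the non-public aihub data.
predictions = ["그러게 누가 여섯 시까지 술을 마시래?"]
references = ["그러게 누가 여섯 시까지 술을 마시래?"]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# BLEU accepts one or several references per prediction.
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
# ROUGE's default tokenizer keeps only ASCII alphanumerics, so a whitespace
# tokenizer is passed here so Korean text is not stripped (an assumption).
print(rouge.compute(predictions=predictions, references=references, tokenizer=lambda t: t.split()))
```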

## How to Get Started With the Model
```python
from transformers.pipelines import Text2TextGenerationPipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

texts = ["그러게 누가 6시까지 술을 마시래?"]

# Load the fine-tuned checkpoint and wrap it in a text2text pipeline.
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)

# Beam-search generation settings (no sampling).
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 여섯 시까지 술을 마시래?
```