language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text2text-generation
datasets:
- aihub
metrics:
- bleu
- rouge
model-index:
- name: ko-barTNumText
results:
- task:
type: text2text-generation
name: text2text-generation
metrics:
- type: bleu
value: 0.9313276940897475
name: eval_bleu
verified: false
- type: rouge1
value: 0.9607081256861959
name: eval_rouge1
verified: false
- type: rouge2
value: 0.9394649136169404
name: eval_rouge2
verified: false
- type: rougeL
value: 0.9605735834651536
name: eval_rougeL
verified: false
- type: rougeLsum
value: 0.9605993760190767
name: eval_rougeLsum
verified: false
ko-barTNumText (TNT Model 🧨): Try Number To Korean Reading (a model that converts numbers into Hangul)
Model Details
Model Description: I looked around but could not find a model or algorithm that does this, so I built one myself.
BartForConditionalGeneration Fine-Tuning Model For Number To Korean
This is a number-to-Hangul conversion task, fine-tuned with BartForConditionalGeneration. The dataset comes from Korea AIHub.
I cannot release my fine-tuning datasets for private reasons.
The datasets were downloaded from Korea AIHub, and I am not able to publicly share all of the data used for fine-tuning. Korea AIHub data is only accessible to Koreans.
Since anyone who can download the data from AIHub will be Korean, the notes about the data were written only in Korean.
More precisely, this model was trained to translate phonetic transcriptions into orthographic transcriptions (following the ETRI transcription guidelines).
For example, with ten million, some people write 10 million and others write 10000000, so the training data is crucial for this model.
Since 천만 (ten million) can be written as 1000만 or as 10000000, the results may differ depending on the training data. Results can also change noticeably depending on how numeral determiners and bound nouns are spaced (쉰살, 쉰 살 -> 쉰살, 50살). https://eretz2.tistory.com/34
Since I did not know how the model would be used, I chose not to fix one convention and skew the training toward it, and instead left it to the distribution of the training data. (Which would have been more frequent, 쉰 살 or 쉰살!?)
Developed by: Yoo SungHyun (https://github.com/YooSungHyun)
Language(s): Korean
License: apache-2.0
Parent Model: See the kobart-base-v2 for more information about the pre-trained base model.
Uses
For more details, follow this URL: KoGPT_num_converter, and see bart_inference.py and bart_train.py.
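The repository above contains the actual training and inference code. Purely as a rough sketch of what seq2seq fine-tuning from the parent checkpoint can look like (the checkpoint id gogamza/kobart-base-v2 and the example pair below are illustrative assumptions, not taken from bart_train.py):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed parent checkpoint id for kobart-base-v2; verify against the linked repository
base_ckpt = "gogamza/kobart-base-v2"
tokenizer = AutoTokenizer.from_pretrained(base_ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(base_ckpt)

# Illustrative source/target pair: digits in, Hangul readings out
src = "그러게 누가 6시까지 술을 마시래?"
tgt = "그러게 누가 여섯 시까지 술을 마시래?"
batch = tokenizer(src, text_target=tgt, return_tensors="pt")
loss = model(**batch).loss  # standard seq2seq cross-entropy; backpropagate this in a training loop
print(float(loss))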
Evaluation
Evaluation simply uses evaluate-metric/bleu and evaluate-metric/rouge from the Hugging Face evaluate library.
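The exact evaluation script is not part of this card, but as a rough illustration the metrics can be computed with the evaluate library roughly as follows (the prediction/reference pair below is made up for demonstration):

import evaluate

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Hypothetical prediction/reference pair, for demonstration only
predictions = ["그러게 누가 여섯 시까지 술을 마시래?"]
references = [["그러게 누가 여섯 시까지 술을 마시래?"]]  # BLEU allows several references per prediction

print(bleu.compute(predictions=predictions, references=references))
print(rouge.compute(predictions=predictions, references=[refs[0] for refs in references]))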
Training WandB URL
How to Get Started With the Model
from transformers.pipelines import Text2TextGenerationPipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
texts = ["그러게 누가 6시까지 술을 마시래?"]
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
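# Generation settings: deterministic beam search (do_sample=False) with a wide beam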
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 여섯 시까지 술을 마시래?
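Note that the pipeline returns a list of dictionaries rather than a plain string, so the converted sentence alone can be printed with:
print(pred[0]["generated_text"])  # the generated sentence only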