---
library_name: transformers
license: mit
language:
- ko
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
---
# Model Card for Model ID
Gemma2 2b Korean Dialect Interpreter v0.2.0
## Model Description
The Gemma2 2b Korean Dialect Interpreter is a model developed as part of a project to translate Korean dialects into standard Korean and to convert standard Korean into Korean dialects.
It was built by fine-tuning the Gemma2 2b it model with QLoRA.
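For readers unfamiliar with QLoRA, the update it trains can be written as below. This is the general formulation of the technique, not a description of this model's settings: the rank $r$ and scaling factor $\alpha$ used for this fine-tune are not stated in this card.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here $W_0$ is the frozen base weight, which QLoRA keeps in 4-bit quantized form and dequantizes on the fly, and only the low-rank factors $A$ and $B$ are trained.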
## Uses
This model can be used directly to translate Korean dialects into standard Korean, or the reverse. It may be useful to educators, linguists, and developers building speech recognition and translation tools.
### Examples
| Input sentence | Dialect: 게난 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건디 | Standard: 그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데 |
|:--------------------------|:----------------------------------------------------------:|:------------------------------------------:|
| Dialect interpreter output | Standard: 그러니까 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데 | Dialect: 게난 저 어멍 더 나이 먹어가기 전에 여기 와야 될 건디 |

| Input sentence | Dialect: 자이 폴에 독솔 막 난 거 보난 언 생이우다 | Standard: 재 팔에 닭살이 막 난 거 보니, 추운 모양이다 |
|:--------------------------|:----------------------------------------------------------:|:-------------------------------------:|
| Dialect interpreter output | Standard: 쟤 팔에 닭살이 많이 난 거 보니까 추운 모양입니다 | Dialect: 재 폴에 독솔 막 난 거 보난 언 생이우다 |
## Bias, Risks, and Limitations
This model is currently fine-tuned on a dataset focused on the Jeju dialect, so its performance on other dialects or languages may be limited.
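Because the current release targets the Jeju dialect, callers may want to reject other dialect labels before building a prompt. The following is a minimal sketch of such a guard. `SUPPORTED_DIALECTS` and `check_dialect` are hypothetical names introduced here for illustration, and the assumption that only `"제주도"` (Jeju-do) is covered comes from the limitation stated above, not from any API of the model.

```python
# Hypothetical guard: SUPPORTED_DIALECTS and check_dialect are illustrative
# names and are not part of the released model or the transformers library.
SUPPORTED_DIALECTS = {"제주도"}  # assumption: v0.2.0 covers the Jeju dialect only


def check_dialect(dialect_type):
    """Return dialect_type unchanged, or raise if this release does not cover it."""
    if dialect_type not in SUPPORTED_DIALECTS:
        supported = ", ".join(sorted(SUPPORTED_DIALECTS))
        raise ValueError(
            "Unsupported dialect {!r}; this release targets: {}".format(
                dialect_type, supported
            )
        )
    return dialect_type
```

Calling `check_dialect("제주도")` returns the label as is, while any other label raises `ValueError` instead of silently producing a low-quality conversion.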
## How to Get Started with the Model
```python
import transformers
import torch

model_id = "sjbaek/gemma2-2b-it-korean-dialect"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, add_eos_token=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
    max_new_tokens=512,
)


def dialect_to_standard(text, dialect_type):
    # Build a single-turn chat prompt: dialect -> standard Korean.
    return [
        {
            "role": "user",
            "content": "Convert the following sentence or word which is {}'s dialect to standard Korean:\n\n{}".format(dialect_type, text),
        }
    ]


def standard_to_dialect(text, dialect_type):
    # Build a single-turn chat prompt: standard Korean -> dialect.
    return [
        {
            "role": "user",
            "content": "Convert the following sentence or word which is standard Korean to {}'s dialect :\n\n{}".format(dialect_type, text),
        }
    ]


# Jeju dialect -> standard Korean ("제주도" is Jeju-do).
outputs = pipeline(
    dialect_to_standard("우리 동생도 요번에 월요일날 미깡 타카부댄 내려왔당 못 타난", "제주도"),
    do_sample=True,
    temperature=0.1,
    top_p=0.90,
    add_special_tokens=True,
)
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': '우리 동생도 요번에 월요일날 귤 타고 왔다가 못 타니까'}

# Standard Korean -> Jeju dialect.
outputs = pipeline(
    standard_to_dialect("그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데", "제주도"),
    do_sample=True,
    temperature=0.1,
    top_p=0.90,
    add_special_tokens=True,
)
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': '그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데'}
```
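The example above prints the whole assistant message as a dict. When only the converted sentence is needed, a small helper can pull it out. This is a sketch that assumes the result has the shape printed above (a list whose first item holds a `generated_text` list of role/content messages). `extract_reply` is a hypothetical name, and the result is mocked so the snippet runs without loading the model.

```python
def extract_reply(outputs):
    """Return the assistant's text from a chat-style text-generation result.

    Assumes the structure printed in the example above: a list whose first
    item has a "generated_text" list of {"role", "content"} messages.
    """
    last = outputs[0]["generated_text"][-1]
    if last.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return last["content"].strip()


# Mocked result with the same shape as the printed example (no model needed).
mock_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "(prompt omitted)"},
            {"role": "assistant", "content": "우리 동생도 요번에 월요일날 귤 타고 왔다가 못 타니까\n"},
        ]
    }
]
print(extract_reply(mock_outputs))
```

With a real pipeline result, `extract_reply(outputs)` would be called in place of `outputs[0]["generated_text"][-1]`.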
### Training Data
[AI_HUB Korean dialect data from middle-aged and elderly speakers (Chungcheong-do, Jeolla-do, Jeju-do)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71558)
## TODO
- Chungcheong-do dialect conversion (v0.3.0)
- Jeolla-do dialect conversion (v0.4.0)
- Gyeongsang-do dialect conversion (v0.5.0)
- Gangwon-do dialect conversion (v1.0.0)