---
library_name: transformers
license: mit
language:
- ko
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
---

# 📄 Model Card for Model ID

Gemma2 2b Korean Dialect Translator v0.2.0

## 📝 Model Description

The Gemma2 2b Korean Dialect Translator was developed as part of a project to translate Korean dialects into standard Korean and to convert standard Korean into Korean dialects.
The model was built by fine-tuning Gemma2 2b it with QLoRA.

## 📚 Uses

The model can be used directly to translate Korean dialect into standard Korean, or the reverse.
It may be useful to educators, linguists, and developers building speech recognition and translation tools.

### ✍️ Examples

| Input sentence | Dialect: 게난 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건디 | Standard: 그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데 |
|:---------------|:---------------:|:---------------:|
| Translator output | Standard: 그러니까 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데 | Dialect: 게난 저 어멍 더 나이 먹어가기 전에 여기 와야 될 건디 |

| Input sentence | Dialect: 자이 폴에 독솔 막 난 거 보난 언 생이우다 | Standard: 재 팔에 닭살이 막 난 거 보니, 추운 모양이다 |
|:---------------|:---------------:|:---------------:|
| Translator output | Standard: 쟤 팔에 닭살이 많이 난 거 보니까 추운 모양입니다 | Dialect: 재 폴에 독솔 막 난 거 보난 언 생이우다 |

## ⚠️ Bias, Risks, and Limitations

- The model is currently fine-tuned on a specific dataset focused on the Jeju dialect, so its performance on other dialects or languages may be limited.
- Support for additional dialects is planned for future versions.

## 🚀 How to Get Started with the Model

```python
import transformers
import torch

model_id = "sjbaek/gemma2-2b-it-korean-dialect"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, add_eos_token=True)

# Chat-style text-generation pipeline; device_map="auto" places the model on a GPU if one is available.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
    max_new_tokens=512,
)


def dialect_to_standard(text, dialect_type):
    # Build a single-turn chat prompt asking for dialect -> standard Korean.
    return [
        {
            "role": "user",
            "content": "Convert the following sentence or word which is {}'s dialect to standard Korean:\n\n{}".format(dialect_type, text)
        }
    ]


def standard_to_dialect(text, dialect_type):
    # Build a single-turn chat prompt asking for standard Korean -> dialect.
    return [
        {
            "role": "user",
            "content": "Convert the following sentence or word which is standard Korean to {}'s dialect :\n\n{}".format(dialect_type, text)
        }
    ]


# Jeju dialect -> standard Korean
outputs = pipeline(
    dialect_to_standard("우리 동생도 요번에 월요일날 미깡 타카부댄 내려왔당 못 타난", "제주도"),
    do_sample=True,
    temperature=0.1,
    top_p=0.90,
    add_special_tokens=True
)

# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': '우리 동생도 요번에 월요일날 귤 타고 왔다가 못 타니까'}

# Standard Korean -> Jeju dialect
outputs = pipeline(
    standard_to_dialect("그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데", "제주도"),
    do_sample=True,
    temperature=0.1,
    top_p=0.90,
    add_special_tokens=True
)

print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': '그러니깐 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데'}
```

## 📊 Training Data

- [AI_HUB 중·노년층 한국어 방언 데이터 (충청도, 전라도, 제주도)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71558): Korean dialect data from middle-aged and elderly speakers (Chungcheong, Jeolla, Jeju)

## 🔜 TODO

- Chungcheong dialect conversion (v0.3.0)
- Jeolla dialect conversion (v0.4.0)
- Gyeongsang dialect conversion (v0.5.0)
- Gangwon dialect conversion (v1.0.0)
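
The prompt construction and reply handling from the usage example can be tried without downloading the model. The sketch below is only an illustration: `build_messages`, `extract_reply`, and `fake_outputs` are hypothetical names that are not part of this repository, and `fake_outputs` is a hand-built stand-in shaped like the chat-style pipeline result printed in the usage example, filled with the first example pair from this card rather than a fresh model run.

```python
# Prompt templates copied verbatim from the usage example in this card.
TO_STANDARD = "Convert the following sentence or word which is {}'s dialect to standard Korean:\n\n{}"
TO_DIALECT = "Convert the following sentence or word which is standard Korean to {}'s dialect :\n\n{}"


def build_messages(text, dialect_type, to_standard=True):
    """Return a single-turn chat message list for either direction (hypothetical helper)."""
    template = TO_STANDARD if to_standard else TO_DIALECT
    return [{"role": "user", "content": template.format(dialect_type, text)}]


def extract_reply(outputs):
    """Return the assistant text from a chat-style pipeline result (hypothetical helper)."""
    last = outputs[0]["generated_text"][-1]
    if last.get("role") != "assistant":
        raise ValueError("no assistant turn found in pipeline output")
    return last["content"].strip()


messages = build_messages("게난 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건디", "제주도")

# Stand-in for a real pipeline result: the input conversation plus one assistant turn.
# The reply text is the example output listed in this card, not a new generation.
fake_outputs = [
    {
        "generated_text": messages
        + [{"role": "assistant", "content": "그러니까 저 어머니 더 나이 먹어가기 전에 여기 와야 될 건데"}]
    }
]

print(messages[0]["content"])
print(extract_reply(fake_outputs))
```

With the real model loaded, `extract_reply(pipeline(messages, ...))` should behave the same way, assuming the pipeline returns the conversation as a list of role/content dictionaries as in the usage example.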