
Quantization made by Richard Erkhov.

Github | Discord | Request more models

ko-gemma-2-9b-it - GGUF
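These GGUF files can be loaded with llama.cpp-compatible tooling. Below is a minimal, hedged sketch using llama-cpp-python; the quant filename is illustrative (substitute whichever GGUF file you download), and the generation settings are assumptions rather than recommendations from the original authors.

# pip install llama-cpp-python
from llama_cpp import Llama

# Assumption: one of the GGUF quant files from this repo has been downloaded locally.
# The filename below is a placeholder; use the quant you actually fetched.
llm = Llama(
    model_path="./ko-gemma-2-9b-it.Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    max_tokens=512,
    temperature=0.6,
    top_p=0.9,
)
print(output["choices"][0]["message"]["content"])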

Original model description:

license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you're required to review and agree to
  Google's usage license. To do this, please ensure you're logged in to
  Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
tags:
- conversational
base_model:
- google/gemma-2-9b
language:
- ko

Model Details

Ko-Gemma-2-9B-IT

Ko-Gemma-2-9B-IT is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned this model on a carefully curated, high-quality dataset using supervised fine-tuning (SFT), and then applied Direct Preference Optimization (DPO) training on human-feedback data. The datasets include:

Some of these datasets were partially used and translated for training. Translation in particular introduced a large amount of repeated text, so N-gram-based preprocessing was applied to remove it.
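The card does not describe the exact filter, but a simple duplicate-N-gram check along the following lines is one plausible reading; the n value and threshold are illustrative assumptions, not the authors' settings.

from collections import Counter

def ngram_repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-grams in the text that repeat an earlier n-gram."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values() if count > 1)
    return repeated / len(ngrams)

def keep_translated_sample(text: str, threshold: float = 0.2) -> bool:
    # Drop samples whose translated text is dominated by repeated n-grams.
    # The 0.2 threshold is a hypothetical value for illustration only.
    return ngram_repetition_ratio(text) < threshold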

Inputs and outputs

  • Input: Text string, such as a question, a prompt, or a document to be summarized.
  • Output: Generated Korean-language text in response to the input, such as an answer to a question, or a summary of a document.

Google Gemma 2

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Benchmark Scores

We evaluated the model internally using the LogicKor code. While the public LogicKor evaluation uses GPT-4 as the judge, our internal evaluation used GPT-4o as the judge. Public scores will be added as they are released. The scores below are 0-shot evaluations only.

| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|---|---|---|---|---|---|---|---|---|---|
| rtzr/ko-gemma-2-9b-it | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| google/gemma-2-9b-it | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| MLP-KTLim/llama-3-Korean-Bllossom-8B | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| allganize/Llama-3-Alpha-Ko-8B-Instruct | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43 | 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |

Usage

Install Dependencies

You must install transformers >= 4.42.3 to use Gemma 2 models.

pip install transformers==4.42.3 accelerate
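If an environment already exists, a quick check such as the following (a minimal sketch) confirms that the installed version is recent enough:

import transformers
from packaging import version

# Gemma 2 support requires transformers >= 4.42.3; fail early if the install is older.
assert version.parse(transformers.__version__) >= version.parse("4.42.3"), (
    f"transformers {transformers.__version__} is too old for gemma2 models"
)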

Python code with Pipeline

import transformers
import torch


model_id = "rtzr/ko-gemma-2-9b-it"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(outputs[0]["generated_text"][len(prompt):])
์„œ์šธ์€ ์—ญ์‚ฌ, ๋ฌธํ™”, ํ˜„๋Œ€์„ฑ์ด ์กฐํ™”๋ฅผ ์ด๋ฃฌ ๋งค๋ ฅ์ ์ธ ๋„์‹œ์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ฆ๊ธธ ์ˆ˜ ์žˆ๋Š” ๋‹ค์–‘ํ•œ ๊ด€๊ด‘์ง€์™€ ๋ช…์†Œ๋ฅผ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ์€ ์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค 3๊ฐ€์ง€์ž…๋‹ˆ๋‹ค.

**1. ์—ญ์‚ฌ์™€ ๋ฌธํ™”๋ฅผ ๋‘˜๋Ÿฌ์‹ผ ํ•œ๊ตญ๊ด€๊ด‘์ฝ”์Šค**

1. **๊ฒฝ๋ณต๊ถ**: ์กฐ์„  ์‹œ๋Œ€์˜ ์›…์žฅํ•œ ์™•๊ถ์„ ๋งŒ๋ฝํ•  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ ๋งค๋…„ ๋ด„์— ์—ด๋ฆฌ๋Š” '์ถ˜์ถ”์—ฐํšŒ'๋Š” ๊ฒฝ๋ณต๊ถ์˜ ์•„๋ฆ„๋‹ค์›€์„ ๋”์šฑ ๋‹๋ณด์ด๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
2. **๋ถ์ดŒ ํ•œ์˜ฅ๋งˆ์„**: ๊ณ ํ’์Šค๋Ÿฌ์šด ํ•œ์˜ฅ์ด ๋ชจ์—ฌ์žˆ๋Š” ๊ณณ์œผ๋กœ, ์ „ํ†ต ๋ฌธํ™” ์ฒดํ—˜์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. '๋ถ์ดŒ ํ•œ์˜ฅ๋งˆ์„ ๋ฌธํ™”์ฒดํ—˜๊ด€'์—์„œ๋Š” ํ•œ๋ณต ์ฒดํ—˜๋ถ€ํ„ฐ ์ข…์ด๋งŒํ™”, ํ•œ๊ธ€ ์“ฐ๊ธฐ ๋“ฑ ๋‹ค์–‘ํ•œ ํ”„๋กœ๊ทธ๋žจ์ด ์ค€๋น„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
3. **์ธ์‚ฌ๋™**: ์„œ์ , ๋ฏธ์ˆ ๊ด€, ํ•œ์‹๋‹น์ด ๋งŽ์€ ๊ณณ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ '์ธ์‚ฌ๋™ ๋ฌธํ™”๊ด€'์—์„œ๋Š” ์„œ์šธ์˜ ์—ญ์‚ฌ์™€ ๋ฌธํ™”๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋Š” ์ „์‹œ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **๊ด‘ํ™”๋ฌธ** ๋ฐ **๋ช…๋™**: ํ˜„๋Œ€์ ์ธ ์‡ผํ•‘๊ณผ ๋ ˆ์Šคํ† ๋ž‘์ด ์ฆ๋น„ํ•œ ๊ณณ์ž…๋‹ˆ๋‹ค. ๊ด‘ํ™”๋ฌธ์€ ํŠนํžˆ ์ Š์€์ด๋“ค์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, ์ŠคํŠธ๋ฆฌํŠธ ํŒจ์…˜์„ ๊ด€์ฐฐํ•˜๊ฑฐ๋‚˜ ๋ฐค๊ฑฐ๋ฆฌ์—์„œ ํ™œ๊ธฐ๋ฅผ ๋Š๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

**2. ๋„์‹œ์˜ ๋ชจ์Šต์„ ๋ฐ”๋ผ๋ณด๋Š” ๋ทฐํˆฌ์–ด ์ฝ”์Šค**

1. **๋‚จ์‚ฐํƒ€์›Œ**: ์„œ์šธ์˜ ์ƒ์ง•์ ์ธ ๊ฑด๋ฌผ๋กœ, ๊ผญ๋Œ€๊ธฐ์—์„œ ํŽผ์ณ์ง€๋Š” 360๋„์˜ ๊ฒฝ์น˜๊ฐ€ ์••๋‹ˆ๋‹ค. ํŠนํžˆ ๋ฐค์ด ๋˜๋ฉด ์กฐ๋ช…์ด ์–ด์šฐ๋Ÿฌ์ ธ ๋”์šฑ ์•„๋ฆ„๋‹ค์›Œ์ง‘๋‹ˆ๋‹ค.
2. **์„œ์šธํƒ€์›Œ**: ๋‚จ์‚ฐํƒ€์›Œ์™€ ๋น„์Šทํ•œ ์œ„์น˜๋กœ, ๋†’์ด๊ฐ€ ๋” ๋†’๊ธฐ ๋•Œ๋ฌธ์— ๋” ๋„“์€ ์ „๋ง์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„œ์šธํƒ€์›Œ ๋‚ด๋ถ€์—๋Š” ๋‹ค์–‘ํ•œ ์ „์‹œ๊ด€๊ณผ ๋ ˆ์Šคํ† ๋ž‘๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
3. **๋ถ์•…์‚ฐ**: ์„œ์šธ์˜ ์ค‘์‹ฌ๋ถ€์— ์œ„์น˜ํ•œ ์‚ฐ์œผ๋กœ, ์„œ์šธ์˜ ๊ฒฝ์น˜๋ฅผ ์กฐ๊ธˆ ๋‹ค๋ฅธ ๊ด€์ ์—์„œ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ๋ถ์•…์‚ฐ ์ •์ƒ์ธ ๋ถ์•…์‚ฌ์—์„œ๋„ ์ข‹์€ ์ „๋ง์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **์„œ์šธ์ˆฒ**: ๋…น์ง€ ๊ณต๊ฐ„์œผ๋กœ, ๋„์‹œ์˜ ํ˜ผ์žกํ•จ์—์„œ ๋ฒ—์–ด๋‚  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์„œ์šธ์ˆฒ ๋‚ด๋ถ€์—๋Š” '์„œ์šธ์ˆฒ ์•„ํŠธํ”„๋ ˆ์  ํŠธ'๋ผ๋Š” ๊ณต๊ฐ„์ด ์žˆ์–ด ์˜ˆ์ˆ ๊ณผ ์ž์—ฐ์„ ํ•จ๊ป˜ ์ฒดํ—˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

**3. ํ˜„๋Œ€ ๋ฌธํ™”๋ฅผ ๋งŒ๋‚˜๋Š” ์ฝ”์Šค**

1. **์‚ผ์„ฑ๋™**: ํ˜„๋Œ€ ๋ฏธ์ˆ ๊ด€์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, '์‚ผ์„ฑ ๋ฏธ์ˆ ๊ด€', '์•„๋ชจ๋ฆฌ์นด๋‚˜์Šค ๊ฐค๋Ÿฌ๋ฆฌ' ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, '์ฝ”์—‘์Šค'๋‚˜ '์•„ํฌ์นด๋กœํฌ์Šค' ๋“ฑ์˜ ๋ช…์†Œ๋„ ๊ฐ€๊นŒ์šด ๊ณณ์— ์žˆ์Šต๋‹ˆ๋‹ค.
2. **์ดํƒœ์›**: ์™ธ๊ตญ์ธ๋“ค์ด ๋งŽ์€ ๊ณณ์œผ๋กœ, ๋‹ค์–‘ํ•œ ์™ธ๊ตญ ์Œ์‹์„ ์ฆ๊ธธ ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, '์ดํƒœ์› ๊ธ€๋กœ์ปฌ๋ฌธํ™”์„ผํ„ฐ'์—์„œ๋Š” ์„ธ๊ณ„ ๊ฐ๊ตญ์˜ ๋ฌธํ™” ์ฒดํ—˜์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
3. **ํ™๋Œ€**: ์ Š์€์ด๋“ค์˜ ๋ฌธํ™”๊ฐ€ ๋„˜์น˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. 'ํ™๋Œ€ ๋กค๋งํ™€'์€ ํŠนํžˆ ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์ด ๋ฐฉ๋ฌธํ•˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, 'ํ™๋Œ€ ์„œ์ ๊ฑฐ๋ฆฌ'์—์„œ๋Š” ๋…์„œ์™€ ๋ฌธํ™”๋ฅผ ๋งŒ๋‚  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. **๊ฐ•๋‚จ**: ์„œ์šธ์˜ ํ˜„๋Œ€์  ๋ชจ์Šต์„ ์ž˜ ๋ณด์—ฌ์ฃผ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. '๊ฐ•๋‚จ์—ญ'์„ ์ค‘์‹ฌ์œผ๋กœ ๋งŽ์€ ๊ณ ๊ธ‰ ์‡ผํ•‘๋ชฐ๊ณผ ๋ ˆ์Šคํ† ๋ž‘์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ์ฝ”์Šค๋ฅผ ํ†ตํ•ด ์„œ์šธ์˜ ๋‹ค์–‘ํ•œ ๋ชจ์Šต์„ ํ•œ ๋ฒˆ์— ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ์„ ๊ฑฐ์˜ˆ์š”. ๊ฐ์ž์˜ ์ทจํ–ฅ์— ๋งž์ถฐ ์ฝ”์Šค๋ฅผ ์กฐ์ ˆํ•˜์‹œ๋ฉด ์ข‹๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฆ๊ฑฐ์šด ์—ฌํ–‰ ๋˜์„ธ์š”!

Python code with AutoModel

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
์„œ์šธ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ์ œ์•ˆํ•ด๋“œ๋ฆด๊ฒŒ์š”. ํ•˜๋ฃจ ์ข…์ผ ์ฆ๊ฒ๊ฒŒ ์—ฌํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฃจํŠธ๋กœ ๊ตฌ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.                                                    

### 1. ์„œ์šธ์—ญ์‚ฌ๊ด€ ๋ฐ ๋ถ์ดŒํ•œ์˜ฅ๋งˆ์„(์˜ค์ „)

- ์„œ์šธ์—ญ์‚ฌ๊ด€: ์„œ์šธ์˜ ์—ญ์‚ฌ์™€ ๋ฌธํ™”๋ฅผ ์ฒดํ—˜ํ•  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ์ „์‹œ๋ฌผ๊ณผ ์ƒ์„ค์ „์‹œ๋ฅผ ํ†ตํ•ด ์„œ์šธ์˜ ๋ณ€ํ™”๋ฅผ ์‚ดํŽด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
- ๋ถ์ดŒํ•œ์˜ฅ๋งˆ์„: ์„œ์šธ์˜ ํ•œ์˜ฅ์„ ๋ณด์กดํ•˜๊ณ  ๊ด€๋ฆฌํ•˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ์กฐ์„  ์‹œ๋Œ€์˜ ๋ถ„์œ„๊ธฐ๋ฅผ ๋Š๋‚„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํ•œ์˜ฅ์—์„œ ๋ฌธํ™” ์ฝ˜ํ…์ธ ๋„ ์ œ๊ณตํ•˜๋Š” ๊ณณ๋„ ๋งŽ์Šต๋‹ˆ๋‹ค.

### 2. ๋ถ์•…์‚ฐ ์ž…์žฅ๊ณผ ๋ถ์•…์‚ฐ ๋“ฑ์‚ฐ(์˜ค์ „)

- ๋ถ์•…์‚ฐ์€ ์„œ์šธ์˜ ๋ถ์ชฝ์— ์œ„์น˜ํ•œ ์‚ฐ์œผ๋กœ, ์„œ์šธ ํ•œ๋ณตํŒ์—์„œ๋„ ์ž์—ฐ์„ ๋งŒ๋‚  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋ถ์•…์‚ฐ ์ž…๊ตฌ์—์„œ ๋“ฑ์‚ฐ์„ ์‹œ์ž‘ํ•˜์—ฌ, ๋ถ์•…์‚ฐ ์ •์ƒ๊นŒ์ง€ ์˜ฌ๋ผ๊ฐ€๋ฉด ์„œ์šธ์˜ ์ „๊ฒฝ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

### 3. ์ข…๋กœ ๋ช…๋™ ์‡ผํ•‘๊ณผ ๋ง›์ง‘ ํˆฌ์–ด(๋‚ฎ)

- ๋ช…๋™: ๋‹ค์–‘ํ•œ ์‡ผํ•‘๋ชฐ๊ณผ ๋งค์žฅ์ด ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋ช…๋™ ์‡ผํ•‘ํƒ€์šด, ๋ฏธ์Šคํ„ฐํŠธ์œ„์Šคํ„ฐ, ๋ฏธ์Šคํ„ฐ๋ฆฌ๋งˆ์ผ“ ๋“ฑ์„ ๋ฐฉ๋ฌธํ•ด๋ณด์„ธ์š”.
- ๋ง›์ง‘ ํˆฌ์–ด: ๋ช…๋™์—๋Š” ๋‹ค์–‘ํ•œ ์ง€์—ญ ์Œ์‹์„ ๋จน์„ ์ˆ˜ ์žˆ๋Š” ๊ณณ์ด ๋งŽ์Šต๋‹ˆ๋‹ค. ๋–ก๋ณถ์ด, ์ˆœ๋Œ€, ๋‹ญ๊ฐ•์ • ๋“ฑ์„ ๋ง›๋ณผ ์ˆ˜ ์žˆ๋Š” ๊ณณ์„ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

### 4. ์„œ์šธ์‹œ๋ฆฝ๋ฏธ์ˆ ๊ด€๊ณผ ๋•์ˆ˜๊ถ(์˜คํ›„)

- ์„œ์šธ์‹œ๋ฆฝ๋ฏธ์ˆ ๊ด€: ํ˜„๋Œ€๋ฏธ์ˆ ์„ ์ „์‹œํ•˜๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ํŠน๋ณ„์ „์ด ์—ด๋ฆฐ๋‹ค๋ฉด ๋ฐฉ๋ฌธํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
- ๋•์ˆ˜๊ถ: ์กฐ์„ ์‹œ๋Œ€์˜ ๊ถ๊ถ์ž…๋‹ˆ๋‹ค. ํŠนํžˆ ๋ด„์—๋Š” ๋ฒš๊ฝƒ์ด ์•„๋ฆ„๋‹ต๊ฒŒ ๋งŒ๋ฐœํ•ฉ๋‹ˆ๋‹ค.

### 5. ๋‚จ์‚ฐํƒ€์›Œ์™€ ๋‚จ์‚ฐ๊ณต์› ์‚ฐ์ฑ…(์˜คํ›„)

- ๋‚จ์‚ฐํƒ€์›Œ: ๋‚จ์‚ฐ์— ์žˆ๋Š” ๊ด€๋žŒ๋Œ€์ž…๋‹ˆ๋‹ค. ๋‚จ์‚ฐํƒ€์›Œ์— ์˜ฌ๋ผ๊ฐ€๋ฉด ์„œ์šธ์˜ 360๋„ ์ „๊ฒฝ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
- ๋‚จ์‚ฐ๊ณต์›: ๋‚จ์‚ฐ์— ์žˆ๋Š” ๊ณต์›์ž…๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ํ…Œ๋งˆ ๊ณต์›๊ณผ ์กฐ๊ฒฝ์ด ์ž˜ ๋œ ๊ณณ์ž…๋‹ˆ๋‹ค. ๋‚จ์‚ฐ๊ณต์›์„ ์‚ฐ์ฑ…ํ•˜๋ฉฐ ํœด์‹์„ ์ทจํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

### 6. ๋ช…๋™ ๋˜๋Š” ์ดํƒœ์›์—์„œ์˜ ์ €๋… ์‹์‚ฌ์™€ ๋ฌธํ™” ํ™œ๋™(์ €๋…)

- ๋ช…๋™: ๋‹ค์–‘ํ•œ ์ „ํ†ต์ ์ธ ํ•œ๊ตญ ์Œ์‹์„ ๋จน์„ ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ๋ช…๋™์€ ๋ฐค์—๋„ ํ™œ๊ธฐ์ฐจ๊ฒŒ ํ™œ๋ฐœํ•œ ๋ฌธํ™” ์ƒํ™œ์„ ํ•  ์ˆ˜ ์žˆ๋Š” ๊ณณ์ž…๋‹ˆ๋‹ค.
- ์ดํƒœ์›: ์™ธ๊ตญ์ธ ๊ด€๊ด‘๊ฐ๋“ค์ด ๋งŽ์ด ์ฐพ๋Š” ๊ณณ์œผ๋กœ, ๋‹ค์–‘ํ•œ ์„ธ๊ณ„ ์Œ์‹์„ ๋จน์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํด๋Ÿฝ์ด๋‚˜ ๋ฐ”๊ฐ€ ๋งŽ์€ ๋ฌธํ™”์  ํ™œ๋™์ด ๊ฐ€๋Šฅํ•œ ๊ณณ์ž…๋‹ˆ๋‹ค.

์ด ์ฝ”์Šค๋Š” ํ•˜๋ฃจ ์ข…์ผ ํ™œ๋ฐœํ•˜๊ฒŒ ์—ฌํ–‰์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ณ„ํšํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ฐ ์ง€์—ญ์— ๋”ฐ๋ผ ์ด๋™ ์‹œ๊ฐ„์„ ๊ณ ๋ คํ•˜์‹œ๊ณ , ๊ฐœ์žฅ ์‹œ๊ฐ„๊ณผ ์ „์‹œ ์ผ์ • ๋“ฑ์„ ๋ฏธ๋ฆฌ ํ™•์ธํ•˜์‹œ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ์ฆ๊ฑฐ์šด ์—ฌํ–‰ ๋˜์„ธ์š”!

Quantized Versions through bitsandbytes

  • Using 8-bit precision
  • Using 4-bit precision
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


model_id = "rtzr/ko-gemma-2-9b-it"
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)

model.eval()
instruction = "์„œ์šธ์˜ ์œ ๋ช…ํ•œ ๊ด€๊ด‘ ์ฝ”์Šค๋ฅผ ๋งŒ๋“ค์–ด์ค„๋ž˜?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
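For the 4-bit path, an NF4 configuration is a common choice; the settings below are illustrative defaults rather than values specified by the authors.

# Hedged sketch: a typical 4-bit NF4 configuration (values are illustrative).
quantization_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)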

VLLM Usage

With vllm==0.5.1, the Gemma 2 model cannot yet be loaded and a known loading issue occurs. It is therefore recommended to use the vllm/vllm-openai:latest Docker image or vllm==0.5.0.post1.

#!/bin/bash

VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"

MODEL_PATH="YOUR_PATH/${MODEL_NAME}"
docker run --rm --gpus all   \
    -p 8000:8000  \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
        --model ${MODEL_NAME} --dtype auto \
        --gpu-memory-utilization 0.8
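Once the container is running, vLLM serves an OpenAI-compatible API on port 8000. A minimal client query, assuming the defaults above (localhost, no real API key needed), might look like this:

# pip install openai
from openai import OpenAI

# Assumption: the vLLM server started above is listening on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="rtzr/ko-gemma-2-9b-it",
    messages=[{"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"}],
    max_tokens=512,
    temperature=0.6,
    top_p=0.9,
)
print(response.choices[0].message.content)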

License

Gemma 2 License: https://ai.google.dev/gemma/terms

Citation

@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}
@article{gemma_2024,
    title={Gemma},
    url={https://www.kaggle.com/m/3301},
    DOI={10.34740/KAGGLE/M/3301},
    publisher={Kaggle},
    author={Gemma Team},
    year={2024}
}
GGUF quantizations of this model (gemma2 architecture, 9.24B parameters) are provided in 2-, 3-, 4-, 5-, 6-, and 8-bit variants.