Model Name : SungBeom/whisper-small-ko
Description
- Fine-tuning dataset: maxseats/aihub-464-preprocessed-680GB-set-0
- This model was fine-tuned on the first 10 GB subset of AI Hub's 680 GB meeting-speech-by-major-domain dataset.
- Dataset link: https://huggingface.co/datasets/maxseats/aihub-464-preprocessed-680GB-set-0
Parameters
model_name = "SungBeom/whisper-small-ko"  # alternative: "SungBeom/whisper-small-ko"
dataset_name = "maxseats/aihub-464-preprocessed-680GB-set-0"  # dataset to load (Hugging Face Hub ID)
CACHE_DIR = '/mnt/a/maxseats/.finetuning_cache'  # cache directory
is_test = False  # True: test run on a small sample, False: actual fine-tuning
token = "hf_"  # enter your Hugging Face token
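The `is_test` flag implies running the pipeline on a small sample before committing to the full fine-tuning run. A minimal sketch of such a toggle, using a plain Python list as a stand-in for the loaded dataset (the helper name is hypothetical; with a real `datasets.Dataset` you would call `.select(range(n))` instead of slicing):

```python
def maybe_subsample(examples, is_test, n_samples=8):
    """Return a small slice for a quick test run, or the full data otherwise.

    `examples` stands in for the loaded dataset; with a real
    `datasets.Dataset`, use `examples.select(range(n_samples))`.
    """
    if is_test:
        return examples[:n_samples]
    return examples

full = list(range(100))  # stand-in for the fine-tuning data
assert len(maybe_subsample(full, is_test=True)) == 8
assert len(maybe_subsample(full, is_test=False)) == 100
```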
training_args = Seq2SeqTrainingArguments(
    output_dir=model_dir,  # enter the desired repository name
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,  # double this each time the batch size is halved
    learning_rate=1e-5,
    warmup_steps=1000,
    # max_steps=2,  # set this instead of epochs
    num_train_epochs=1,  # number of epochs / set only one of this and max_steps
    gradient_checkpointing=True,
    fp16=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=16,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="cer",  # for Korean, 'cer' should be more suitable than 'wer'
    greater_is_better=False,
    push_to_hub=True,
    save_total_limit=5,  # maximum number of checkpoints to keep
)
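Because `metric_for_best_model="cer"` is set, the trainer's `compute_metrics` must return a `"cer"` key. Character error rate is character-level edit distance divided by reference length; for Korean, whose word boundaries rarely align with recognition errors, it is usually more informative than WER. A real run would typically use the `evaluate` library's `cer` metric; this standalone pure-Python sketch is for illustration only:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / len(reference)."""
    r, h = list(reference), list(hypothesis)
    # prev[j] holds the edit distance between r[:i-1] and h[:j]
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[len(h)] / max(len(r), 1)

print(cer("안녕하세요", "안넝하세요"))  # one substitution out of 5 chars → 0.2
```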