---
language: ko
tags:
- intent-classification
datasets:
- kor_3i4k
license: cc-by-nc-4.0
---

## Finetuning
- Pretrained model : [klue/roberta-small](https://github.com/KLUE-benchmark/KLUE)
- Dataset for fine-tuning : [3i4k](https://github.com/warnikchow/3i4k)
  - Train : 46,863
  - Validation : 8,271 (15% of Train)
  - Test : 6,121
- Label info
  - 0: "fragment"
  - 1: "statement"
  - 2: "question"
  - 3: "command"
  - 4: "rhetorical question"
  - 5: "rhetorical command"
  - 6: "intonation-dependent utterance"
- Parameters of training
```
{
  "epochs": 3 (set to 10, but early stopping triggered),
  "batch_size": 32,
  "optimizer_class": "",
  "optimizer_params": {
    "lr": 5e-05
  },
  "min_delta": 0.01
}
```

## Usage
``` python
from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, TextClassificationPipeline

# Load the fine-tuned intent-classification model from the Hugging Face Model Hub
HUGGINGFACE_MODEL_PATH = "bespin-global/klue-roberta-small-3i4k-intent-classification"
loaded_tokenizer = RobertaTokenizerFast.from_pretrained(HUGGINGFACE_MODEL_PATH)
loaded_model = TFRobertaForSequenceClassification.from_pretrained(HUGGINGFACE_MODEL_PATH)

# Build a text-classification pipeline
text_classifier = TextClassificationPipeline(
    tokenizer=loaded_tokenizer,
    model=loaded_model,
    framework='tf',
    return_all_scores=True
)

# Predict
text = "your text"
preds_list = text_classifier(text)[0]                  # scores for every label
best_pred = max(preds_list, key=lambda p: p['score'])  # highest-scoring label
print(f"Label of best intent: {best_pred['label']}")
print(f"Score of best intent: {best_pred['score']}")
```

## Evaluation
```
                                precision    recall  f1-score   support

                       command       0.89      0.92      0.90      1296
                      fragment       0.98      0.96      0.97       600
intonation-dependent utterance       0.71      0.69      0.70       327
                      question       0.95      0.97      0.96      1786
            rhetorical command       0.87      0.64      0.74       108
           rhetorical question       0.61      0.63      0.62       174
                     statement       0.91      0.89      0.90      1830

                      accuracy                           0.90      6121
                     macro avg       0.85      0.81      0.83      6121
                  weighted avg       0.90      0.90      0.90      6121
```

## Citing & Authors
[Jaehyeong](https://huggingface.co/jaehyeong) at [Bespin Global](https://www.bespinglobal.com/)
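
## Fine-tuning Sketch
For reference, the sketch below shows one way the training parameters listed above (batch size 32, lr 5e-5, up to 10 epochs with early stopping and min_delta 0.01) could be wired up with Keras and `TFRobertaForSequenceClassification`. It is a minimal illustration under stated assumptions, not the actual training script used for this model: the placeholder `train_texts`/`val_texts` variables, the use of `AutoTokenizer`, and the `from_pt=True` conversion of the `klue/roberta-small` checkpoint are assumptions.

``` python
import tensorflow as tf
from transformers import AutoTokenizer, TFRobertaForSequenceClassification

# Label mapping from the "Label info" section above
id2label = {
    0: "fragment", 1: "statement", 2: "question", 3: "command",
    4: "rhetorical question", 5: "rhetorical command",
    6: "intonation-dependent utterance",
}

# Placeholders only: replace with the real 3i4k train/validation splits (46,863 / 8,271 examples)
train_texts, train_labels = ["이거 좀 도와줄래?"], [2]
val_texts, val_labels = ["이거 좀 도와줄래?"], [2]

# klue/roberta-small publishes PyTorch weights, so from_pt=True is assumed when loading into TensorFlow
tokenizer = AutoTokenizer.from_pretrained("klue/roberta-small")
model = TFRobertaForSequenceClassification.from_pretrained(
    "klue/roberta-small",
    num_labels=len(id2label),
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    from_pt=True,
)

def to_dataset(texts, labels, batch_size=32):
    # Tokenize the texts and pair them with their label ids
    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="tf")
    return tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(batch_size)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# Up to 10 epochs with early stopping (min_delta=0.01); the run documented above stopped after 3
model.fit(
    to_dataset(train_texts, train_labels),
    validation_data=to_dataset(val_texts, val_labels),
    epochs=10,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0.01, restore_best_weights=True)],
)
```

After training, the model and tokenizer can be saved with `model.save_pretrained(...)` and `tokenizer.save_pretrained(...)` and then loaded exactly as in the Usage section above.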