File size: 2,545 Bytes

ffaf9e2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
432b138
ffaf9e2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ef4be91
ffaf9e2
282ed82
ffaf9e2
ef4be91
 
ffaf9e2
 
 
 
 
 
 
 
b5d9fe8
 
 
ffaf9e2
 
cd3371a
 
ffaf9e2
 
6e3e6ef
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ffaf9e2

---
language: ko
tags:
- intent-classification
datasets:
- kor_3i4k
license: cc-by-nc-4.0
---

## Finetuning
- Pretrain Model : [klue/roberta-small](https://github.com/KLUE-benchmark/KLUE)
- Dataset for fine-tuning : [3i4k](https://github.com/warnikchow/3i4k) 
  - Train : 46,863
  - Validation : 8,271 (15% of Train)
  - Test : 6,121
- Label info 
  - 0: "fragment",
  - 1: "statement",
  - 2: "question",
  - 3: "command",
  - 4: "rhetorical question",
  - 5: "rhetorical command",
  - 6: "intonation-dependent utterance"
- Parameters of Training
```
{
    "epochs": 3 (setting 10 but early stopped),
    "batch_size":32,
    "optimizer_class": "<keras.optimizer_v2.adam.Adam'>",
    "optimizer_params": {
        "lr": 5e-05
    },
    "min_delta": 0.01
}
```

## Usage
``` python
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification, TextClassificationPipeline

# Load fine-tuned model by HuggingFace Model Hub
HUGGINGFACE_MODEL_PATH = "bespin-global/klue-roberta-small-3i4k-intent-classification"
loaded_tokenizer = RobertaTokenizerFast.from_pretrained(HUGGINGFACE_MODEL_PATH )
loaded_model = RobertaForSequenceClassification.from_pretrained(HUGGINGFACE_MODEL_PATH )

# using Pipeline
text_classifier = TextClassificationPipeline(
    tokenizer=loaded_tokenizer, 
    model=loaded_model, 
    return_all_scores=True
)

# predict
text = "your text"

preds_list = text_classifier(text)
best_pred = preds_list[0]
print(f"Label of Best Intentatioin: {best_pred['label']}")
print(f"Score of Best Intentatioin: {best_pred['score']}")
```

## Evaluation
```
                               precision    recall  f1-score   support

                      command       0.89      0.92      0.90      1296
                     fragment       0.98      0.96      0.97       600
intonation-depedent utterance       0.71      0.69      0.70       327
                     question       0.95      0.97      0.96      1786
           rhetorical command       0.87      0.64      0.74       108
          rhetorical question       0.61      0.63      0.62       174
                    statement       0.91      0.89      0.90      1830

                     accuracy                           0.90      6121
                    macro avg       0.85      0.81      0.83      6121
                 weighted avg       0.90      0.90      0.90      6121
```


## Citing & Authors
<!--- Describe where people can find more information -->
[Jaehyeong](https://huggingface.co/jaehyeong) at [Bespin Global](https://www.bespinglobal.com/)