metadata

base_model: klue/roberta-small
tags:
  - generated_from_trainer
  - korean
  - klue
widget:
  - text: >-
      저는 김철수입니다. 집은 서울특별시 강남대로이고 전화번호는 010-1234-5678, 주민등록번호는 123456-1234567입니다.
      메일주소는 hugging@face.com입니다. 저는 10월 25일에 출국할 예정입니다.
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: klue_roberta_small_ner_identified
    results: []
language:
  - ko
pipeline_tag: token-classification

klue-roberta-small-ner-identified

This model is a fine-tuned version of klue/roberta-small on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0082
Precision: 0.9930
Recall: 0.9988
F1: 0.9959
Accuracy: 0.9988

Model description

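To support de-identification of personal information, this model provides named entity recognition for the following items.

Person name [PS]
Address (both old lot-number addresses and road-name addresses) [AD]
Card number [CN]
Bank account number [BN]
Driver's license number [DN]
Resident registration number [RN]
Passport number [PN]
Phone number [PH]
Email address [EM]
Date [DT]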
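
The tags above are what a token-classification pipeline reports in the `entity_group` field when an aggregation strategy is used. As a minimal sketch, the mapping below pairs each tag with an English description, and a hypothetical helper `select_entities` (not part of the model or of Transformers) keeps only the entity types of interest above a score threshold. The default threshold of 0.5 and the sample scores are arbitrary assumptions.

```python
# Tag descriptions taken from the list in this model card.
ENTITY_TAGS = {
    "PS": "person name",
    "AD": "address (lot-number or road-name)",
    "CN": "card number",
    "BN": "bank account number",
    "DN": "driver's license number",
    "RN": "resident registration number",
    "PN": "passport number",
    "PH": "phone number",
    "EM": "email address",
    "DT": "date",
}


def select_entities(entities, wanted_tags, min_score=0.5):
    """Keep entities whose tag is in wanted_tags and whose score >= min_score.

    `entities` is assumed to be a list of dicts with `entity_group` and
    `score` keys, the shape produced by an aggregated NER pipeline.
    """
    unknown = set(wanted_tags) - set(ENTITY_TAGS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [
        e for e in entities
        if e["entity_group"] in wanted_tags and e["score"] >= min_score
    ]


# Illustrative input only; the scores here are made up for the example.
sample = [
    {"entity_group": "PS", "score": 0.96, "word": "김철수"},
    {"entity_group": "DT", "score": 0.98, "word": "10월 25일"},
]
for entity in select_entities(sample, {"PS", "PH"}):
    print(entity["word"], "->", ENTITY_TAGS[entity["entity_group"]])
```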

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
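
For readers who want to set up a comparable run, the values above can be collected under the parameter names that `transformers.TrainingArguments` uses. This is a sketch, not the author's training script: it assumes a single device (so the per-device batch size equals the reported 64) and zero warmup steps, neither of which the card states. `linear_lr` is a hypothetical helper that illustrates how a linear schedule scales the learning rate over the 183 optimizer steps reported in the training results.

```python
# Keyword arguments named after transformers.TrainingArguments parameters.
# Assumption: one device, so per-device batch size == reported batch size.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 64,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each of the three epochs.
for step in (0, 61, 122, 183):
    print(step, f"{linear_lr(step, total_steps=183):.2e}")
```

Under these assumptions the learning rate has dropped to two thirds of its initial value by the end of the first epoch and reaches zero at step 183.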

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	61	0.0128	0.9871	0.9929	0.9900	0.9979
No log	2.0	122	0.0098	0.9895	0.9976	0.9935	0.9987
No log	3.0	183	0.0082	0.9930	0.9988	0.9959	0.9988
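
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall in each row. Small differences in the last digit would be expected because the inputs are already rounded to four decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the table above.
rows = [
    (1, 0.9871, 0.9929, 0.9900),
    (2, 0.9895, 0.9976, 0.9935),
    (3, 0.9930, 0.9988, 0.9959),
]

for epoch, precision, recall, reported in rows:
    recomputed = f1_score(precision, recall)
    print(f"epoch {epoch}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```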

Framework versions

Transformers 4.40.2
Pytorch 2.3.0+cu118
Datasets 2.19.1
Tokenizers 0.19.1
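
To compare a local environment against the versions listed above, something like the following could be used. It relies only on the standard library's `importlib.metadata`. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names, and `check_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions reported in this model card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.40.2",
    "torch": "2.3.0+cu118",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def check_versions(expected=CARD_VERSIONS):
    """Return {name: (installed_or_None, expected)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


for name, (installed, wanted) in check_versions().items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: installed={installed} card={wanted} ({status})")
```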

Use

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub.
model_id = "vitus9988/klue-roberta-small-ner-identified"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" groups adjacent tokens with the same label into one entity span.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
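# Built without a leading newline so that character offsets count from the first character of the sentence.
example = (
    "저는 김철수입니다. 집은 서울특별시 강남대로이고 전화번호는 010-1234-5678, "
    "주민등록번호는 123456-1234567입니다. 메일주소는 hugging@face.com입니다. "
    "저는 10월 25일에 출국할 예정입니다."
)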

ner_results = nlp(example)
for entity in ner_results:
    print(entity)

# Sample output:
#{'entity_group': 'PS', 'score': 0.9617835, 'word': '김철수', 'start': 3, 'end': 6}
#{'entity_group': 'AD', 'score': 0.9839702, 'word': '서울특별시 강남대로', 'start': 14, 'end': 24}
#{'entity_group': 'PH', 'score': 0.9906756, 'word': '010 - 1234 - 5678', 'start': 33, 'end': 46}
#{'entity_group': 'RN', 'score': 0.9904553, 'word': '123456 - 1234567', 'start': 56, 'end': 70}
#{'entity_group': 'EM', 'score': 0.99022245, 'word': 'hugging @ face. com', 'start': 81, 'end': 97}
#{'entity_group': 'DT', 'score': 0.985629, 'word': '10월 25일', 'start': 105, 'end': 112}
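
Since the model is intended for de-identification, the `start` and `end` character offsets in the pipeline output can be used to mask the detected spans. The following is a minimal sketch rather than part of the model's API: `mask_entities` is a hypothetical helper, the `[TAG]` placeholder format is an arbitrary choice, and the entity list is copied from the sample output above so the snippet runs without downloading the model.

```python
def mask_entities(text, entities):
    """Replace each detected span in `text` with a [TAG] placeholder.

    `entities` is a list of dicts with `entity_group`, `start`, and `end`
    keys, the shape returned by an aggregated NER pipeline.
    """
    masked = text
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{entity['entity_group']}]"
        masked = masked[: entity["start"]] + placeholder + masked[entity["end"]:]
    return masked


text = (
    "저는 김철수입니다. 집은 서울특별시 강남대로이고 전화번호는 010-1234-5678, "
    "주민등록번호는 123456-1234567입니다. 메일주소는 hugging@face.com입니다. "
    "저는 10월 25일에 출국할 예정입니다."
)

# Offsets copied from the sample output; scores and words are omitted.
entities = [
    {"entity_group": "PS", "start": 3, "end": 6},
    {"entity_group": "AD", "start": 14, "end": 24},
    {"entity_group": "PH", "start": 33, "end": 46},
    {"entity_group": "RN", "start": 56, "end": 70},
    {"entity_group": "EM", "start": 81, "end": 97},
    {"entity_group": "DT", "start": 105, "end": 112},
]

print(mask_entities(text, entities))
```

Replacing spans from right to left avoids having to recompute offsets after each substitution. In practice the `entities` list would come straight from `nlp(text)`, and overlapping spans, which this sketch does not handle, would need to be merged first.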