# Hungarian Named Entity Recognition Model with huBERT
For further models, scripts and details, see our demo site.
- Pretrained model used: SZTAKI-HLT/hubert-base-cc
- Fine-tuned on: NYTK-NerKor
- NE categories: PER, LOC, MISC, ORG
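
The checkpoint also loads with the standard `transformers` Auto classes if you want the raw model instead of a pipeline; a minimal sketch (only the repository id below comes from this card, the rest is the generic Hub-loading pattern):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned huBERT checkpoint and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("NYTK/named-entity-recognition-nerkor-hubert-hungarian")
model = AutoModelForTokenClassification.from_pretrained("NYTK/named-entity-recognition-nerkor-hubert-hungarian")

# The config's id2label mapping exposes the tag set built over PER, LOC, MISC and ORG
print(model.config.id2label)
```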
## Limitations
- max_seq_length = 128
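
Inputs longer than `max_seq_length` do not fit in a single forward pass. A minimal workaround sketch, assuming naive fixed-width character chunking (the `max_chars=400` heuristic and the `ner_long_text` helper are illustrative, not part of the model card; entities that straddle a chunk boundary may be split):

```python
from transformers import pipeline

ner = pipeline(task="ner", model="NYTK/named-entity-recognition-nerkor-hubert-hungarian")

def ner_long_text(text, max_chars=400):
    # Rough character window chosen so each chunk stays well under the
    # 128-token limit; a sentence splitter would give cleaner boundaries.
    results = []
    for offset in range(0, len(text), max_chars):
        chunk = text[offset:offset + max_chars]
        for ent in ner(chunk, aggregation_strategy="simple"):
            # Shift character offsets back into the full text's coordinates
            ent["start"] += offset
            ent["end"] += offset
            results.append(ent)
    return results
```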
## Results
F-score: 90.18%
## Usage with pipeline
```python
from transformers import pipeline

ner = pipeline(task="ner", model="NYTK/named-entity-recognition-nerkor-hubert-hungarian")

# Hungarian example sentence containing person, organization and location entities
input_text = "A Kovácsné Nagy Erzsébet nagyon jól érzi magát a Nokiánál, azonban a Németországból érkezett Kovács Péter nehezen boldogul a beilleszkedéssel."

print(ner(input_text, aggregation_strategy="simple"))
```
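
With `aggregation_strategy="simple"`, the pipeline merges word pieces into whole entities, so the call returns one dictionary per detected entity with `entity_group`, `score`, `word`, `start` and `end` keys, e.g. an entry of the form `{'entity_group': 'PER', 'word': 'Kovácsné Nagy Erzsébet', ...}` (exact scores will vary).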
## Citation
If you use this model, please cite the following paper:
```bibtex
@inproceedings{yang-language-models,
    title = {Training language models with low resources: RoBERTa, BART and ELECTRA experimental models for Hungarian},
    booktitle = {Proceedings of the 12th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2021)},
    year = {2021},
    publisher = {IEEE},
    address = {Online},
    author = {Yang, Zijian Győző and Váradi, Tamás},
    pages = {279--285}
}
```