---
license: cc-by-nc-4.0
language:
- hu
metrics:
- accuracy
- f1
model-index:
- name: Hun_RoBERTa_large_Plain
  results:
  - task:
      type: text-classification
    metrics:
    - type: accuracy
      value: 0.79
    - type: f1
      value: 0.79
widget:
- text: "A tanúsítvány meghatározott adatainak a 2008/118/EK irányelv IV. fejezete szerinti szállításához szükséges adminisztratív okmányban..."
  example_title: "Incomprehensible"
- text: "Az AEO-engedély birtokosainak listáján – keresésre – megjelenő információk: az engedélyes neve, az engedélyt kibocsátó ország..."
  example_title: "Comprehensible"
---
## Model description
A cased XLM-RoBERTa-large model fine-tuned for Hungarian on a dataset of about 13k sentences provided by the Public Accessibility Programme of the National Tax and Customs Administration of Hungary (NAV).
## Intended uses & limitations
The model is designed to classify sentences as either "comprehensible" or "not comprehensible" (according to Plain Language guidelines):
* **Label_0** - "comprehensible" - The sentence is in Plain Language.
* **Label_1** - "not comprehensible" - The sentence is **not** in Plain Language.
## Training
The model is a fine-tuned version of the original `xlm-roberta-large` model, trained on a dataset of Hungarian legal and administrative texts.
## Eval results
| Class | Precision | Recall | F-Score |
| ----- | --------- | ------ | ------- |
| **Comprehensible / Label_0** | **0.74** | **0.65** | **0.70** |
| **Not comprehensible / Label_1** | **0.71** | **0.79** | **0.74** |
| **accuracy** | | | **0.72** |
| **macro avg** | **0.73** | **0.72** | **0.72** |
| **weighted avg** | **0.72** | **0.72** | **0.72** |
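As a reading aid for the table above, the macro average row is the unweighted mean of the two per-class rows. The sketch below recomputes it from the published (already rounded) values; the helper names `macro_average` and `f1_from` are illustrative and not part of any library. The weighted average cannot be reproduced this way because the per-class sentence counts are not given here.

```python
# Per-class scores copied from the table above (already rounded to two decimals).
per_class = {
    "Label_0": {"precision": 0.74, "recall": 0.65, "f1": 0.70},
    "Label_1": {"precision": 0.71, "recall": 0.79, "f1": 0.74},
}


def macro_average(scores, metric):
    """Unweighted mean of one metric over all classes."""
    values = [row[metric] for row in scores.values()]
    return sum(values) / len(values)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for metric in ("precision", "recall", "f1"):
    print(metric, macro_average(per_class, metric))
```

Because the inputs are rounded, recomputing a per-class F-score with `f1_from` may differ from the table in the last digit.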
## Usage
```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("uvegesistvan/Hun_RoBERTa_large_Plain")
model = AutoModelForSequenceClassification.from_pretrained("uvegesistvan/Hun_RoBERTa_large_Plain")
```
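Once the model is loaded, its output logits still need to be mapped to the two labels described under "Intended uses & limitations". The following is a minimal sketch of that step in plain Python. It assumes the logits for one sentence have already been obtained, for example with `model(**tokenizer(sentence, return_tensors="pt")).logits[0].tolist()`. The `LABELS` mapping and the helper `interpret_logits` are illustrative names, and the logit values are made up.

```python
import math

# Label meanings as described in this card. That index 0 / 1 of the logits
# corresponds to Label_0 / Label_1 is an assumption; check model.config.id2label.
LABELS = {0: "comprehensible", 1: "not comprehensible"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits):
    """Return the predicted label name and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Hypothetical logits for a single sentence, for illustration only.
label, prob = interpret_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

Keeping the post-processing separate from the model call makes it easy to apply the same mapping to a batch of sentences.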
## Citation
BibTeX:
```bibtex
@PhDThesis{ Uveges:2024,
author = {{\"U}veges, Istv{\'a}n},
title = {K{\"o}z{\'e}rthet{\H{o}} {\'e}s automatiz{\'a}ci{\'o} - k{\'i}s{\'e}rletek a jog, term{\'e}szetesnyelv-feldolgoz{\'a}s {\'e}s informatika hat{\'a}r{\'a}n.},
year = {2024},
school = {Szegedi Tudom{\'a}nyegyetem}
}
```