TajaKuzman committed
Commit baf40ba
1 Parent(s): 4b28311

Update README.md

Files changed (1): README.md (+26, -24)
README.md CHANGED
@@ -134,7 +134,7 @@ The model can be used for classification into topic labels from the
 applied to any news text in a language, supported by the `xlm-roberta-large`.
 
 Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
-the model achieves micro-F1 score of 0.733, macro-F1 score of 0.745 and accuracy of 0.733,
+the model achieves macro-F1 score of 0.746, micro-F1 score of 0.734, and accuracy of 0.734,
 and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
 If we use only labels that are predicted with a confidence score equal or higher than 0.90,
 the model achieves micro-F1 and macro-F1 of 0.80.
@@ -248,37 +248,39 @@ The model was evaluated on a manually-annotated test set in four languages (Croa
 consisting of 1,129 instances.
 The test set contains similar amounts of texts from the four languages and is more or less balanced across labels.
 
-The model was shown to achieve micro-F1 score of 0.733, and macro-F1 score of 0.745. The results for the entire test set and per language:
+The model was shown to achieve micro-F1 score of 0.734, and macro-F1 score of 0.746. The results for the entire test set and per language:
 
 | | Micro-F1 | Macro-F1 | Accuracy | No. of instances |
 |:---|-----------:|-----------:|-----------:|-----------:|
-| All (combined) | 0.733392 | 0.744633 | 0.733392 | 1129 |
-| Croatian | 0.728522 | 0.733725 | 0.728522 | 291 |
+| All (combined) | 0.734278 | 0.745864 | 0.734278 | 1129 |
+| Croatian | 0.728522 | 0.733725 | 0.728522 | 291 |
 | Catalan | 0.715356 | 0.722304 | 0.715356 | 267 |
 | Slovenian | 0.758865 | 0.764784 | 0.758865 | 282 |
-| Greek | 0.730104 | 0.742099 | 0.730104 | 289 |
+| Greek | 0.733564 | 0.747129 | 0.733564 | 289 |
+
 
 Performance per label:
 
-| | precision | recall | f1-score | support |
-|:------------------------------------------|------------:|---------:|-----------:|----------:|
-| arts, culture, entertainment and media | 0.602 | 0.875 | 0.713 | 64 |
-| conflict, war and peace | 0.611 | 0.917 | 0.733 | 36 |
-| crime, law and justice | 0.862 | 0.812 | 0.836 | 69 |
-| disaster, accident and emergency incident | 0.691 | 0.887 | 0.777 | 53 |
-| economy, business and finance | 0.779 | 0.508 | 0.615 | 118 |
-| education | 0.847 | 0.735 | 0.787 | 68 |
-| environment | 0.589 | 0.754 | 0.662 | 57 |
-| health | 0.797 | 0.797 | 0.797 | 59 |
-| human interest | 0.552 | 0.673 | 0.607 | 55 |
-| labour | 0.855 | 0.831 | 0.843 | 71 |
-| lifestyle and leisure | 0.769 | 0.465 | 0.58 | 86 |
-| politics | 0.568 | 0.735 | 0.641 | 68 |
-| religion | 0.842 | 0.941 | 0.889 | 51 |
-| science and technology | 0.638 | 0.8 | 0.71 | 55 |
-| society | 0.918 | 0.5 | 0.647 | 112 |
-| sport | 0.824 | 0.968 | 0.891 | 63 |
-| weather | 0.932 | 0.932 | 0.932 | 44 |
+| | precision | recall | f1-score | support |
+|:------------------------------------------|------------:|---------:|-----------:|------------:|
+| arts, culture, entertainment and media | 0.602151 | 0.875 | 0.713376 | 64 |
+| conflict, war and peace | 0.611111 | 0.916667 | 0.733333 | 36 |
+| crime, law and justice | 0.861538 | 0.811594 | 0.835821 | 69 |
+| disaster, accident and emergency incident | 0.691176 | 0.886792 | 0.77686 | 53 |
+| economy, business and finance | 0.779221 | 0.508475 | 0.615385 | 118 |
+| education | 0.847458 | 0.735294 | 0.787402 | 68 |
+| environment | 0.589041 | 0.754386 | 0.661538 | 57 |
+| health | 0.79661 | 0.79661 | 0.79661 | 59 |
+| human interest | 0.552239 | 0.672727 | 0.606557 | 55 |
+| labour | 0.855072 | 0.830986 | 0.842857 | 71 |
+| lifestyle and leisure | 0.773585 | 0.476744 | 0.589928 | 86 |
+| politics | 0.568182 | 0.735294 | 0.641026 | 68 |
+| religion | 0.842105 | 0.941176 | 0.888889 | 51 |
+| science and technology | 0.637681 | 0.8 | 0.709677 | 55 |
+| society | 0.918033 | 0.5 | 0.647399 | 112 |
+| sport | 0.824324 | 0.968254 | 0.890511 | 63 |
+| weather | 0.953488 | 0.931818 | 0.942529 | 44 |
+
 
 For downstream tasks, **we advise you to use only labels that were predicted with confidence score
 higher or equal to 0.90 which further improves the performance**.
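
The closing advice in the updated README (keep only labels predicted with a confidence score of at least 0.90) can be applied directly at inference time. Below is a minimal sketch using the Hugging Face `transformers` text-classification pipeline; the model identifier and the sample texts are assumptions for illustration, so substitute the actual ID of this repository.

```python
# Hedged sketch: filter news-topic predictions by confidence, keeping only
# labels scored at >= 0.90, as the README recommends for downstream tasks.
from transformers import pipeline

MODEL_ID = "classla/multilingual-IPTC-news-topic-classifier"  # assumed ID; use this repo's ID
CONFIDENCE_THRESHOLD = 0.90

classifier = pipeline("text-classification", model=MODEL_ID)

texts = [
    "The local council approved a new budget for public schools.",
    "Heavy rain and strong winds are expected along the coast tomorrow.",
]

for text, prediction in zip(texts, classifier(texts)):
    label, score = prediction["label"], prediction["score"]
    if score >= CONFIDENCE_THRESHOLD:
        print(f"{label} ({score:.2f}): {text}")
    else:
        # Low-confidence predictions are dropped or routed to manual review.
        print(f"DISCARDED ({score:.2f}): {text}")
```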
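
As a side note on the evaluation tables: for single-label, multi-class classification, micro-averaged F1 coincides with accuracy, which is why the Micro-F1 and Accuracy columns are identical. The sketch below illustrates the relation with scikit-learn, using toy label lists rather than the actual test-set annotations.

```python
# Hedged sketch: how micro-F1, macro-F1 and accuracy relate for single-label,
# multi-class predictions. The label lists are toy placeholders.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["sport", "politics", "health", "sport", "weather"]
y_pred = ["sport", "health", "health", "sport", "weather"]

micro_f1 = f1_score(y_true, y_pred, average="micro")  # equals accuracy in this setting
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean over labels
accuracy = accuracy_score(y_true, y_pred)

print(f"micro-F1: {micro_f1:.3f}, macro-F1: {macro_f1:.3f}, accuracy: {accuracy:.3f}")
```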