TajaKuzman committed · Commit baf40ba · 1 Parent(s): 4b28311
Update README.md

README.md CHANGED

@@ -134,7 +134,7 @@ The model can be used for classification into topic labels from the
applied to any news text in a language supported by `xlm-roberta-large`.

Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
the model achieves a macro-F1 score of 0.746, a micro-F1 score of 0.734, and an accuracy of 0.734,
and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
If we use only labels that are predicted with a confidence score equal to or higher than 0.90,
the model achieves micro-F1 and macro-F1 of 0.80.
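
To make this concrete, here is a minimal usage sketch with the `transformers` pipeline API; the model ID is a placeholder for this model's Hub ID, and the example sentence is illustrative:

```python
from transformers import pipeline

# Placeholder: substitute this model's Hugging Face Hub ID.
classifier = pipeline("text-classification", model="<this-model-hub-id>")

# The pipeline returns, for each input text, the top label with its confidence score.
prediction = classifier("The parliament passed the new budget on Tuesday.")[0]
print(prediction["label"], prediction["score"])
```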

@@ -248,37 +248,39 @@ The model was evaluated on a manually-annotated test set in four languages (Croatian, Slovenian, Catalan and Greek),
consisting of 1,129 instances.
The test set contains similar amounts of text from the four languages and is more or less balanced across labels.

The model was shown to achieve a micro-F1 score of 0.734 and a macro-F1 score of 0.746. The results for the entire test set and per language:

|                | Micro-F1 | Macro-F1 | Accuracy | No. of instances |
|:---------------|---------:|---------:|---------:|-----------------:|
| All (combined) | 0.734278 | 0.745864 | 0.734278 | 1129 |
| Croatian       | 0.728522 | 0.733725 | 0.728522 | 291 |
| Catalan        | 0.715356 | 0.722304 | 0.715356 | 267 |
| Slovenian      | 0.758865 | 0.764784 | 0.758865 | 282 |
| Greek          | 0.733564 | 0.747129 | 0.733564 | 289 |
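
For reference, a sketch of how such aggregate scores can be computed with scikit-learn, assuming parallel lists of gold and predicted labels (the lists below are illustrative stand-ins, not the actual test data):

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative stand-ins: in the actual evaluation these would be the gold
# and predicted topic labels for the 1,129 test instances.
y_true = ["politics", "sport", "health", "environment", "sport", "weather"]
y_pred = ["politics", "sport", "environment", "health", "sport", "weather"]

print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))
# In single-label multi-class classification micro-F1 equals accuracy,
# which is why those two columns in the table above are identical.
print("Accuracy:", accuracy_score(y_true, y_pred))
```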

Performance per label:

|                                            | precision | recall   | f1-score | support |
|:-------------------------------------------|----------:|---------:|---------:|--------:|
| arts, culture, entertainment and media     | 0.602151 | 0.875000 | 0.713376 | 64 |
| conflict, war and peace                    | 0.611111 | 0.916667 | 0.733333 | 36 |
| crime, law and justice                     | 0.861538 | 0.811594 | 0.835821 | 69 |
| disaster, accident and emergency incident  | 0.691176 | 0.886792 | 0.776860 | 53 |
| economy, business and finance              | 0.779221 | 0.508475 | 0.615385 | 118 |
| education                                  | 0.847458 | 0.735294 | 0.787402 | 68 |
| environment                                | 0.589041 | 0.754386 | 0.661538 | 57 |
| health                                     | 0.796610 | 0.796610 | 0.796610 | 59 |
| human interest                             | 0.552239 | 0.672727 | 0.606557 | 55 |
| labour                                     | 0.855072 | 0.830986 | 0.842857 | 71 |
| lifestyle and leisure                      | 0.773585 | 0.476744 | 0.589928 | 86 |
| politics                                   | 0.568182 | 0.735294 | 0.641026 | 68 |
| religion                                   | 0.842105 | 0.941176 | 0.888889 | 51 |
| science and technology                     | 0.637681 | 0.800000 | 0.709677 | 55 |
| society                                    | 0.918033 | 0.500000 | 0.647399 | 112 |
| sport                                      | 0.824324 | 0.968254 | 0.890511 | 63 |
| weather                                    | 0.953488 | 0.931818 | 0.942529 | 44 |

For downstream tasks, **we advise you to use only labels that were predicted with a confidence score
equal to or higher than 0.90, which further improves performance**.
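
Following that advice, a sketch of the confidence filter in code, reusing the placeholder model ID from the usage example above:

```python
from transformers import pipeline

# Placeholder model ID, as in the earlier sketch.
classifier = pipeline("text-classification", model="<this-model-hub-id>")

THRESHOLD = 0.90  # the confidence cutoff recommended above

results = classifier(["Severe storms are expected across the region tonight."])
confident = [r for r in results if r["score"] >= THRESHOLD]
# Predictions below the threshold can be discarded or routed to manual review.
print(confident)
```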