MoritzLaurer HF staff committed on
Commit
3e8cb87
1 Parent(s): 6d5161a

Update README.md

  1. README.md +51 -0
README.md CHANGED
@@ -57,6 +57,57 @@ print(output)
  ### Details on data and training
  The code for preparing the data and training & evaluating the model is fully open-source here: https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
 
60
+ ## Metrics
61
+
62
+ Balanced accuracy metrics on all datasets.
63
+ `deberta-v3-base-zeroshot-v1.1-all-33` was trained on all datasets, with only maximum 500 texts per class to avoid overfitting.
64
+ The metrics on these datasets are therefore not strictly zeroshot, as the model has seen some data for each task.
65
+ `deberta-v3-base-zeroshot-v1.1-heldout` indicates zeroshot performance on the respective dataset.
66
+ To calculate these zeroshot metrics, the pipeline was run 28 times, each time with one dataset held out from training to simulate a zeroshot setup.
67
+
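The held-out procedure described above amounts to a leave-one-dataset-out loop. The following is a minimal sketch of that idea, not the author's pipeline: `heldout_splits` is a hypothetical helper, and the dataset names are a small illustrative subset of those in the metrics table. The actual training and evaluation code is in the linked repository.

```python
from typing import Iterator, List, Tuple


def heldout_splits(datasets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training datasets, held-out dataset) pairs, one per dataset."""
    for heldout in datasets:
        train = [name for name in datasets if name != heldout]
        yield train, heldout


# Illustrative subset of dataset names (assumption: any list of names works).
datasets = ["imdb", "agnews", "spam", "banking77"]

for train, heldout in heldout_splits(datasets):
    # A real run would train a model on `train` and evaluate it on `heldout`.
    print(f"train on {train}, evaluate zeroshot on {heldout}")
```

With 28 datasets this loop produces 28 runs, and each dataset is evaluated exactly once by a model that never saw it during training.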
+ ![figure_base_v1.1](https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/results/fig_base_v1.1.png)
+
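Balanced accuracy is the unweighted mean of per-class recall, so rare classes count as much as frequent ones. The sketch below implements the metric in plain Python; the function name `balanced_accuracy` and the toy labels are illustrative assumptions, and the repository may compute it differently (for example with scikit-learn's `balanced_accuracy_score`).

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: four "ham" texts and one "spam" text.
y_true = ["ham", "ham", "ham", "ham", "spam"]
y_pred = ["ham", "ham", "ham", "ham", "ham"]  # always predicts the majority class

print(balanced_accuracy(y_true, y_pred))  # 0.5, while plain accuracy would be 0.8
```

A binary classifier that always predicts the same class therefore scores 50% balanced accuracy, regardless of how imbalanced the classes are.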
70
+ | | deberta-v3-base-mnli-fever-anli-ling-wanli-binary | deberta-v3-base-zeroshot-v1.1-heldout | deberta-v3-base-zeroshot-v1.1-all-33 |
71
+ |:---------------------------|---------------------------:|----------------------------------------:|---------------------------------------:|
72
+ | datasets mean (w/o nli) | 62 | 70.7 | 84 |
73
+ | amazonpolarity (2) | 91.7 | 95.7 | 96 |
74
+ | imdb (2) | 87.3 | 93.6 | 94.5 |
75
+ | appreviews (2) | 91.3 | 92.2 | 94.4 |
76
+ | yelpreviews (2) | 95.1 | 97.4 | 98.3 |
77
+ | rottentomatoes (2) | 83 | 88.7 | 90.8 |
78
+ | emotiondair (6) | 46.5 | 42.6 | 74.5 |
79
+ | emocontext (4) | 58.5 | 57.4 | 81.2 |
80
+ | empathetic (32) | 31.3 | 37.3 | 52.7 |
81
+ | financialphrasebank (3) | 78.3 | 68.9 | 91.2 |
82
+ | banking77 (72) | 18.9 | 46 | 73.7 |
83
+ | massive (59) | 44 | 56.6 | 78.9 |
84
+ | wikitoxic_toxicaggreg (2) | 73.7 | 82.5 | 90.5 |
85
+ | wikitoxic_obscene (2) | 77.3 | 91.6 | 92.6 |
86
+ | wikitoxic_threat (2) | 83.5 | 95.2 | 96.7 |
87
+ | wikitoxic_insult (2) | 79.6 | 91 | 91.6 |
88
+ | wikitoxic_identityhate (2) | 83.9 | 88 | 94.4 |
89
+ | hateoffensive (3) | 55.2 | 66.1 | 86 |
90
+ | hatexplain (3) | 44.1 | 57.6 | 76.9 |
91
+ | biasframes_offensive (2) | 56.8 | 85.4 | 87 |
92
+ | biasframes_sex (2) | 85.4 | 87 | 91.8 |
93
+ | biasframes_intent (2) | 56.3 | 85.2 | 87.8 |
94
+ | agnews (4) | 77.3 | 80 | 90.5 |
95
+ | yahootopics (10) | 53.6 | 57.7 | 72.8 |
96
+ | trueteacher (2) | 51.4 | 49.5 | 82.4 |
97
+ | spam (2) | 51.8 | 50 | 97.2 |
98
+ | wellformedquery (2) | 49.9 | 52.5 | 77.2 |
99
+ | manifesto (56) | 5.8 | 18.9 | 39.1 |
100
+ | capsotu (21) | 25.2 | 64 | 72.5 |
101
+ | mnli_m (2) | 92.4 | nan | 92.7 |
102
+ | mnli_mm (2) | 92.4 | nan | 92.5 |
103
+ | fevernli (2) | 89 | nan | 89.1 |
104
+ | anli_r1 (2) | 79.4 | nan | 80 |
105
+ | anli_r2 (2) | 68.4 | nan | 68.4 |
106
+ | anli_r3 (2) | 66.2 | nan | 68 |
107
+ | wanli (2) | 81.6 | nan | 81.8 |
108
+ | lingnli (2) | 88.4 | nan | 88.4 |
109
+
110
+
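As a consistency check, the `datasets mean (w/o nli)` row can be recomputed from the per-dataset values. The sketch below assumes that row is a simple unweighted mean over the 28 non-NLI datasets (an assumption, not something the README states) and copies the `deberta-v3-base-zeroshot-v1.1-heldout` column from the table above.

```python
# Held-out balanced accuracy per non-NLI dataset, copied from the table.
heldout = {
    "amazonpolarity": 95.7, "imdb": 93.6, "appreviews": 92.2,
    "yelpreviews": 97.4, "rottentomatoes": 88.7, "emotiondair": 42.6,
    "emocontext": 57.4, "empathetic": 37.3, "financialphrasebank": 68.9,
    "banking77": 46.0, "massive": 56.6, "wikitoxic_toxicaggreg": 82.5,
    "wikitoxic_obscene": 91.6, "wikitoxic_threat": 95.2,
    "wikitoxic_insult": 91.0, "wikitoxic_identityhate": 88.0,
    "hateoffensive": 66.1, "hatexplain": 57.6, "biasframes_offensive": 85.4,
    "biasframes_sex": 87.0, "biasframes_intent": 85.2, "agnews": 80.0,
    "yahootopics": 57.7, "trueteacher": 49.5, "spam": 50.0,
    "wellformedquery": 52.5, "manifesto": 18.9, "capsotu": 64.0,
}

# Assumption: the summary row is an unweighted mean across datasets.
mean = sum(heldout.values()) / len(heldout)
print(len(heldout), round(mean, 1))  # 28 70.7
```

Under this assumption the result matches the reported held-out mean of 70.7. An unweighted mean gives a 2-class dataset and a 56-class dataset the same influence on the summary figure.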
  ## Limitations and bias
  The model can only do text classification tasks.