Upload README.md
README.md
CHANGED
@@ -1,63 +1,80 @@
----
-language: german
-widget:
-- text: "It has been determined that the amount of greenhouse gases have decreased by almost half because of the prevalence in the utilization of nuclear power."
----
-This model was trained on
-| TOPIC | ARGUMENT | NON-ARGUMENT |
-| minimum wage | 325 | 1,346 |
-**ParlBERT-Topic** was fine-tuned on
-)
-|----|----|----|----|----|----|----|
-| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
-Showing the **confusion matrix** using again the evaluation set:
-| | ARGUMENT | NON-ARGUMENT |

### Welcome to ParlBERT-Topic-German!

🏷 **Model description**

This model was trained on ~10k manually annotated political interpellations (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) labeled with Comparative Agendas Project topics. It classifies a text into one of twenty topic labels (annotation codebook).

🗃 **Dataset**

| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |

🏃🏼♂️ **Model training**

**ParlBERT-Topic** was fine-tuned from a domain-adapted model for topic classification, using the interpellations dataset from the Comparative Agendas Project (`mlm_probability=0.15`). We used the HuggingFace Trainer with the following hyperparameters.
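
To make the `mlm_probability=0.15` setting concrete, here is a minimal, library-free sketch of masked-language-model corruption. It assumes the common BERT-style scheme, in which a selected token is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The card only states the masking probability, so the replacement scheme, the function name `mask_tokens`, and the toy vocabulary are illustrative assumptions, not the authors' training code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mlm_probability: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence for masked language modeling (illustrative only).

    Returns the corrupted tokens and, per position, the original token if the
    position was selected for prediction, or None otherwise.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = token  # the model has to recover the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # assumed: 80% of selected tokens become [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # assumed: 10% become a random token
        # assumed: the remaining 10% keep the original token
    return masked, labels


if __name__ == "__main__":
    sentence = "Wir fragen die Bundesregierung".split()
    print(mask_tokens(sentence, vocab=["Haus", "Land"], seed=3))
```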

🤖 **Use**

```
from transformers import pipeline

# device=0 selects the first GPU; pass device=-1 to run on CPU instead.
pipeline_classification_topics = pipeline(
    "text-classification",
    model="chkla/parlbert-topics-german",
    tokenizer="bert-base-german-cased",
    return_all_scores=False,
    device=0,
)

# Example interpellation text (the model expects German input).
text = "Sachgebiet Ausschließliche Gesetzgebungskompetenz des Bundes über die Zusammenarbeit des Bundes und der Länder zum Schutze der freiheitlichen demokratischen Grundordnung, des Bestandes und der Sicherheit des Bundes oder eines Landes Wir fragen die Bundesregierung"

pipeline_classification_topics(text)  # Government
```
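
The snippet above hard-codes `device=0`, which needs a GPU. The following sketch shows one possible way to fall back to CPU and to pull the best label out of the pipeline result. The helper names `pick_device`, `top_label`, and `classify` are hypothetical and not part of the model card or the library. Because the shape of the pipeline output depends on the arguments and the installed `transformers` version, `top_label` accepts either a single dict or a (possibly nested) list of dicts.

```python
from typing import Any, Dict, List, Optional


def pick_device(cuda_available: bool) -> int:
    """Return the pipeline device index: 0 for the first GPU, -1 for CPU."""
    return 0 if cuda_available else -1


def top_label(prediction: Any) -> Optional[Dict[str, Any]]:
    """Return the highest-scoring {'label', 'score'} dict from one prediction."""
    if isinstance(prediction, dict):
        return prediction
    candidates = []
    for item in prediction:
        best = top_label(item)
        if best is not None:
            candidates.append(best)
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["score"])


def classify(texts: List[str]) -> List[Optional[Dict[str, Any]]]:
    """Classify texts with the model from this card (downloads weights on first use)."""
    # Imported lazily so the helpers above can be used without these packages.
    import torch
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="chkla/parlbert-topics-german",
        tokenizer="bert-base-german-cased",
        device=pick_device(torch.cuda.is_available()),
    )
    return [top_label(p) for p in classifier(texts)]
```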

📊 **Evaluation**

The model was evaluated on an evaluation set (20%):

| Label | F1 | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| International | 76.5 | 978 |
| International | 76.6 | 845 |
| International | 86.0 | 800 |
| International | 67.1 | 0.8021 |
| International | 78.6 | 0.8021 |
| International | 78.2 | 0.8021 |
| International | 64.4 | 0.8021 |
| International | 81.0 | 0.8021 |
| International | 69.1 | 0.8021 |
| International | 62.8 | 0.8021 |
| International | 76.3 | 0.8021 |
| International | 49.2 | 0.8021 |
| International | 63.0 | 0.8021 |
| International | 71.6 | 0.8021 |
| International | 79.6 | 0.8021 |
| International | 61.5 | 0.8021 |
| International | 45.4 | 0.8021 |
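
As a quick way to summarize the table, the per-label F1 values can be averaged into an unweighted (macro) F1. This is only arithmetic over the twenty F1 values listed above, not a figure reported by the authors; the unweighted mean is used because it needs nothing beyond the F1 column.

```python
from statistics import mean

# Per-label F1 scores (in percent), copied from the evaluation table in row order.
f1_scores = [
    80.0, 85.0, 71.3, 76.5, 76.6, 86.0, 67.1, 78.6, 78.2, 64.4,
    81.0, 69.1, 62.8, 76.3, 49.2, 63.0, 71.6, 79.6, 61.5, 45.4,
]

# Macro-F1: every label counts equally, regardless of its support.
macro_f1 = mean(f1_scores)

print(f"labels: {len(f1_scores)}")
print(f"macro-F1: {macro_f1:.2f}")
print(f"lowest F1: {min(f1_scores)}, highest F1: {max(f1_scores)}")
```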

⚠️ **Intended Uses & Potential Limitations**

The model is only a starting point for exploring policy topic classification in political texts. Be aware that such models are often highly topic-dependent, so performance may drop on topics and text types that are not included in the training set.

👥 **Cite**

```
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: [@chklamm](http://twitter.com/chklamm)