Commit 7ca37ec by g8a9 (1 parent: 5ec0684)

Update README.md

  1. README.md +33 -2
README.md CHANGED
@@ -5,6 +5,37 @@ language:
  library_name: transformers
  ---

- This model has been trained and released as part of the MilaNLP solution to the EDOS Shared Task.
+ This model has been trained and released as part of the MilaNLP solution to the EDOS Shared Task. \
+ Please check out the paper [MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection](https://aclanthology.org/2023.semeval-1.285/) for further details.

- This model card is WIP, we will update it soon.
+
+ ## Adaptation Details
+
+ We ran domain adaptation of a [pretrained DeBERTa](https://huggingface.co/microsoft/deberta-v3-large) with standard MLM on the unlabeled Reddit corpus (1M posts) provided by the task organizers (Kirk et al., 2023) and the Gab Hate Corpus
+ (87K posts) (Kennedy et al., 2022). After concatenating and shuffling the two datasets, we held out 5% as validation data, stratifying on the data source. Our final training dataset counted around 20M words.
+
+ Please refer to the paper for full details.
+
+ ## Reference
+
+ If you use the model, please consider citing:
+
+ ```bibtex
+ @inproceedings{cercas-curry-etal-2023-milanlp,
+ title = "{M}ila{NLP} at {S}em{E}val-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection",
+ author = "Cercas Curry, Amanda and
+ Attanasio, Giuseppe and
+ Nozza, Debora and
+ Hovy, Dirk",
+ booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
+ month = jul,
+ year = "2023",
+ address = "Toronto, Canada",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2023.semeval-1.285",
+ doi = "10.18653/v1/2023.semeval-1.285",
+ pages = "2067--2074",
+ abstract = "We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more {``}conservative{''} predictions, mitigating the effects of lexical overfitting. However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.",
+ }
+
+ ```
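
Note on the split described under "Adaptation Details": the 5% validation hold-out, stratified on data source, can be sketched in plain Python as below. This is a minimal illustration and not the authors' code. The helper name `stratified_holdout`, the seed, the per-source rounding, and the toy placeholder corpus are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_holdout(examples, val_fraction=0.05, seed=0):
    """Split (text, source) pairs into train and validation lists.

    Hypothetical helper: each source contributes the same fraction of its
    examples to the validation set, so the source mix is preserved.
    """
    rng = random.Random(seed)

    # Group examples by their data source (the stratification key).
    by_source = defaultdict(list)
    for example in examples:
        by_source[example[1]].append(example)

    train, val = [], []
    for source in sorted(by_source):
        group = by_source[source]
        rng.shuffle(group)
        # Assumption: round to the nearest whole example per source.
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])

    # Shuffle again so the two sources are interleaved in each split.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Toy stand-in for the two corpora (placeholder strings, not real posts).
corpus = [(f"reddit post {i}", "reddit") for i in range(1000)]
corpus += [(f"gab post {i}", "gab") for i in range(100)]

train, val = stratified_holdout(corpus)
```

With the toy corpus, each source gives up 5% of its own examples, so the validation set keeps the same Reddit-to-Gab ratio as the full data. A plain random 5% sample would only approximate that ratio.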