Monolingual Dutch Models for Zero-Shot Text Classification

This family of Dutch models was fine-tuned on combined data from the (translated) SNLI and SICK-NL datasets. The models are intended for zero-shot classification of Dutch text through Hugging Face pipelines.

The Models

| Base Model | Hugging Face ID (fine-tuned) |
|---|---|
| BERTje | LoicDL/bert-base-dutch-cased-finetuned-snli |
| RobBERT V2 | LoicDL/robbert-v2-dutch-finetuned-snli |
| RobBERTje | LoicDL/robbertje-dutch-finetuned-snli (this model) |

How to use

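While this family of models can be used for evaluating (monolingual) NLI datasets, its primary intended use is zero-shot text classification in Dutch. In this setting, classification tasks are recast as NLI problems: the text to classify serves as the premise, each candidate label is turned into a hypothesis, and the label whose hypothesis is most strongly entailed is selected. Consider the following sentence pair, which can be used to simulate a sentiment classification problem: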

  • Premise: The food in this place was horrendous
  • Hypothesis: This is a negative review

For more information on using Natural Language Inference models for zero-shot text classification, we refer to this paper.
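The recasting described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline's actual implementation: the helper name `build_nli_pairs` and the English template `"This is a {} review"` are assumptions made for this example, chosen so that the first pair reproduces the premise and hypothesis shown above.

```python
def build_nli_pairs(text, labels, template):
    """Return one (premise, hypothesis) pair per candidate label.

    The text to classify is reused as the premise for every pair; the
    hypothesis is the template with the candidate label filled in.
    """
    return [(text, template.format(label)) for label in labels]


# Hypothetical template and label set, chosen to match the example above.
pairs = build_nli_pairs(
    "The food in this place was horrendous",
    ["negative", "positive"],
    "This is a {} review",
)

for premise, hypothesis in pairs:
    print(f"premise: {premise!r} | hypothesis: {hypothesis!r}")
```

An NLI model then scores each pair, and the label belonging to the most strongly entailed hypothesis is taken as the prediction.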

All our models are compatible with the Hugging Face pipeline for zero-shot classification. They can be downloaded and used with the following code:

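from transformers import pipeline

# Load the fine-tuned RobBERTje model into a zero-shot classification pipeline.
classifier = pipeline(
    task="zero-shot-classification",
    model="LoicDL/robbertje-dutch-finetuned-snli",
)

text_piece = "Het eten in dit restaurant is heel lekker."
labels = ["positief", "negatief", "neutraal"]
# The "{}" placeholder is filled with each candidate label in turn.
template = "Het sentiment van deze review is {}"

# multi_label=False (formerly named multi_class) treats the labels as
# mutually exclusive and normalizes the scores across them.
predictions = classifier(
    text_piece,
    labels,
    multi_label=False,
    hypothesis_template=template,
)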
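For a single input text, the pipeline call above returns a dictionary with the keys `sequence`, `labels`, and `scores`, where the labels are ordered by descending score and, in the single-label setting used here, the scores sum to 1. The sketch below shows one way to read such a result. The helper `best_label` is a hypothetical name, and the dictionary is hand-written for illustration; it is not actual output from any of these models.

```python
def best_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


# Hypothetical result with made-up scores, mirroring the pipeline's output format.
example_result = {
    "sequence": "Het eten in dit restaurant is heel lekker.",
    "labels": ["positief", "neutraal", "negatief"],
    "scores": [0.90, 0.07, 0.03],
}

label, score = best_label(example_result)
print(f"{label} ({score:.2f})")
```

With real output, `best_label(predictions)` could be used the same way; because the labels are already sorted, `predictions["labels"][0]` should give the same label.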

Model Performance

Performance on NLI task

| Model | Accuracy [%] | F1 [%] |
|---|---|---|
| bert-base-dutch-cased-finetuned-snli | 86.21 | 86.42 |
| robbert-v2-dutch-finetuned-snli | 87.61 | 88.02 |
| robbertje-dutch-finetuned-snli | 83.28 | 84.11 |

BibTeX entry and citation info

To cite our paper or models, use the following BibTeX entry:

@article{DeLanghe_Maladry_Vanroy_DeBruyne_Singh_Lefever_2024,
  title={Benchmarking Zero-Shot Text Classification for Dutch},
  volume={13},
  url={https://www.clinjournal.org/clinj/article/view/172},
  journal={Computational Linguistics in the Netherlands Journal},
  author={De Langhe, Loic and Maladry, Aaron and Vanroy, Bram and De Bruyne, Luna and Singh, Pranaydeep and Lefever, Els and De Clercq, Orphée},
  year={2024},
  month={Mar.},
  pages={63--90}
}