"Não é medo, é recheio": Sequence Labeling for Pun Location and Detection in Portuguese

This repository contains models fine-tuned for pun location in Portuguese, trained on the Puntuguese dataset. The following models are available:

GlorIA-1.3B-all
GlorIA-1.3B-positive
albertina-900m-ptbr-all
albertina-900m-ptbr-positive
albertina-900m-ptpt-all
albertina-900m-ptpt-positive

The *-all models were fine-tuned on all the data in the training portion of Puntuguese, including negative examples. The *-positive models were trained only on texts that contain at least one pun sign.

All checkpoints of every model are available. We encourage you to browse the repository files and pick the checkpoint that best suits your needs.
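As a starting point for browsing the files, the sketch below groups a repository file listing by model folder and checkpoint step. It assumes the layout `<model>/checkpoint-<step>/...`, which is inferred from the `subfolder` example in this README rather than documented. The helper names `group_checkpoints` and `list_remote_checkpoints` are hypothetical; `list_repo_files` is the `huggingface_hub` function for listing a repository's files.

```python
import re
from collections import defaultdict


def group_checkpoints(repo_files):
    """Map each model folder to the sorted checkpoint steps found in a file listing.

    Assumes paths shaped like '<model>/checkpoint-<step>/<file>' (an assumption
    about the repository layout); any other path is ignored.
    """
    pattern = re.compile(r"^([^/]+)/checkpoint-(\d+)/")
    steps = defaultdict(set)
    for path in repo_files:
        match = pattern.match(path)
        if match:
            steps[match.group(1)].add(int(match.group(2)))
    return {model: sorted(found) for model, found in steps.items()}


def list_remote_checkpoints(repo_id="Superar/Portuguese-Pun-Location"):
    """Fetch the file listing from the Hub and group it (needs network access)."""
    # Imported lazily so group_checkpoints works without huggingface_hub installed
    from huggingface_hub import list_repo_files

    return group_checkpoints(list_repo_files(repo_id))
```

Calling `list_remote_checkpoints()` should return a dictionary keyed by model folder, which can then be used to pick a `subfolder` value.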

How to use

To load a model, use the AutoModelForSequenceClassification.from_pretrained() method, passing the model folder and checkpoint through the subfolder argument.

For example, to load checkpoint 500 of albertina-900m-ptbr-positive, use the following code:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('Superar/Portuguese-Pun-Location',
                                                            subfolder='albertina-900m-ptbr-positive/checkpoint-500')

This should load the correct model.
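The same call can be wrapped in a small helper that works for any of the six models. This is a minimal sketch: `checkpoint_subfolder` and `load_checkpoint` are hypothetical names, the model folders are the ones listed in this README, and the checkpoint steps that actually exist for each model are not stated here, so an invalid step will only fail at download time.

```python
REPO_ID = "Superar/Portuguese-Pun-Location"

# Model folders listed in this README
MODEL_NAMES = (
    "GlorIA-1.3B-all",
    "GlorIA-1.3B-positive",
    "albertina-900m-ptbr-all",
    "albertina-900m-ptbr-positive",
    "albertina-900m-ptpt-all",
    "albertina-900m-ptpt-positive",
)


def checkpoint_subfolder(model_name, step):
    """Build the subfolder string, e.g. 'albertina-900m-ptbr-positive/checkpoint-500'."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model folder: {model_name!r}")
    return f"{model_name}/checkpoint-{step}"


def load_checkpoint(model_name, step):
    """Load one checkpoint from the Hub with the same call shown in this README."""
    # Imported lazily so that building the path does not require transformers
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        REPO_ID, subfolder=checkpoint_subfolder(model_name, step)
    )
```

For instance, `load_checkpoint('albertina-900m-ptbr-positive', 500)` is equivalent to the snippet above.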

How to cite

@inproceedings{gameiro_etal:epia2024,
  title = {Sequence Labeling for Pun Location and Detection in {{Portuguese}}},
  booktitle = {Proceedings of 23rd {{EPIA}} Conference on Artificial Intelligence, {{EPIA}} 2024},
  author = {Gameiro, Patr{\'{\i}}cia and In{\'a}cio, Marcio and Gon{\c c}alo Oliveira, Hugo and Alves, Ana},
  year = {2024},
  pages = {In press},
  address = {Viana do Castelo, Portugal}
}
