"Não é medo, é recheio": Sequence Labeling for Pun Location and Detection in Portuguese
This repository contains models fine-tuned for the task of pun location in Portuguese, trained on the Puntuguese dataset. The following models are available:
GlorIA-1.3B-all
GlorIA-1.3B-positive
albertina-900m-ptbr-all
albertina-900m-ptbr-positive
albertina-900m-ptpt-all
albertina-900m-ptpt-positive
The *-all models were fine-tuned with all the data from the training portion of Puntuguese, including negative examples. Meanwhile, the *-positive models were trained only on texts that contain at least one pun sign.
All of the models' checkpoints are available, so we encourage you to browse the files and pick the checkpoint that best suits your needs.
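One way to browse the checkpoints programmatically is sketched below. It assumes the repository follows the layout model-name/checkpoint-step/ implied by the loading example in this README. The helper names checkpoint_steps and list_remote_checkpoints are hypothetical and not part of the repository; list_repo_files is a function from the huggingface_hub package.

```python
import re
from collections import defaultdict

REPO_ID = "Superar/Portuguese-Pun-Location"

# Assumed layout: "<model-name>/checkpoint-<step>/<file>"
_CHECKPOINT_RE = re.compile(r"^([^/]+)/checkpoint-(\d+)/")


def checkpoint_steps(paths):
    """Group checkpoint step numbers by model folder, given repo file paths."""
    steps = defaultdict(set)
    for path in paths:
        match = _CHECKPOINT_RE.match(path)
        if match:  # ignore files outside a checkpoint folder
            steps[match.group(1)].add(int(match.group(2)))
    return {model: sorted(found) for model, found in steps.items()}


def list_remote_checkpoints(repo_id=REPO_ID):
    """Fetch the file list from the Hub (needs network access) and group it."""
    from huggingface_hub import list_repo_files  # imported lazily on purpose

    return checkpoint_steps(list_repo_files(repo_id))
```

Calling list_remote_checkpoints() should return a dictionary that maps each model folder to its sorted checkpoint steps, provided the layout assumption holds.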
How to use
To load a model, use the AutoModelForSequenceClassification.from_pretrained() method with the subfolder argument.
For example, to load checkpoint 500 of albertina-900m-ptbr-positive, use the following code:
from transformers import AutoModelForSequenceClassification
# Load checkpoint 500 of albertina-900m-ptbr-positive from its subfolder
model = AutoModelForSequenceClassification.from_pretrained(
    'Superar/Portuguese-Pun-Location',
    subfolder='albertina-900m-ptbr-positive/checkpoint-500',
)
This should load the correct model.
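The same pattern can be wrapped in a small helper so that the subfolder string is built consistently for any model and checkpoint. This is only a sketch: checkpoint_subfolder and load_checkpoint are hypothetical names, and the code does not verify that a given checkpoint step actually exists for a given model (this README only mentions checkpoint 500 of albertina-900m-ptbr-positive).

```python
REPO_ID = "Superar/Portuguese-Pun-Location"

# Model folders listed in this README
MODEL_NAMES = (
    "GlorIA-1.3B-all",
    "GlorIA-1.3B-positive",
    "albertina-900m-ptbr-all",
    "albertina-900m-ptbr-positive",
    "albertina-900m-ptpt-all",
    "albertina-900m-ptpt-positive",
)


def checkpoint_subfolder(model_name, step):
    """Build the subfolder path '<model_name>/checkpoint-<step>'."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model: {model_name!r}")
    return f"{model_name}/checkpoint-{step}"


def load_checkpoint(model_name, step):
    """Download and load one checkpoint (needs transformers and network access)."""
    from transformers import AutoModelForSequenceClassification  # lazy import

    return AutoModelForSequenceClassification.from_pretrained(
        REPO_ID,
        subfolder=checkpoint_subfolder(model_name, step),
    )
```

For instance, load_checkpoint('albertina-900m-ptbr-positive', 500) issues the same from_pretrained call as the example above.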
How to cite
@inproceedings{gameiro_etal:epia2024,
title = {Sequence Labeling for Pun Location and Detection in {{Portuguese}}},
booktitle = {Proceedings of 23rd {{EPIA}} Conference on Artificial Intelligence, {{EPIA}} 2024},
author = {Gameiro, Patr{\'{\i}}cia and In{\'a}cio, Marcio and Gon{\c c}alo Oliveira, Hugo and Alves, Ana},
year = {2024},
pages = {In press},
address = {Viana do Castelo, Portugal}
}