---
language: nl
tags:
- token-classification
- sequence-tagger-model
---

# Goal

This model adds emoji to an input text.

To accomplish this, we framed the task as a token-classification problem: the emoji that should follow a given word/token is predicted as that token's entity label.

The accompanying demo, which includes all the pre- and postprocessing needed, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

For the moment, this only works for Dutch texts.
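
As a rough illustration of the postprocessing side, the sketch below rebuilds a text from token-level predictions by placing each predicted emoji after its token. The function name `insert_emoji`, the punctuation set, and the `B-<emoji>` label shape are assumptions made for this example; the demo's actual code may differ.

```python
# Hypothetical helper, not taken from the demo's source code.
PUNCTUATION = {",", ".", "!", "?", ":", ";"}


def insert_emoji(tokens, labels):
    """Rebuild a text, appending the predicted emoji after each labelled token.

    `labels` holds one label per token: "O" for no emoji,
    or "B-<emoji>" for an emoji that should follow the token.
    """
    pieces = []
    for token, label in zip(tokens, labels):
        if pieces and token in PUNCTUATION:
            # Glue punctuation to the previous piece instead of adding a space.
            pieces[-1] += token
        else:
            pieces.append(token)
        if label != "O":
            emoji = label.split("-", 1)[1]
            pieces[-1] += " " + emoji
    return " ".join(pieces)


print(insert_emoji(["Ik", "heb", "dorst", "!"], ["O", "O", "B-🍾", "O"]))
# prints: Ik heb dorst 🍾!
```

In practice the tokens and labels would come from the model's predictions after sub-word tokens have been merged back into words.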

# Dataset

For this model, we scraped about 1,000 unique tweets for each emoji we support:

['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

Such a tweet could look like this:

```
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
```

After some processing, we can recast this in the more familiar NER format:

| Word | Label |
|------|-------|
| Wow | B-😍 |
| , | O |
| what | O |
| a | O |
| cool | O |
| car | O |
| ! | B-🚗 |

This format can then be used to train a token-classification model.
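
The conversion from raw text to word/label rows can be sketched roughly as follows. This is a minimal example under stated assumptions, not our actual preprocessing: the `SUPPORTED` subset, the regex tokenizer, and the name `to_ner_rows` are illustrative, and the sketch always attaches the label to the token immediately before the emoji, which may not match the exact label placement used for the real dataset.

```python
import re

# Illustrative subset of emoji; an assumption for this example only.
SUPPORTED = ["😠", "🍾"]

# Match a supported emoji first, then a word, then a single punctuation mark.
TOKEN_RE = re.compile("|".join(map(re.escape, SUPPORTED)) + r"|\w+|[^\w\s]")


def to_ner_rows(text):
    """Turn a text with emoji into (word, label) rows.

    An emoji is removed from the text and becomes the "B-<emoji>" label
    of the token right before it. Repeated emoji and emoji at the very
    start of the text are dropped.
    """
    rows = []
    for token in TOKEN_RE.findall(text):
        if token in SUPPORTED:
            if rows and rows[-1][1] == "O":
                rows[-1] = (rows[-1][0], "B-" + token)
        else:
            rows.append((token, "O"))
    return rows


for word, label in to_ner_rows("I hate mondays 😠, I need a drink 🍾🍾"):
    print(word, label)
```

Every word that is not followed by an emoji keeps the `O` label, so most rows in such a dataset are `O`.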

Unfortunately, the Terms of Service prohibit us from sharing the original dataset.

# Training

The model was trained for 4 epochs.