---
language: nl
tags:
- token-classification
- sequence-tagger-model
---

# Goal
This model can be used to add emoji to an input text.

To accomplish this, we framed the task as token classification: for each word/token, the model predicts, as an entity label, the emoji that should follow it.

The accompanying demo, which includes all the pre- and postprocessing needed, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

For the moment, this only works for Dutch texts.
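
As a rough illustration of the postprocessing that this framing implies, the sketch below turns per-token labels back into text with emoji. It is not the demo's actual code: the function name `add_emoji`, the `(token, label)` input shape, and the hand-written Dutch predictions are all assumptions made for the example.

```python
import string


def add_emoji(pairs):
    """Rebuild text from (token, label) pairs, placing each predicted emoji after its token.

    Hypothetical helper: assumes labels are "O" or "B-<emoji>".
    """
    text = ""
    for token, label in pairs:
        is_punct = all(ch in string.punctuation for ch in token)
        # Separate tokens with a space, except before punctuation and at the start.
        if text and not is_punct:
            text += " "
        text += token
        if label.startswith("B-"):
            # Strip the "B-" prefix to recover the emoji itself.
            text += " " + label[2:]
    return text


# Hand-written predictions for illustration, not real model output.
predictions = [("Wat", "O"), ("een", "O"), ("mooie", "O"), ("auto", "B-🚗"), ("!", "O")]
print(add_emoji(predictions))
```

A real pipeline would also have to merge subword tokens back into words before this step, which the sketch leaves out.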



# Dataset
For this model, we scraped about 1000 unique tweets per emoji we support:
['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

A scraped tweet could look like this:
```
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
```

After some processing, a tweet can be recast into a more familiar NER format, where each emoji becomes the label of the token it follows:


| Word | Label |
|------|-------|
| Wow  | B-😍  |
| ,    | O     |
| what | O     |
| a    | O     |
| cool | O     |
| car  | B-🚗  |
| !    | O     |

Data in this format can then be used to train a token classification model.
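
The conversion from a raw tweet to word/label pairs could be sketched as follows. This is a minimal approximation under stated assumptions, not the preprocessing we actually ran: the regex tokenizer, the name `tweet_to_labels`, and the choice to collapse a run of emoji into a single label (taken from its first emoji) on the token it directly follows are all illustrative.

```python
import re

# The ten supported emoji from the list above.
EMOJI = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

# A token is a run of supported emoji, a run of word characters,
# or a single other non-space character (assumed tokenization).
TOKEN_RE = re.compile(rf"[{''.join(EMOJI)}]+|\w+|[^\w\s]")


def tweet_to_labels(text):
    """Return (token, label) pairs; an emoji run labels the token before it."""
    pairs = []
    for tok in TOKEN_RE.findall(text):
        if tok[0] in EMOJI:
            # Attach the run's first emoji to the preceding token.
            # An emoji at the very start has no such token and is dropped.
            if pairs:
                pairs[-1] = (pairs[-1][0], f"B-{tok[0]}")
        else:
            pairs.append((tok, "O"))
    return pairs


for word, label in tweet_to_labels("Omg, I hate mondays 😠... I need a drink 🍾"):
    print(word, label)
```

Tokens without a following emoji get the `O` label, so most of each tweet ends up as negative examples.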

Unfortunately, the Terms of Service prohibit us from sharing the original dataset.



# Training

The model was trained for 4 epochs.