---
language: nl
tags:
- token-classification
- sequence-tagger-model
---

# Goal
This model can be used to add emoji to an input text.

To accomplish this, we framed the problem as a token-classification problem, predicting the emoji that should follow a certain word/token as an entity.

The accompanying demo, which includes all the pre- and postprocessing needed, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

For the moment, this only works for Dutch texts.
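
To give a rough idea of the postprocessing step, the sketch below turns per-word labels back into a text with emoji. It is not the demo's actual code: the function name `insert_emoji` is hypothetical, the labels are written by hand rather than produced by the model, and the `B-<emoji>` label format is assumed from the dataset example further down.

```python
def insert_emoji(words, labels):
    """Rebuild a text, appending the predicted emoji after each tagged word."""
    pieces = []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # Strip the "B-" prefix to recover the emoji itself.
            word = f"{word} {label[2:]}"
        pieces.append(word)
    # Naive join: every token is separated by a single space.
    return " ".join(pieces)


# Hand-written labels for illustration only (assumption, not model output).
words = ["Ik", "heb", "koffie", "nodig"]
labels = ["O", "O", "B-☕", "O"]
print(insert_emoji(words, labels))
```

Joining on spaces keeps the sketch short; a real pipeline would also need to restore the original spacing around punctuation.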



# Dataset
For this model, we scraped about 1000 unique tweets for each emoji we support:
['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

A scraped tweet could look like this:
```
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
```

After some processing, we can recast the first tweet in a more familiar NER format, where each word is labelled with the emoji that follows it:


| Word | Label |
|-------|-----|
| Wow   | B-😍|
| ,     | O   |
| what  | O   |
| a     | O   |
| cool  | O   |
| car   | B-🚗|
| !     | O   |

This format can then be used to train a token classification model.
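
As an illustration of what such processing might involve, here is a minimal sketch that converts a raw tweet into (word, label) pairs. It is not the preprocessing we actually used: the function name `tweet_to_ner_rows`, the regex-based tokenization, and the rule of keeping only the first emoji after a word are all assumptions made for the example.

```python
import re

# The ten supported emoji; anything else in a tweet is treated as plain text.
SUPPORTED_EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

# One alternation that matches a supported emoji, a word, or a single
# punctuation mark (assumed tokenization, chosen for brevity).
_EMOJI_PATTERN = "|".join(re.escape(e) for e in SUPPORTED_EMOJI)
_TOKEN_RE = re.compile(rf"{_EMOJI_PATTERN}|\w+|[^\w\s]")


def tweet_to_ner_rows(tweet):
    """Return (word, label) pairs, tagging each word with the emoji that follows it."""
    rows = []
    for token in _TOKEN_RE.findall(tweet):
        if token in SUPPORTED_EMOJI:
            # A word keeps only the first emoji that follows it. An emoji at
            # the very start of the tweet has no word to attach to and is dropped.
            if rows and rows[-1][1] == "O":
                rows[-1] = (rows[-1][0], f"B-{token}")
        else:
            rows.append((token, "O"))
    return rows


for word, label in tweet_to_ner_rows("Omg, I hate mondays 😠... I need a drink 🍾"):
    print(word, label)
```

In this sketch the emoji never appear as tokens of their own; they only survive as labels on the preceding word, which is what lets a standard token classifier predict them.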

Unfortunately, Twitter's Terms of Service prohibit us from sharing the original dataset.



# Training

The model was trained for 4 epochs.