metadata
language: nl
tags:
  - token-classification
  - sequence-tagger-model

Goal

This model can be used to add emoji to an input text.

To accomplish this, we framed the task as token classification: for each word/token, the model predicts the emoji (if any) that should follow it, treating that emoji as the entity label.

The accompanying demo, which includes all the necessary pre- and postprocessing, can be found here.

For the moment, this only works for Dutch texts.
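After the model has predicted a label for each word, postprocessing has to turn those predictions back into text. The demo's code is not shown here, so the following is only a minimal sketch under assumptions: predictions arrive as word-level (word, label) pairs (if the model works on subword tokens, these would first have to be merged into words), labels follow the `B-<emoji>`/`O` scheme shown in the Dataset section, and the helper name `insert_emoji` is hypothetical.

```python
import re

# Matches a single punctuation mark (anything that is neither a word
# character nor whitespace); such tokens are attached without a space.
_PUNCT = re.compile(r"[^\w\s]")


def insert_emoji(pairs):
    """Rebuild text from (word, label) pairs, adding each predicted emoji after its word."""
    parts = []
    for word, label in pairs:
        # Separate tokens by a space, except before punctuation.
        if parts and not _PUNCT.fullmatch(word):
            parts.append(" ")
        parts.append(word)
        # A "B-<emoji>" label means the emoji should follow this word.
        if label.startswith("B-"):
            parts.append(" " + label[2:])
    return "".join(parts)
```

With this joining rule, `[("Wow", "B-😍"), (",", "O")]` becomes `"Wow 😍,"`: the emoji is placed right after the labeled word and the comma is attached without a space.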

Dataset

For this model, we scraped about 1,000 unique tweets for each supported emoji: ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

Such tweets could look like this:

Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾

After some processing, we can convert this into a more standard NER format:

| Word | Label |
|------|-------|
| Wow  | B-😍  |
| ,    | O     |
| what | O     |
| a    | O     |
| cool | O     |
| car  | B-🚗  |
| !    | O     |

This format can then be used to train a token-classification model.
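The processing step that produces these word/label pairs is not published. A minimal sketch, assuming a simple regex tokenizer (hypothetical, not the authors' tokenizer) and the rule that an emoji labels the word it follows: a run of repeated emoji such as `😍😍` yields a single `B-😍` label, a run of mixed emoji keeps only its first emoji, and emoji with no preceding word are dropped. The function name `tweet_to_labeled_words` is hypothetical.

```python
import re

# The ten supported emoji, as listed above (each is a single code point).
EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]
EMOJI_SET = set(EMOJI)

# Hypothetical tokenizer: a run of supported emoji, a word, or a single
# punctuation mark. Emoji runs are tried first so they stay together.
_EMOJI_ALT = "|".join(re.escape(e) for e in EMOJI)
TOKEN_RE = re.compile(rf"(?:{_EMOJI_ALT})+|\w+|[^\w\s]")


def tweet_to_labeled_words(text):
    """Return (word, label) pairs; an emoji run labels the word before it."""
    pairs = []
    for token in TOKEN_RE.findall(text):
        if token[0] in EMOJI_SET:
            # Emoji run: label the preceding word with the run's first emoji,
            # unless there is no preceding word or it is already labeled.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], f"B-{token[0]}")
        else:
            # Ordinary word or punctuation mark: no emoji follows it (yet).
            pairs.append((token, "O"))
    return pairs
```

Applied to the first example tweet, this sketch yields `Wow/B-😍`, `,/O`, `what/O`, `a/O`, `cool/O`, `car/B-🚗`, `!/O`; the emoji tokens themselves disappear from the input and survive only as labels.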

Unfortunately, Twitter's Terms of Service prohibit us from sharing the original dataset.

Training

The model was trained for 4 epochs.
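Training a token-classification head requires a fixed label inventory. Assuming the scheme shown in the Dataset section (one `B-<emoji>` tag per supported emoji plus `O`, with no `I-` tags because each emoji run becomes a single label), the ten emoji give eleven labels. The mappings can be sketched as follows; `build_label_maps` is a hypothetical helper.

```python
# The ten supported emoji, in the order listed in the Dataset section.
EMOJI = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]


def build_label_maps(emoji):
    """Map each label string to an integer id and back; "O" gets id 0."""
    labels = ["O"] + [f"B-{e}" for e in emoji]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps(EMOJI)
```

In Hugging Face `transformers`, dictionaries like these are typically passed as the `id2label` and `label2id` config arguments (for example to `AutoModelForTokenClassification.from_pretrained`), but this card does not state which base model or training setup was used beyond the number of epochs.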