ml6team/xlm-roberta-base-nl-emoji-ner

	---
	language: nl
	tags:
	- token-classification
	- sequence-tagger-model
	---

	# Goal
	This model can be used to add emoji to an input text.

	To accomplish this, we framed the task as token classification: the emoji that should follow a given word/token is predicted as that token's entity label.

	The accompanying demo, which includes all the pre- and postprocessing needed, can be found [here](https://huggingface.co/spaces/ml6team/emoji_predictor).

	For the moment, this only works for Dutch texts.
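As a rough illustration of the idea (not the demo's actual pre- and postprocessing), the sketch below runs the model through the `transformers` token-classification pipeline and inserts each predicted emoji after the span it was predicted for. The helper names `insert_emoji` and `predict_emoji` are hypothetical, and the code assumes the pipeline returns character offsets (`end`) and labels that are either a bare emoji or an emoji with a `B-`/`I-` prefix.

```python
def insert_emoji(text, predictions):
    """Insert each predicted emoji right after the span it was predicted for.

    `predictions` is assumed to be a list of dicts shaped like the output of a
    token-classification pipeline: an `end` character offset plus either an
    `entity_group` (aggregated) or `entity` (raw) label.
    """
    out = text
    # Work from right to left so earlier character offsets stay valid.
    for pred in sorted(predictions, key=lambda p: p["end"], reverse=True):
        label = pred.get("entity_group") or pred.get("entity", "")
        emoji = label.split("-", 1)[-1]  # drop a leading "B-"/"I-" if present
        out = out[:pred["end"]] + " " + emoji + out[pred["end"]:]
    return out


def predict_emoji(text):
    """Hypothetical end-to-end helper; downloads the model on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    tagger = pipeline(
        "token-classification",
        model="ml6team/xlm-roberta-base-nl-emoji-ner",
        aggregation_strategy="simple",  # merge sub-word pieces into spans
    )
    return insert_emoji(text, tagger(text))
```

Calling `predict_emoji("Wat een mooie auto!")` would return the input with any predicted emoji inserted; which emoji appear depends on the model's predictions.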



	# Dataset
	For this model, we scraped about 1000 unique tweets per emoji we support:
	['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

	Such tweets could look like this:
	```
	Wow 😍😍, what a cool car 🚗🚗!
	Omg, I hate mondays 😠... I need a drink 🍾
	```

	After some processing, we can recast this in a more familiar NER format:


	| Word | Label |
	|------|-------|
	| Wow  | B-😍  |
	| ,    | O     |
	| what | O     |
	| a    | O     |
	| cool | O     |
	| car  | B-🚗  |
	| !    | O     |

	This format can then be used to train a token classification model.
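The conversion from a raw tweet to word/label pairs could be sketched as follows. This is a minimal illustration, not the processing actually used: the regex tokenizer and the function name `tweet_to_ner` are assumptions, and the rule applied is the one described under Goal (the emoji is attached as a `B-` label to the token it follows).

```python
import re

# The ten supported emoji, as listed above.
EMOJIS = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

# Assumed tokenizer: runs of word characters, or single non-space symbols
# (punctuation marks and emoji each become their own token).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tweet_to_ner(tweet):
    """Return (word, label) pairs, labelling the token that precedes an emoji."""
    rows = []
    for tok in TOKEN_RE.findall(tweet):
        if tok in EMOJIS:
            # Attach the emoji to the previous token unless it is already
            # labelled (repeated emoji) or there is no previous token.
            if rows and rows[-1][1] == "O":
                rows[-1] = (rows[-1][0], "B-" + tok)
            continue  # the emoji itself never becomes a row
        rows.append((tok, "O"))
    return rows


for word, label in tweet_to_ner("Wow 😍😍, what a cool car 🚗🚗!"):
    print(word, label)
```

In this sketch, repeated emoji collapse into a single label and an emoji at the very start of a tweet is dropped, since there is no preceding token to attach it to.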

	Unfortunately, the Terms of Service prohibit us from sharing the original dataset.



	# Training

	The model was trained for 4 epochs.
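A training setup along these lines might look like the sketch below. Only the 4 epochs come from this card; everything else is an assumption: the base checkpoint `xlm-roberta-base` is inferred from the model name, the label set is assumed to be `O` plus one `B-` label per emoji, the helper `build_training_setup` and the output directory are hypothetical, and all other hyperparameters are left at library defaults.

```python
EMOJIS = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

# Assumed label set: "O" plus one "B-<emoji>" label per supported emoji.
LABELS = ["O"] + [f"B-{e}" for e in EMOJIS]
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}


def build_training_setup(output_dir="emoji-ner"):
    """Hypothetical helper returning a model and training arguments."""
    # Imported lazily: heavy dependency, and loading downloads the checkpoint.
    from transformers import AutoModelForTokenClassification, TrainingArguments

    model = AutoModelForTokenClassification.from_pretrained(
        "xlm-roberta-base",  # assumption, inferred from the model name
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    # Only num_train_epochs is taken from this card.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=4)
    return model, args
```

The returned objects could then be passed to a `Trainer` together with a tokenized dataset whose labels are aligned to sub-word tokens.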