	---
	language: es
	tags:
	- sagemaker
	- roberta-bne
	- TextClassification
	- SentimentAnalysis
	license: apache-2.0
	datasets:
	- IMDbreviews_es
	metrics:
	- accuracy
	model-index:
	- name: roberta_bne_sentiment_analysis_es
	  results:
	  - task:
	      name: Sentiment Analysis
	      type: sentiment-analysis
	    dataset:
	      name: "IMDb Reviews in Spanish"
	      type: IMDbreviews_es
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.9106666666666666
	    - name: F1 Score
	      type: f1
	      value: 0.9090909090909091
	    - name: Precision
	      type: precision
	      value: 0.9063852813852814
	    - name: Recall
	      type: recall
	      value: 0.9118127381600436
	widget:
	- text: "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"
	---

	# Model roberta_bne_sentiment_analysis_es

	## A fine-tuned model for sentiment analysis in Spanish

	This model was trained using Amazon SageMaker and the Hugging Face Deep Learning Container.
	The base model is RoBERTa-base-bne, a RoBERTa base model pre-trained on the largest Spanish corpus known to date, with a total of 570GB of text.
	The corpus was compiled from web crawls performed by the [National Library of Spain (Biblioteca Nacional de España)](http://www.bne.es/en/Inicio/index.html).


	## RoBERTa BNE Citation
	Check out the paper for all the details: https://arxiv.org/abs/2107.07253

	```
	@article{gutierrezfandino2022,
	author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
	title = {MarIA: Spanish Language Models},
	journal = {Procesamiento del Lenguaje Natural},
	volume = {68},
	number = {0},
	year = {2022},
	issn = {1989-7553},
	url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
	pages = {39--60}
	}
	```

	## Dataset
	The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides every review in English and in Spanish, with the label in both languages.

	Sizes of datasets:
	- Train dataset: 42,500
	- Validation dataset: 3,750
	- Test dataset: 3,750
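
The sizes above add up to the full 50,000 reviews, which corresponds to an 85% / 7.5% / 7.5% partition. The sketch below reproduces those counts with a seeded shuffle over row indices. The helper name `split_indices` and the seed are hypothetical; the procedure actually used to create the splits is not documented in this card.

```python
import random

def split_indices(n_rows, n_val, n_test, seed=42):
    """Shuffle row indices and cut them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(50_000, n_val=3_750, n_test=3_750)
print(len(train_idx), len(val_idx), len(test_idx))  # 42500 3750 3750
```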

	## Intended uses & limitations

	This model is intended for sentiment analysis of Spanish text. It was fine-tuned specifically on movie reviews, but it can be applied to other kinds of reviews.

	## Hyperparameters
	{
	"epochs": "4",
	"train_batch_size": "32",
	"eval_batch_size": "8",
	"fp16": "true",
	"learning_rate": "3e-05",
	"model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
	"sagemaker_container_log_level": "20",
	"sagemaker_program": "\"train.py\""
	}
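
All values above are strings, and `model_name` and `sagemaker_program` carry an extra layer of escaped quotes. This is consistent with each value having been JSON-encoded before it was passed to the training job. Below is a minimal sketch, assuming that encoding, of how a script could recover typed values. The helper `decode_hyperparameters` is hypothetical and is not taken from the author's `train.py`.

```python
import json

# Hyperparameters as listed in this card: every value is a JSON-encoded string.
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": "\"PlanTL-GOB-ES/roberta-base-bne\"",
    "sagemaker_container_log_level": "20",
    "sagemaker_program": "\"train.py\"",
}

def decode_hyperparameters(raw):
    """Decode each JSON-encoded string value into its Python type."""
    return {name: json.loads(value) for name, value in raw.items()}

hyperparameters = decode_hyperparameters(raw_hyperparameters)
print(hyperparameters["epochs"], hyperparameters["fp16"], hyperparameters["learning_rate"])
print(hyperparameters["model_name"])
```

After decoding, `epochs` is an integer, `fp16` a boolean, `learning_rate` a float, and `model_name` a plain string without the surrounding quotes.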

	## Evaluation results

	- Accuracy = 0.9106666666666666

	- F1 Score = 0.9090909090909091

	- Precision = 0.9063852813852814

	- Recall = 0.9118127381600436
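
The reported metrics are internally consistent: F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values above. A quick check in plain Python:

```python
precision = 0.9063852813852814
recall = 0.9118127381600436

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9091
```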

	## Test results

	## Model in action

	### Usage for Sentiment Analysis

	```python
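	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	model_id = "edumunozsala/roberta_bne_sentiment_analysis_es"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSequenceClassification.from_pretrained(model_id)

	text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

	# Tokenize into PyTorch tensors (input_ids and attention_mask), truncating long reviews
	inputs = tokenizer(text, return_tensors="pt", truncation=True)
	with torch.no_grad():  # gradients are not needed for inference
	    outputs = model(**inputs)

	# Index of the highest-scoring class; label names come from the model config
	predicted_id = outputs.logits.argmax(dim=1).item()
	print(predicted_id, model.config.id2label[predicted_id])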
	```
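
The snippet above yields only the index of the winning class. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on made-up logits (the values are hypothetical, not real model output, and two classes are assumed). With the actual model, `torch.softmax(outputs.logits, dim=1)` performs the same computation.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-1.2, 2.3]  # hypothetical scores for two classes
probabilities = softmax(example_logits)

# Pick the class with the highest probability
predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_id, round(probabilities[predicted_id], 3))
```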

	Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)