README.md · AK776161/birdseye_roberta-base-tweet-eval at main

AK776161

Update README.md (#1)

7f26391 over 1 year ago


	---
	license: afl-3.0
	datasets:
	- tweet_eval
	- sentiment140
	- mteb/tweet_sentiment_extraction
	- yelp_review_full
	- amazon_polarity
	language:
	- en
	metrics:
	- accuracy
	- sparse_val accuracy
	- sparse_val categorical accuracy
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- textclassification
	- roberta
	- robertabase
	- sentimentanalysis
	- nlp
	- tweetanalysis
	- tweet
	- analysis
	- sentiment
	- positive
	- newsanalysis
	---

	---
	<b>BYRD'S I - ROBERTA BASED TWEET/REVIEW/TEXT ANALYSIS</b>
	---

	This is a ro<b>BERT</b>a-base model fine-tuned on 8 datasets with ~20M tweets. It is intended for English text, though it can also do a reasonable job on other languages.

	<b>Git Repo:</b><a href = "https://github.com/Caffeine-Coders/Sentiment-Analysis-Project"> SENTIMENTANALYSIS-PROJECT</a>

	<b>Demo:</b><a href = "https://byrdi.netlify.app/"> BYRD'S I</a>

	<b>labels: </b>
	0 -> Negative;
	1 -> Neutral;
	2 -> Positive;
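
A minimal sketch of turning predicted class ids into readable names, assuming the mapping listed above. `ID2LABEL` and `ids_to_labels` are hypothetical names used only for illustration; they are not read from the model's config.

```python
# Hypothetical helper: map class ids to the label names listed in this card.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def ids_to_labels(label_ids):
    """Convert an iterable of integer class ids into label names."""
    return [ID2LABEL[int(i)] for i in label_ids]


print(ids_to_labels([2, 0, 1]))
```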

	<b>Model Metrics</b><br/>
	<b>Accuracy: </b> ~96% <br/>
	<b>Sparse Categorical Accuracy: </b> 0.9597 <br/>
	<b>Loss: </b> 0.1144 <br/>
	<b>val_loss -- [onLast_train] : </b> 0.1482 <br/>
	<b>Note: </b>
	Due to discrepancies in the Neutral data across datasets, we published another model, <a href = "https://huggingface.co/AK776161/birdseye_roberta-base-18">
	Byrd's I only positive_negative model</a>, to help isolate neutral data, and we combine the two models using an
	<b>AdaBoot</b> method to get a more accurate output.
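
The combining step used in this card's example code can be tried on made-up logits without downloading either model. The sketch below assumes both models emit three logits per text in the order Negative, Neutral, Positive. `combine_logits` and its parameters are hypothetical names, and the numbers are invented for illustration. It resets very low Neutral logits from the first model to 0 and then takes a plain average of the two models' logits.

```python
import numpy as np


def combine_logits(logits_a, logits_b, neutral_index=1, floor=-7.0):
    """Average two (batch, 3) logit arrays.

    Neutral logits in logits_a that fall below `floor` are reset to 0 first,
    mirroring the threshold used in this card's example code.
    """
    a = np.array(logits_a, dtype=float)  # copy, so the caller's array is untouched
    b = np.asarray(logits_b, dtype=float)
    a[a[:, neutral_index] < floor, neutral_index] = 0.0
    return (a + b) / 2.0


# Made-up logits for two texts (not real model output).
logits_a = np.array([[-3.0, -8.0, 4.0], [2.0, -1.0, -2.0]])
logits_b = np.array([[-2.0, 0.5, 3.0], [1.0, 0.0, -1.0]])

combined = combine_logits(logits_a, logits_b)
print(combined)
print(combined.argmax(axis=1))  # predicted class id per text
```

Copying the first array keeps the reset from leaking back into the caller's data, which the in-place version would do.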
	# Example of Classification:
	```python
	import numpy as np
	import tensorflow  # TensorFlow must be installed because the weights are loaded with from_tf=True
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# model 0: the positive_negative model
	tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18", use_fast=True)
	model = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-18", from_tf=True)
	# model 1: this model (tokenizer1 is loaded for completeness; the example tokenizes once with `tokenizer`)
	tokenizer1 = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", use_fast=True)
	model1 = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", from_tf=True)

	#-----------------------Adaboot technique---------------------------
	def nparraymeancalc(arr1, arr2):
	    """Average the two models' logits row by row."""
	    returner = []
	    for i in range(len(arr1)):
	        # Reset a very low Neutral logit from model 0 before averaging.
	        if arr1[i][1] < -7:
	            arr1[i][1] = 0
	        returner.append(np.mean([arr1[i], arr2[i]], axis=0))

	    return np.array(returner)

	def predictions(tokenizedtext):
	    """Run both models on the same tokenized batch and combine their logits."""
	    output1 = model(**tokenizedtext)
	    output2 = model1(**tokenizedtext)

	    logits1 = output1.logits.detach().numpy()
	    logits2 = output2.logits.detach().numpy()

	    # print(logits1, logits2)
	    return nparraymeancalc(logits1, logits2)

	def labelassign(predictionresult):
	    """Pick the highest-scoring class id (0, 1 or 2) for each text."""
	    labels = []
	    for i in predictionresult:
	        labels.append(int(i.argmax()))
	    return labels

	tokenizeddata = tokenizer("----YOUR_TEXT---", return_tensors='pt', padding=True, truncation=True)
	result = predictions(tokenizeddata)

	print(labelassign(result))
	```
	Output for "I LOVE YOU":
	```
	1) Positive: 0.994
	2) Negative: 0.000
	3) Neutral: 0.006
	```
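
The example code prints class ids, while the output above shows per-label scores. One way to get scores of that shape is to apply a softmax to the combined logits. This is an assumption, since the card does not say how the displayed numbers were produced. `LABELS`, `softmax` and `ranked_scores` are hypothetical names, and the logits below are made up.

```python
import numpy as np

# Index order follows the label list in this card: 0, 1, 2.
LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ranked_scores(logits_row):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(np.asarray(logits_row, dtype=float))
    order = np.argsort(probs)[::-1]
    return [(LABELS[i], float(probs[i])) for i in order]


# Made-up combined logits for a single text (not real model output).
for rank, (label, prob) in enumerate(ranked_scores([-2.5, 0.25, 3.5]), start=1):
    print(f"{rank}) {label}: {prob:.3f}")
```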