---
license: afl-3.0
datasets:
- tweet_eval
- sentiment140
- mteb/tweet_sentiment_extraction
- yelp_review_full
- amazon_polarity
language:
- en
metrics:
- accuracy
- sparse_val accuracy
- sparse_val categorical accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- textclassification
- roberta
- robertabase
- sentimentanalysis
- nlp
- tweetanalysis
- tweet
- analysis
- sentiment
- positive
- newsanalysis
---

---
<b>BYRD'S I - ROBERTA BASED TWEET/REVIEW/TEXT ANALYSIS</b>
---

This is a ro<b>BERT</b>a-base model fine-tuned on 8 datasets containing ~20 M tweets. It is intended for English text, though it can also do a reasonable job on other languages.

<b>Git Repo:</b><a href = "https://github.com/Caffeine-Coders/Sentiment-Analysis-Project"> SENTIMENTANALYSIS-PROJECT</a>

<b>Demo:</b><a href = "https://byrdi.netlify.app/"> BYRD'S I</a>

<b>labels: </b>
0 -> Negative;
1 -> Neutral;
2 -> Positive;

<b>Model Metrics</b><br/>
<b>Accuracy: </b> ~96% <br/>
<b>Sparse Categorical Accuracy: </b> 0.9597 <br/>
<b>Loss: </b> 0.1144 <br/>
<b>val_loss -- [onLast_train] : </b> 0.1482 <br/>
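
For readers unfamiliar with the metric name, sparse categorical accuracy compares integer class ids with the argmax of the predicted scores. The following is a minimal NumPy sketch of that definition on made-up data; the function name and the toy values are illustrative and are not taken from the training code.

```python
import numpy as np


def sparse_categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose highest-scoring class equals the integer label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_pred.argmax(axis=-1) == y_true))


# Made-up labels and scores (0 = Negative, 1 = Neutral, 2 = Positive).
toy_labels = [2, 0, 1, 2]
toy_scores = [
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],  # predicted 2, labelled 1: counted as a miss
    [0.2, 0.1, 0.7],
]

acc = sparse_categorical_accuracy(toy_labels, toy_scores)
print(acc)  # 0.75
```
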
<b>Note: </b>
Because of dataset discrepancies in the Neutral data, we published a second model, <a href = "https://huggingface.co/AK776161/birdseye_roberta-base-18">Byrd's I only positive_negative model</a>, to help isolate neutral data, and we combine both models with the <b>AdaBoot</b> method (averaging their logits, as in the example below) to get more accurate output.
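
The combination step can be illustrated on made-up logits without downloading either model. This is a sketch under assumptions: the threshold of -7 and the neutral index 1 are copied from the example code in this card, the helper name `average_logits` and the toy values are hypothetical, and the step shown is plain averaging of logits rather than the AdaBoost algorithm as usually defined.

```python
import numpy as np

NEUTRAL = 1       # label id of the Neutral class in this card
THRESHOLD = -7.0  # value used in the card's example code


def average_logits(logits_a, logits_b, threshold=THRESHOLD):
    """Average two (batch, 3) logit arrays after resetting very low neutral logits in the first."""
    a = np.array(logits_a, dtype=float)  # copy, so the caller's data is not modified
    b = np.asarray(logits_b, dtype=float)
    # Rows where the first model's neutral logit is below the threshold get it reset to 0.
    a[a[:, NEUTRAL] < threshold, NEUTRAL] = 0.0
    return (a + b) / 2.0


# Made-up logits for a single input text.
toy_a = [[-3.0, -9.0, 4.0]]  # stands in for the positive/negative model
toy_b = [[-2.0, 1.0, 3.0]]   # stands in for the second model

combined = average_logits(toy_a, toy_b)
print(combined.tolist())                 # [[-2.5, 0.5, 3.5]]
print(combined.argmax(axis=1).tolist())  # [2]
```

Resetting the neutral logit keeps a strongly negative value from the first model from dominating the average for that class.
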
# Example of Classification:
```python
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# from_tf=True converts the TensorFlow checkpoints to PyTorch, so TensorFlow must be installed.

# model 0: the positive/negative model linked in the note above
tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18", use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-18", from_tf=True)

# model 1 (its tokenizer is loaded but the example tokenizes once, with `tokenizer`)
tokenizer1 = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", use_fast=True)
model1 = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", from_tf=True)


# ----------------------- AdaBoot technique ---------------------------
def nparraymeancalc(arr1, arr2):
    """Average the two logit arrays row by row."""
    returner = []
    for i in range(len(arr1)):
        row1 = arr1[i].copy()
        # Reset a very low neutral logit (index 1) from model 0 to 0 before averaging.
        if row1[1] < -7:
            row1[1] = 0
        returner.append(np.mean([row1, arr2[i]], axis=0))
    return np.array(returner)


def predictions(tokenizedtext):
    """Run both models on the same tokenized batch and combine their logits."""
    logits1 = model(**tokenizedtext).logits.detach().numpy()
    logits2 = model1(**tokenizedtext).logits.detach().numpy()
    return nparraymeancalc(logits1, logits2)


def labelassign(predictionresult):
    """Return the highest-scoring label id (0, 1 or 2) for each row."""
    return [int(row.argmax()) for row in predictionresult]


tokenizeddata = tokenizer("----YOUR_TEXT---", return_tensors="pt", padding=True, truncation=True)
result = predictions(tokenizeddata)

print(labelassign(result))
```
Output for "I LOVE YOU":
```
1) Positive: 0.994
2) Negative: 0.000
3) Neutral: 0.006
```
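
The classification example prints label ids, while the sample output lists a score per class. One way to get scores in that format is to apply a softmax to a row of combined logits and rank the classes. The sketch below does this on made-up logits; whether the published scores were produced this way is an assumption, and `softmax`, `ranked_scores` and `ID2LABEL` are illustrative names built from the label table in this card.

```python
import numpy as np

# Label ids as listed in this card.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ranked_scores(logits_row):
    """Return (label, probability) pairs for one row of logits, highest probability first."""
    probs = softmax(logits_row)
    order = np.argsort(-probs)
    return [(ID2LABEL[int(i)], float(probs[i])) for i in order]


# Made-up combined logits for one text; real values come from predictions(...).
toy_logits = [-2.5, 0.5, 3.5]
ranked = ranked_scores(toy_logits)
for rank, (label, prob) in enumerate(ranked, start=1):
    print(f"{rank}) {label}: {prob:.3f}")
```

With the real models, each row of `result` from the classification example can be passed to `ranked_scores` in place of `toy_logits`.
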