Update README.md
README.md
CHANGED
@@ -10,6 +10,106 @@ language:
 - en
 metrics:
 - accuracy
+- sparse_val accuracy
+- sparse_val categorical accuracy
 library_name: transformers
 pipeline_tag: text-classification
-
+tags:
+- textclassisification
+- roberta
+- robertabase
+- sentimentanalysis
+- nlp
+- tweetanalysis
+- tweet
+- analysis
+- sentiment
+- positive
+- newsanalysis
+---
+
+---
+<b>BYRD'S I - ROBERTA BASED TWEET/REVIEW/TEXT ANALYSIS</b>
+---
+
+This is a ro<b>BERT</b>a-base model fine-tuned on 8 datasets with ~20 M tweets. It is best suited to English, though it can do a fine job on other languages.
+
+<b>Git Repo:</b><a href = "https://github.com/Caffeine-Coders/Sentiment-Analysis-Project"> SENTIMENTANALYSIS-PROJECT</a>
+
+<b>Demo:</b><a href = "https://byrdi.netlify.app/"> BYRD'S I</a>
+
+<b>labels: </b>
+0 -> Negative;
+1 -> Neutral;
+2 -> Positive;
+
+<b>Model Metrics</b><br/>
+<b>Accuracy: </b> ~96% <br/>
+<b>Sparse Categorical Accuracy: </b> 0.9597 <br/>
+<b>Loss: </b> 0.1144 <br/>
+<b>val_loss -- [onLast_train] : </b> 0.1482 <br/>
+<b>Note: </b>
+Due to dataset discrepancies in the Neutral data, we published another model, <a href = "https://huggingface.co/AK776161/birdseye_roberta-base-18">
+Byrd's I only positive_negative model</a>, to find only neutral data, and used the
+<b>AdaBoot</b> method to get accurate output.
+# Example of Classification:
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import numpy as np
+
+# from_tf=True loads TensorFlow weights, so TensorFlow must be installed.
+# model 0
+tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18", use_fast = True)
+model = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-18", from_tf=True)
+# model1
+tokenizer1 = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", use_fast = True)
+model1 = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval",from_tf=True)
+
+#-----------------------Adaboot technique---------------------------
+def nparraymeancalc(arr1, arr2):
+    # Average the logits of the two models row by row.
+    returner = []
+    for i in range(len(arr1)):
+        # Reset a very low Neutral logit (index 1) from model 0 to 0 before averaging.
+        if arr1[i][1] < -7:
+            arr1[i][1] = 0
+        returner.append(np.mean([arr1[i],arr2[i]], axis = 0))
+
+    return np.array(returner)
+
+def predictions(tokenizedtext):
+    # Run both models on the same tokenized input.
+    output1 = model(**tokenizedtext)
+    output2 = model1(**tokenizedtext)
+
+    logits1 = output1.logits
+    logits1 = logits1.detach().numpy()
+
+    logits2 = output2.logits
+    logits2 = logits2.detach().numpy()
+
+    predictionresult = nparraymeancalc(logits1,logits2)
+
+    return np.array(predictionresult)
+
+def labelassign(predictionresult):
+    labels = []
+    for i in predictionresult:
+        label_id = i.argmax()
+        labels.append(label_id)
+    return labels
+
+tokenizeddata = tokenizer("----YOUR_TEXT---", return_tensors = 'pt', padding = True, truncation = True)
+result = predictions(tokenizeddata)
+
+print(labelassign(result))
+```
+Output for "I LOVE YOU":
+```
+1) Positive: 0.994
+2) Negative: 0.000
+3) Neutral: 0.006
+```
+
+
+
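The classification example prints label ids, while the sample output lists a score per label. The README does not say how those scores were computed; the sketch below assumes a softmax over the averaged logits and uses the label order from the labels list. The names `ID2LABEL`, `softmax`, and `ranked_scores` are hypothetical helpers, and `example_logits` is made-up input rather than real model output.

```python
import numpy as np

# Label order taken from the labels list in this README.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ranked_scores(logits_row):
    """Return (label, probability) pairs for one row of logits, highest first."""
    probs = softmax(logits_row)
    order = np.argsort(probs)[::-1]
    return [(ID2LABEL[int(i)], float(probs[i])) for i in order]


# Hypothetical averaged logits for a single input (assumption, not model output).
example_logits = np.array([-3.0, 0.5, 4.0])
for rank, (label, prob) in enumerate(ranked_scores(example_logits), start=1):
    print(f"{rank}) {label}: {prob:.3f}")
```

With the real models, one row of the array returned by `predictions(...)` could be passed to `ranked_scores` in place of `example_logits`.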