---
language: en
---

# SKEP-Roberta

## Introduction

SKEP (Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis) was proposed by Baidu in 2020.

SKEP designs sentiment masking and three sentiment pre-training objectives to incorporate various types of sentiment knowledge into the pre-trained model.

More detail: https://aclanthology.org/2020.acl-main.374.pdf
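
At a high level, sentiment masking biases the MLM masking procedure toward sentiment words (detected with an automatically mined sentiment lexicon) rather than masking tokens uniformly at random. The sketch below is only a toy illustration of this idea, with a hypothetical hand-written lexicon standing in for SKEP's mined one; it is not the authors' implementation:

```python
import random

# Hypothetical toy lexicon; SKEP mines its lexicon automatically (e.g., via PMI).
SENTIMENT_WORDS = {"good", "bad", "great", "terrible", "wonderful", "awful"}

def sentiment_mask(tokens, mask_token="<mask>", max_ratio=0.1):
    """Simplified sketch of sentiment masking: prefer masking sentiment words."""
    budget = max(1, int(len(tokens) * max_ratio))
    candidates = [i for i, t in enumerate(tokens) if t.lower() in SENTIMENT_WORDS]
    random.shuffle(candidates)
    masked, targets = list(tokens), {}
    for i in candidates[:budget]:
        targets[i] = masked[i]  # remember the original word as the prediction label
        masked[i] = mask_token
    return masked, targets

print(sentiment_mask("the movie was great but the ending felt awful".split()))
```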

## Released Model Info

|Model Name|Language|Model Structure|
|:---:|:---:|:---:|
|skep-roberta-large| English |Layer:24, Hidden:1024, Heads:16|

This released PyTorch model was converted from the officially released PaddlePaddle SKEP model, and a series of experiments were conducted to verify the accuracy of the conversion (a structural sanity check is sketched after the links below).

- Official PaddlePaddle SKEP repos:
  1. https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/skep
  2. https://github.com/baidu/Senta
- PyTorch conversion repo: not released yet
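
As a quick structural sanity check of the converted checkpoint (a sketch using the standard `transformers` config API; the expected values match the table above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
# Expected for skep-roberta-large: 24 layers, hidden size 1024, 16 attention heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```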


## How to use
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
model = AutoModel.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
```
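
The `AutoModel` head returns raw hidden states. For example (a minimal sketch using the tokenizer and model loaded above), to extract contextual embeddings for a sentence:

```python
import torch

# Encode a sentence and run it through the bare encoder.
inputs = tokenizer("The movie is wonderful!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, sequence_length, hidden_size=1024).
print(outputs.last_hidden_state.shape)
```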

The example below (a cleaned-up version of the original demo) uses the masked-LM head to predict the masked word:

```python
#!/usr/bin/env python
# encoding: utf-8
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model = RobertaForMaskedLM.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model.eval()

# The tokenizer adds <s> and </s> itself, so they need not be written by hand.
input_tx = "He likes to play with students, so he became a <mask> after graduation."
# input_tx = "He is a <mask> and likes to get along with his students."

inputs = tokenizer(input_tx, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the <mask> position and take the highest-scoring vocabulary entry.
mask_index = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = predictions[0, mask_index].argmax(dim=-1)
predicted_token = tokenizer.convert_ids_to_tokens(predicted_ids.tolist())

print('Predicted token is:', predicted_token)
```
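
Since SKEP is aimed at sentiment analysis, the checkpoint is typically fine-tuned for classification. Below is a minimal fine-tuning sketch (not an official recipe), with toy labeled sentences standing in for a real dataset; the classification head is newly initialized on top of the released encoder:

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model = RobertaForSequenceClassification.from_pretrained(
    'Yaxin/roberta-large-ernie2-skep-en', num_labels=2)

# Toy labeled data for illustration only.
texts = ["The movie is wonderful!", "The plot was a complete mess."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()

# One gradient step on the toy batch.
batch = tokenizer(texts, padding=True, return_tensors='pt')
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
print('loss:', outputs.loss.item())
```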
## Citation

```bibtex
@article{tian2020skep,
  title={SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis},
  author={Tian, Hao and Gao, Can and Xiao, Xinyan and Liu, Hao and He, Bolei and Wu, Hua and Wang, Haifeng and Wu, Feng},
  journal={arXiv preprint arXiv:2005.05635},
  year={2020}
}
```

## Reference

https://github.com/nghuyong/ERNIE-Pytorch