---
language: en
tags:
- Recommendation
license: apache-2.0
datasets:
- surprise
- numpy
- keras
- pandas
thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png
---

![MCTIimg](https://antigo.mctic.gov.br/mctic/export/sites/institucional/institucional/entidadesVinculadas/conselhos/pag-old/RODAPE_MCTI.png)


# MCTI Text Classification Task (uncased) DRAFT

Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.

The model [NLP MCTI Recommendation Multi](https://huggingface.co/spaces/unb-lamfo-nlp-mcti/nlp-mcti-lda-recommender) is part of the project [Research Financing Product Portfolio (FPP)](https://huggingface.co/unb-lamfo-nlp-mcti). It focuses
on the task of recommendation and explores different machine learning strategies for suggesting items that are likely to be useful to a particular individual. Several methods were compared against each other on their error estimates.
A simulated dataset was created using an LDA model.

## According to the abstract,

XXXXX
["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).

## Model description

The surprise library provides 11 prediction algorithms that estimate the ratings users would give to items, most of them based on collaborative-filtering techniques.
The algorithms are listed below with a brief description; for more information, please refer to the package [documentation](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

random_pred.NormalPredictor: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.

baseline_only.BaselineOnly: Algorithm predicting the baseline estimate for given user and item.

knns.KNNBasic: A basic collaborative filtering algorithm.

knns.KNNWithMeans: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.

knns.KNNWithZScore: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.

knns.KNNBaseline: A basic collaborative filtering algorithm taking into account a baseline rating.

matrix_factorization.SVD: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.

matrix_factorization.SVDpp: The SVD++ algorithm, an extension of SVD taking into account implicit ratings.

matrix_factorization.NMF: A collaborative filtering algorithm based on Non-negative Matrix Factorization.

slope_one.SlopeOne: A simple yet accurate collaborative filtering algorithm.

co_clustering.CoClustering: A collaborative filtering algorithm based on co-clustering.

Every model was trained and evaluated. When compared with each other, the methods produced different error estimates.

## Intended uses
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
be fine-tuned on a downstream task. See the [model hub](https://www.google.com) to look for
fine-tuned versions of a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
generation you should look at model like XXX.
### How to use
The dataset for collaborative filtering must meet the following requirements:
- It is a dataframe containing the ratings.
- It has three columns, corresponding to the user (raw) ids, the item (raw) ids, and the ratings, in this order.

```python
import pandas as pd
import numpy as np


# The methods of this class are listed in the code blocks that follow.
class Data:
```
The `ml_100k`, `ml_1m` and `jester` databases are built into the surprise package for collaborative filtering.
```python
    def __init__(self):
        self.available_databases = ['ml_100k', 'ml_1m', 'jester', 'lda_topics', 'lda_rankings', 'uniform']

    def show_available_databases(self):
        print('The available databases are:')
        for i, database in enumerate(self.available_databases):
            print(str(i) + ': ' + database)

    def read_data(self, database_name):
        # Dispatch to the matching read_<database_name> method.
        self.database_name = database_name
        self.the_data_reader = getattr(self, 'read_' + database_name.lower())
        self.the_data_reader()

    def read_ml_100k(self):

        from surprise import Dataset
        data = Dataset.load_builtin('ml-100k')
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)

    def read_ml_1m(self):

        from surprise import Dataset
        data = Dataset.load_builtin('ml-1m')
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)

    def read_jester(self):

        from surprise import Dataset
        data = Dataset.load_builtin('jester')
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)
```

Hyperparameters -

  `n_users` : number of simulated users in the database;
  
  `n_ratings` : number of simulated rating events in the database.
        
This is a fictional dataset in which each rating event assigns a uniformly distributed random rating (from 1 to 5) to a random item on behalf of one of the simulated users of the recommender system being designed in this research project.
```python

    def read_uniform(self):

        n_users = 20
        n_ratings = 10000

        import random

        opo = pd.read_csv('../oportunidades.csv')
        # randint(1, 5) includes both endpoints; randrange(1, 5) would never produce a 5.
        df = [(random.randrange(n_users), random.randrange(len(opo)), random.randint(1, 5)) for i in range(n_ratings)]
        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
```
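
The same simulation can be tried without the opportunities CSV. The sketch below is a standalone variant under stated assumptions: the function name `simulate_uniform_ratings`, the `n_items` argument (standing in for `len(opo)`) and the seed are hypothetical.

```python
import random


def simulate_uniform_ratings(n_users, n_items, n_ratings, seed=None):
    """Return (userID, itemID, rating) tuples with ratings uniform on 1..5."""
    rng = random.Random(seed)
    return [
        # randint is inclusive on both ends, so 5 is a possible rating.
        (rng.randrange(n_users), rng.randrange(n_items), rng.randint(1, 5))
        for _ in range(n_ratings)
    ]


ratings = simulate_uniform_ratings(n_users=20, n_items=50, n_ratings=10000, seed=42)
print(ratings[:3])
print(sorted({rating for _, _, rating in ratings}))
```

The resulting list can be passed to `pd.DataFrame(ratings, columns=['userID', 'itemID', 'rating'])` to obtain the three-column layout described above.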

Hyperparameters -

  `n_users` : number of simulated users in the database;

  `n_ratings` : number of simulated rating events in the database.

This first LDA-based dataset builds a model with K = `n_users` topics. LDA topics are used as proxies for simulated users with different clusters of interest. First a random opportunity is chosen, then the proportion of a randomly chosen topic in its description is multiplied by five. The ceiling of this result is the rating that the fictional user gives to that opportunity. Because the topic mass predicted by the model is spread across several topics, it is very rare to find an opportunity with a high LDA value for any single topic. As a consequence, this dataset has very low variability and most ratings are equal to 1.

```python

    def read_lda_topics(self):

        n_users = 20
        n_ratings = 10000
        
        import gensim
        import random
        import math
        
        opo = pd.read_csv('../oportunidades_results.csv')
        # opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]
        
        try:
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')
        except OSError:  # model file missing or unreadable: generate it first
            import generate_users
            generate_users.gen_model(n_users)
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')

        df = []
        for i in range(n_ratings):
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n,'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = lda_model.get_document_topics(opo_bow)
            topics = {topic[0]:topic[1] for topic in topics}
            user = random.choice(list(topics))  # random.sample() rejects dict views on Python 3.11+
            rating = math.ceil(topics[user]*5)
            df.append((user, opo_n, rating))

        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
        
    def read_lda_rankings(self):

        n_users = 9
        n_ratings = 1000
        
        import gensim
        import random
        import math
        import tqdm
        
        opo = pd.read_csv('../oportunidades.csv')
        opo = opo[opo['opo_brazil'] == 'Y'].reset_index(drop=True)
        
        path = f'models/output_linkedin_cle_lda_model_{n_users}_topics_symmetric_alpha_auto_beta'
        lda_model = gensim.models.ldamodel.LdaModel.load(path)
        
        df = []
        
        pbar = tqdm.tqdm(total= n_ratings)
        for i in range(n_ratings):
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n,'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = lda_model.get_document_topics(opo_bow)
            topics = {topic[0]:topic[1] for topic in topics}

            prop = pd.DataFrame([topics], index=['prop']).T.sort_values('prop', ascending=True)
            prop['rating'] = range(1, len(prop)+1)
            prop['rating'] = prop['rating']/len(prop)
            prop['rating'] = prop['rating'].apply(lambda x: math.ceil(x*5))
            prop.reset_index(inplace=True)

            prop = prop.sample(1)

            df.append((prop['index'].values[0], opo_n, prop['rating'].values[0]))
            pbar.update(1)

        pbar.close() 
        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
```
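
The two LDA-based rating rules above differ only in how a topic distribution is mapped to a rating. The sketch below isolates that step in plain Python; the function names and the toy topic distribution are hypothetical and no LDA model is loaded.

```python
import math


def ratings_from_proportions(topics):
    """Rating = ceil(5 * topic proportion), the rule used in read_lda_topics."""
    return {topic: math.ceil(p * 5) for topic, p in topics.items()}


def ratings_from_ranks(topics):
    """Rating = ceil(5 * rank / n_topics), ranks ascending by proportion,
    the rule used in read_lda_rankings."""
    ordered = sorted(topics, key=topics.get)
    n = len(ordered)
    return {topic: math.ceil(rank / n * 5) for rank, topic in enumerate(ordered, start=1)}


# Hypothetical document-topic distribution (topic id -> proportion).
topics = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.70}

print(ratings_from_proportions(topics))  # {0: 1, 1: 1, 2: 1, 3: 4}
print(ratings_from_ranks(topics))        # {0: 2, 1: 3, 2: 4, 3: 5}
```

With the proportion rule, every topic holding at most 20% of the document is rated 1, which is why ratings of 1 dominate that dataset. The rank rule spreads ratings over the scale regardless of how small the proportions are.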

### Limitations and bias
In this model we faced some obstacles. Most were overcome, but some, by the nature of the project, could not be fully solved.
Because we built the dataset ourselves, no user has interacted with it yet. We therefore have no
real ratings and had to simulate them, which makes the results less reliable.
In this part of the project we also used a database scraped from LinkedIn profiles.
The profiles that LinkedIn shows are biased: the ones that appeared were geographically close or related to the user's organization and email.

## Training data
To train the LDA model, we used a database of LinkedIn profiles.
## Training procedure
### Preprocessing
Pre-processing was used to standardize the texts for the English language, reduce the number of insignificant tokens and
optimize the training of the models.
The following assumptions were considered:
- The data entry base is obtained from the result of Goal 4;
- Labeling (Goal 4) is considered true for accuracy measurement purposes;
- Preprocessing experiments compare accuracy in a shallow neural network (SNN);
- Preprocessing was investigated for the classification goal.

From the database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a notebook was developed in [Google Colab](https://colab.research.google.com)
to implement the [preprocessing code](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which can also be found on the project's GitHub.
Several Python packages were used to develop the preprocessing code:
#### Table 3: Python packages used
|                         Objective                      |   Package    |
|--------------------------------------------------------|--------------|
| Resolve contractions and slang usage in text           | [contractions](https://pypi.org/project/contractions) |
| Natural Language Processing                            | [nltk](https://pypi.org/project/nltk)         |
| Numerical data manipulation and calculations (used alongside the Python 3.10 standard modules io, json, math, re, shutil, time, unicodedata) | [numpy](https://pypi.org/project/numpy)        |
| Data manipulation and analysis                         | [pandas](https://pypi.org/project/pandas)       |
| http library                                           | [requests](https://pypi.org/project/requests)     |
| Training model                                         | [scikit-learn](https://pypi.org/project/scikit-learn) |
| Machine learning                                       | [tensorflow](https://pypi.org/project/tensorflow)   |
| Machine learning                                       | [keras](https://keras.io/)        |
| Translation from multiple languages to English         | [translators](https://pypi.org/project/translators)  |
As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), in the pre-processing, code was created to build and evaluate 8 (eight) different 
bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
#### Table 4: Preprocessing methods evaluated
|  id    |                   Experiments                                          |
|--------|------------------------------------------------------------------------|
| Base   | Original Texts                                                         |
| xp1    | Expand Contractions                                                    |
| xp2    | Expand Contractions + Convert text to lowercase                        |
| xp3    | Expand Contractions + Remove Punctuation                               |
| xp4    | Expand Contractions + Remove Punctuation + Convert text to lowercase   |
| xp5    | xp4 + Stemming                                                         |
| xp6    | xp4 + Lemmatization                                                    |
| xp7    | xp4 + Stemming + Stopwords Removal                                     |
| xp8    | xp4 + Lemmatization + Stopwords Removal                                |
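
A few of the steps in Table 4 can be sketched with the standard library alone. The example below covers punctuation removal, lowercasing and stopword removal; contraction expansion, stemming and lemmatization are left out because they depend on the `contractions` and `nltk` packages. The function name `simplify` and the tiny stopword set are hypothetical, not the project's code.

```python
import string


def simplify(text, stopwords=frozenset()):
    """Remove punctuation, lowercase, tokenize on whitespace, drop stopwords."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [token for token in tokens if token not in stopwords]


# Hypothetical stopword set; a real run would use a full list such as nltk's.
STOPWORDS = frozenset({"for", "the", "and", "of"})

print(simplify("Funding for Research, Development & Innovation!", STOPWORDS))
# ['funding', 'research', 'development', 'innovation']
```

Each step shrinks the vocabulary, which is the effect that the `N_tokens` column of the results table measures.
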
First, the treatment of punctuation and capitalization was evaluated. This phase resulted in the construction and
evaluation of the first four bases (xp1, xp2, xp3, xp4).
Then, content simplification was evaluated, starting from the xp4 base, considering stemming (xp5), lemmatization (xp6),
stemming + stopwords removal (xp7), and lemmatization + stopwords removal (xp8).
All eight bases were evaluated on classifying the eligibility of the opportunity, by training a shallow
neural network (SNN). The metrics for the eight bases are
shown in Table 5.
#### Table 5: Results obtained in Preprocessing
|  id    |                   Experiment                                           | Accuracy | F1-score | Recall | Precision | Mean time (s) | N_tokens | max_length |
|--------|------------------------------------------------------------------------|----------|----------|--------|-----------|---------------|----------|------------|
| Base   | Original Texts                                                         |  89.78%  |  84.20%  | 79.09% |   90.95%  |    417.772    |   23788  |   5636     |
| xp1    | Expand Contractions                                                    |  88.71%  |  81.59%  | 71.54% |   97.33%  |    414.715    |   23768  |   5636     |
| xp2    | Expand Contractions + Convert text to lowercase                        |  90.32%  |  85.64%  | 77.19% |   97.44%  |    368.375    |   20322  |   5629     |
| xp3    | Expand Contractions + Remove Punctuation                               |  91.94%  |  87.73%  | 79.66% |   98.72%  |    386.650    |   22121  |   4950     |
| xp4    | Expand Contractions + Remove Punctuation + Convert text to lowercase   |  90.86%  |  86.61%  | 80.85% |   94.25%  |    326.830    |   18616  |   4950     |
| xp5    | xp4 + Stemming                                                         |  91.94%  |  87.68%  | 78.47% |  100.00%  |    257.960    |   14319  |   4950     |
| xp6    | xp4 + Lemmatization                                                    |  89.78%  |  85.06%  | 79.66% |   91.87%  |    282.645    |   16194  |   4950     |
| xp7    | xp4 + Stemming + Stopwords Removal                                     |  92.47%  |  88.46%  | 79.66% |  100.00%  |    210.320    |   14212  |   2817     |
| xp8    | xp4 + Lemmatization + Stopwords Removal                                |  92.47%  |  88.46%  | 79.66% |  100.00%  |    225.580    |   16081  |   2726     |

xp7 and xp8 obtained identical scores, including the highest accuracy and F1-score, so the choice between these two
options rests on other criteria. xp7 has a shorter training time and fewer unique tokens; xp8 has a smaller maximum
text length. The criterion used for the choice was the computational cost required to train the vector representation
models (word embeddings, sentence embeddings, document embeddings). The training times are so close that they did not weigh heavily in the analysis.
As a last step, a spreadsheet was generated for the chosen base (xp8) with the fields opo_pre and opo_pre_tkn, containing the preprocessed text in sentence format and as tokens, respectively. This [database](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/oportunidades_final_pre_processado.xlsx) was made
available on the project's GitHub.
### Pretraining
The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
learning rate warmup for 10,000 steps and linear decay of the learning rate after.
## Evaluation results
### Model training with Word2Vec embeddings
We now have a pre-trained model of word2vec embeddings that has already learned meanings relevant to our classification problem.
We can couple it to our classification models (Fig. 4), performing transfer learning, and then train the model with the labeled
data in a supervised manner. The new coupled model can be seen in Figure 5 under word2vec model training. Table 6 shows the
results obtained with the related metrics. With this implementation, we achieved new levels of accuracy: 86% for the CNN
architecture and 88% for the LSTM architecture.
#### Table 6: Results from Pre-trained WE + ML models
| ML Model |  Accuracy | F1 Score  | Precision |   Recall  |
|:--------:|:---------:|:---------:|:---------:|:---------:|
| NN       |  0.8269   |  0.8545   |  0.8392   |  0.8712   |
| DNN      |  0.7115   |  0.7794   |  0.7255   |  0.8485   |
| CNN      |  0.8654   |  0.9083   |  0.8486   |  0.9773   |
| LSTM     |  0.8846   |  0.9139   |  0.9056   |  0.9318   |
### Transformer-based implementation
Another way we used pre-trained vector representations was through a Longformer (Beltagy et al., 2020). We chose it because
of a limitation of the first generation of transformers and BERT-based architectures concerning input size:
a maximum of 512 tokens. The reason behind that limitation is that the self-attention mechanism scales quadratically with the
input sequence length, \\(O(n^2)\\) (Beltagy et al., 2020). The Longformer allows the processing of sequences of thousands of tokens
without facing the memory bottleneck of BERT-like architectures and achieved SOTA in several benchmarks.
For our text length distribution in Figure 3, if we used a BERT-based architecture with a maximum length of 512, 99 sentences
would have to be truncated and would probably lose some critical information. By comparison, with the Longformer, with a maximum
length of 4096, only eight sentences would have their information shortened.
To apply the Longformer, we used the pre-trained base (available on the link), previously trained on a combination
of vast datasets, as input to the model, as shown in Figure 5 under Longformer model training. After coupling it to our classification
models, we performed supervised training of the whole model. At this point, only transfer learning was applied, since fine-tuning
the weights would require more computational power. The results with related metrics can be viewed in Table 7.
This approach achieved adequate accuracy scores, above 82% in all implementation architectures.
#### Table 7: Results from Pre-trained Longformer + ML models
| ML Model |  Accuracy | F1 Score  | Precision |   Recall  |
|:--------:|:---------:|:---------:|:---------:|:---------:|
| NN       |  0.8269   |  0.8754   |0.7950     |  0.9773   |
| DNN      |  0.8462   |  0.8776   |0.8474     |  0.9123   |
| CNN      |  0.8462   |  0.8776   |0.8474     |  0.9123   |
| LSTM     |  0.8269   |  0.8801   |0.8571     |  0.9091   | 

## Benchmarks

| Algorithm       |  RMSE     | MSE       | MAE       |   FCP     |
|-----------------|-----------|-----------|-----------|-----------|
| NormalPredictor |  1.820737 | 3.315084  | 1.475522  | 0.514134  |
| BaselineOnly    |  1.072843 | 1.150992  | 0.890233  | 0.556560  |
| KNNBasic        |  1.232248 | 1.518436  | 0.936799  | 0.648604  |
| KNNWithMeans    |  1.124166 | 1.263750  | 0.808329  | 0.597148  |
| KNNWithZScore   |  1.056550 | 1.116299  | 0.750004  | 0.669651  |
| KNNBaseline     |  1.134660 | 1.287454  | 0.825161  | 0.614270  |
| SVD             |  0.977468 | 0.955444  | 0.757485  | 0.723829  |
| SVDpp           |  0.843065 | 0.710758  | 0.670516  | 0.671737  |
| NMF             |  1.122684 | 1.260420  | 0.722101  | 0.688728  |
| SlopeOne        |  1.073552 | 1.152514  | 0.747142  | 0.651937  |
| CoClustering    |  1.293383 | 1.672838  | 1.007951  | 0.494174  |
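
For reference, three of the error measures in the table can be computed from (true rating, estimated rating) pairs as in the sketch below. The function name `error_metrics` and the sample pairs are hypothetical, and FCP (fraction of concordant pairs) is not covered here.

```python
import math


def error_metrics(pairs):
    """Compute RMSE, MSE and MAE from (true_rating, estimated_rating) pairs."""
    errors = [true - est for true, est in pairs]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": math.sqrt(mse), "mse": mse, "mae": mae}


# Hypothetical predictions: errors are 0.5, -0.5 and 1.0.
pairs = [(4, 3.5), (2, 2.5), (5, 4.0)]
metrics = error_metrics(pairs)
print(metrics)
```

Because RMSE is the square root of MSE, the two columns rank the algorithms identically; for example, 1.820737 squared is approximately 3.315084 for NormalPredictor.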


### BibTeX entry and citation info
```bibtex
@article{recommend22,
author       ={Jo\~{a}o Gabriel de Moraes Souza and Daniel Oliveira Cajueiro and Johnathan de O. Milagres and Vin\'{i}cius de Oliveira Watanabe and V\'{i}tor Bandeira Borges and Victor Rafael Celestino},
title        ={A comprehensive review of recommendation systems: method, data, evaluation and coding},
booktitle    ={xxxx},
year         ={xxxx},
pages        ={xxxx},
publisher    ={xxxx},
organization ={xxxx},
doi          ={xxxx},
isbn         ={xxxx},
issn         ={xxxx},
}
```