---
language: en
tags:
- Recommendation
license: apache-2.0
datasets:
- surprise
- numpy
- keras
- pandas
thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png
---

![MCTIimg](https://antigo.mctic.gov.br/mctic/export/sites/institucional/institucional/entidadesVinculadas/conselhos/pag-old/RODAPE_MCTI.png)


# MCTI Recommendation Task (uncased) DRAFT

Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.

The model [NLP MCTI Recommendation Multi](https://huggingface.co/spaces/unb-lamfo-nlp-mcti/nlp-mcti-lda-recommender), part of the project [Research Financing Product Portfolio (FPP)](https://huggingface.co/unb-lamfo-nlp-mcti), focuses 
on the task of recommendation. It explores different machine learning strategies for suggesting items that are likely to be useful to a particular individual. Several methods were compared against each other on their error estimates. 
A simulated dataset was created with an LDA model.

## According to the abstract,

This model card presents the model description and its classes. The intended uses are described along with a "how to use" section, which states the conditions that the input data must satisfy.
Further down, the card discusses the data and its limitations and bias. Tables throughout the page support the reported information and tests.
The card also covers how the recommendation is made, the datasets used and the benchmarks generated.  

## Model description

The surprise library provides 11 rating prediction models, each based on a different collaborative-filtering technique, that try to predict the ratings in a dataset after being fitted on training data.
The models are listed below with a brief explanation in English. For more information, please refer to the package [documentation](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

random_pred.NormalPredictor: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.

baseline_only.BaselineOnly: Algorithm predicting the baseline estimate for given user and item.

knns.KNNBasic: A basic collaborative filtering algorithm.

knns.KNNWithMeans: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.

knns.KNNWithZScore: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.

knns.KNNBaseline: A basic collaborative filtering algorithm taking into account a baseline rating.

matrix_factorization.SVD: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.

matrix_factorization.SVDpp: The SVD++ algorithm, an extension of SVD taking into account implicit ratings.

matrix_factorization.NMF: A collaborative filtering algorithm based on Non-negative Matrix Factorization.

slope_one.SlopeOne: A simple yet accurate collaborative filtering algorithm.

co_clustering.CoClustering: A collaborative filtering algorithm based on co-clustering.

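To make the first entry of the list concrete, here is a minimal sketch of the idea behind `NormalPredictor`: estimate the mean and standard deviation of the training ratings, draw a random value from the corresponding normal distribution, and clip it to the rating scale. This is a simplified illustration and not the library's implementation; the function name `normal_predictor` and the toy ratings are assumptions made for the example.

```python
import random
import statistics


def normal_predictor(train_ratings, rating_scale=(1, 5), rng=random):
    """Draw one random rating from N(mu, sigma^2) fitted on the training ratings."""
    mu = statistics.mean(train_ratings)
    sigma = statistics.pstdev(train_ratings)  # population standard deviation
    low, high = rating_scale
    draw = rng.gauss(mu, sigma)
    # Keep the prediction inside the rating scale.
    return min(high, max(low, draw))


# Hypothetical training ratings, used only for illustration.
train = [1, 2, 2, 3, 3, 3, 4, 4, 5]
rng = random.Random(0)  # seeded generator so the run is repeatable
preds = [normal_predictor(train, rng=rng) for _ in range(5)]
print(preds)
```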
It is possible to pass a custom dataframe as an argument to this class. The dataframe needs to have 3 columns with the following names: ['userID', 'itemID', 'rating'].

```python
import pandas as pd


class Method:
    def __init__(self,df):
        
        self.df=df
        self.available_methods=[
            'surprise.NormalPredictor',
            'surprise.BaselineOnly',
            'surprise.KNNBasic',
            'surprise.KNNWithMeans',
            'surprise.KNNWithZScore',
            'surprise.KNNBaseline',
            'surprise.SVD',
            'surprise.SVDpp',
            'surprise.NMF',
            'surprise.SlopeOne',
            'surprise.CoClustering',
        ]        
        
    def show_methods(self):
        print('The avaliable methods are:')
        for i,method in enumerate(self.available_methods):
            print(str(i)+': '+method)



    def run(self,the_method):
        self.the_method=the_method
        if(self.the_method[0:8]=='surprise'):
            self.run_surprise()
        elif(self.the_method[0:6]=='Gensim'):
            self.run_gensim()
        elif(self.the_method[0:13]=='Transformers-'):
            self.run_transformers()
        else:
            print('This method is not defined! Try another one.')

    def run_surprise(self):
        from surprise import Reader
        from surprise import Dataset
        from surprise.model_selection import train_test_split
        reader = Reader(rating_scale=(1, 5))
        data = Dataset.load_from_df(self.df[['userID', 'itemID', 'rating']], reader)        
        trainset, testset = train_test_split(data, test_size=.30)
        the_method=self.the_method.replace("surprise.", "")
        import surprise
        # Look the algorithm class up by name instead of relying on eval/exec and locals().
        the_algorithm=getattr(surprise, the_method)()
        the_algorithm.fit(trainset)
        self.predictions=the_algorithm.test(testset)
        list_predictions=[(uid,iid,r_ui,est) for uid,iid,r_ui,est,_ in self.predictions]        
        self.predictions_df = pd.DataFrame(list_predictions, columns =['user_id', 'item_id', 'rating','predicted_rating'])
```
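As a minimal sketch of the input that `Method` expects, the check below verifies that a dataframe has the three required columns and that its ratings fall inside the 1 to 5 scale used by the `Reader` above. The helper name `check_ratings_frame` and the toy data are hypothetical and not part of the project code.

```python
import pandas as pd

REQUIRED_COLUMNS = ['userID', 'itemID', 'rating']


def check_ratings_frame(df, rating_scale=(1, 5)):
    """Return df restricted to the required columns, or raise ValueError."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    low, high = rating_scale
    if not df['rating'].between(low, high).all():
        raise ValueError(f"ratings must lie in [{low}, {high}]")
    # Put the columns in the order user, item, rating.
    return df[REQUIRED_COLUMNS]


# Hypothetical toy ratings: three users, three items.
toy = pd.DataFrame({
    'itemID': [10, 11, 10, 12],
    'userID': [0, 0, 1, 2],
    'rating': [5, 3, 4, 1],
})
clean = check_ratings_frame(toy)
print(list(clean.columns))
```

A dataframe that passes this check can be handed to `Method(clean)`.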
Every model was run and evaluated. When compared with each other, the different methods produced different error estimates. 


The surprise library provides 4 different metrics to assess the accuracy of the rating predictions: rmse, mse, mae and fcp. For further discussion of each metric, please see the package documentation.

```python

class Evaluator:

    def __init__(self,predictions_df):

        self.available_evaluators=['surprise.rmse','surprise.mse',
                                   'surprise.mae','surprise.fcp']
        self.predictions_df=predictions_df
        
    def show_evaluators(self):
        print('The avaliable evaluators are:')
        for i,evaluator in enumerate(self.available_evaluators):
            print(str(i)+': '+evaluator)
        


    def run(self,the_evaluator):        
        self.the_evaluator=the_evaluator
        if(self.the_evaluator[0:8]=='surprise'):
            self.run_surprise()
        else:
            print('This evaluator is not available!')

    def run_surprise(self):
        import surprise
        from surprise import accuracy
        predictions=[surprise.prediction_algorithms.predictions.Prediction(row['user_id'],row['item_id'],row['rating'],row['predicted_rating'],{}) for index,row in self.predictions_df.iterrows()]
        self.predictions=predictions
        # Resolve the metric function (rmse, mse, mae or fcp) by name instead of using eval.
        metric_name = self.the_evaluator.replace("surprise.", "")
        self.acc = getattr(accuracy, metric_name)(predictions,verbose=True)
```
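To show what three of these metrics measure, here is a small pure-Python sketch that computes MSE, RMSE and MAE from (true rating, estimated rating) pairs. It is only an illustration with made-up pairs, not the surprise implementation. FCP (fraction of concordant pairs) is left out because it requires grouping predictions by user.

```python
import math


def mse(pairs):
    """Mean squared error over (true, estimated) rating pairs."""
    return sum((true - est) ** 2 for true, est in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


# Hypothetical (true, estimated) pairs, used only for illustration.
pairs = [(4, 3.5), (2, 2.5), (5, 4.0), (1, 2.0)]
print(mse(pairs), rmse(pairs), mae(pairs))
```

For all three metrics, lower values mean predictions closer to the true ratings.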
## Intended uses
The models in this card are intended for rating prediction in a collaborative-filtering recommender system: given (user, item, rating) records, they estimate the rating that a user would give to an item the user has not rated yet.
In this project, the items are the opportunities (financial products) described in the FPP project, and the estimated ratings are used to choose which opportunities to show to each user.
### How to use
A dataset for collaborative filtering must be:
- a dataframe containing the ratings;
- made of three columns, corresponding to the user (raw) ids, the item (raw) ids, and the ratings, in this order.

```python
import pandas as pd
import numpy as np

class Data:
```
The databases (ml_100k, ml_1m and jester) are built into the surprise package for collaborative filtering.
```python
    def __init__(self):
        self.available_databases=['ml_100k', 'ml_1m','jester', 'lda_topics', 'lda_rankings', 'uniform']

    def show_available_databases(self):
        print('The avaliable database are:')
        for i,database in enumerate(self.available_databases):
            print(str(i)+': '+database)            
        
    def read_data(self,database_name):
        self.database_name=database_name
        self.the_data_reader= getattr(self, 'read_'+database_name.lower())
        self.the_data_reader()   

    def read_ml_100k(self):

        from surprise import Dataset
        data = Dataset.load_builtin('ml-100k')
        self.df = pd.DataFrame(data.__dict__['raw_ratings'], columns=['user_id','item_id','rating','timestamp'])
        self.df.drop(columns=['timestamp'],inplace=True)
        self.df.rename({'user_id':'userID','item_id':'itemID'},axis=1,inplace=True)

    def read_ml_1m(self):

        from surprise import Dataset
        data = Dataset.load_builtin('ml-1m')
        self.df = pd.DataFrame(data.__dict__['raw_ratings'], columns=['user_id','item_id','rating','timestamp'])
        self.df.drop(columns=['timestamp'],inplace=True)
        self.df.rename({'user_id':'userID','item_id':'itemID'},axis=1,inplace=True)

    def read_jester(self):

        from surprise import Dataset
        data = Dataset.load_builtin('jester')
        self.df = pd.DataFrame(data.__dict__['raw_ratings'], columns=['user_id','item_id','rating','timestamp'])
        self.df.drop(columns=['timestamp'],inplace=True)
        self.df.rename({'user_id':'userID','item_id':'itemID'},axis=1,inplace=True)
```

Hyperparameters -

  `n_users` : number of simulated users in the database;
  
  `n_ratings` : number of simulated rating events in the database.
        
This is a fictional dataset: each record assigns a uniformly distributed random rating (from 1 to 5) to a random opportunity, on behalf of one of the simulated users of the recommender system that is being designed in this research project.
```python

        
    def read_uniform(self):

        n_users = 20
        n_ratings = 10000
        
        import random
        
        opo = pd.read_csv('../oportunidades.csv')
        # random.randint includes both endpoints, so ratings span 1 to 5 as described.
        df = [(random.randrange(n_users), random.randrange(len(opo)), random.randint(1,5)) for i in range(n_ratings)]
        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
```
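The same simulation can be sketched without the opportunities file. In this minimal example, `n_items` stands in for `len(opo)`, and its value of 50 is only a placeholder; the function name `simulate_uniform_ratings` and the `seed` parameter are also assumptions added for repeatability.

```python
import random


def simulate_uniform_ratings(n_users, n_items, n_ratings, seed=None):
    """Return (userID, itemID, rating) tuples with ratings uniform on 1..5."""
    rng = random.Random(seed)
    return [
        (rng.randrange(n_users), rng.randrange(n_items), rng.randint(1, 5))
        for _ in range(n_ratings)
    ]


# n_items=50 is a hypothetical stand-in for the number of opportunities.
rows = simulate_uniform_ratings(n_users=20, n_items=50, n_ratings=10000, seed=42)
print(rows[:3])
```

Because the ratings carry no information about users or items, this dataset works as a baseline against which the LDA-based datasets can be compared.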

Hyperparameters -

  `n_users` : number of simulated users in the database;
        
  `n_ratings` : number of simulated rating events in the database.
        
This first LDA-based dataset builds a model with K = `n_users` topics. LDA topics are used as proxies for simulated users with different clusters of interest. First a random opportunity is chosen, then the proportion of a randomly chosen topic in its description is multiplied by five. The ceiling of this result is the rating that the fictional user gives to that opportunity. Because the proportion predicted by the model is spread across several topics, it is very rare to find an opportunity with a high LDA value for any single topic. The consequence is that this dataset has very low variability and most of the ratings are equal to 1.

```python

    def read_lda_topics(self):

        n_users = 20
        n_ratings = 10000
        
        import gensim
        import random
        import math
        
        opo = pd.read_csv('../oportunidades_results.csv')
        # opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]
        
        try:
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')
        except Exception:  # the saved model could not be loaded, so generate it first
            import generate_users
            generate_users.gen_model(n_users)
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')

        df = []
        for i in range(n_ratings):
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n,'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = lda_model.get_document_topics(opo_bow)
            topics = {topic[0]:topic[1] for topic in topics}
            # random.sample no longer accepts dict views in recent Python versions.
            user = random.choice(list(topics.keys()))
            rating = math.ceil(topics[user]*5)
            df.append((user, opo_n, rating))

        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
        
    def read_lda_rankings(self):

        n_users = 9
        n_ratings = 1000
        
        import gensim
        import random
        import math
        import tqdm
        
        opo = pd.read_csv('../oportunidades.csv')
        opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]
        opo.index = range(len(opo))
        
        path = f'models/output_linkedin_cle_lda_model_{n_users}_topics_symmetric_alpha_auto_beta'
        lda_model = gensim.models.ldamodel.LdaModel.load(path)
        
        df = []
        
        pbar = tqdm.tqdm(total= n_ratings)
        for i in range(n_ratings):
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n,'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = lda_model.get_document_topics(opo_bow)
            topics = {topic[0]:topic[1] for topic in topics}

            prop = pd.DataFrame([topics], index=['prop']).T.sort_values('prop', ascending=True)
            prop['rating'] = range(1, len(prop)+1)
            prop['rating'] = prop['rating']/len(prop)
            prop['rating'] = prop['rating'].apply(lambda x: math.ceil(x*5))
            prop.reset_index(inplace=True)

            prop = prop.sample(1)

            df.append((prop['index'].values[0], opo_n, prop['rating'].values[0]))
            pbar.update(1)

        pbar.close() 
        self.df = pd.DataFrame(df, columns = ['userID', 'itemID', 'rating'])
```
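The two rating rules used above can be compared on a single document. The sketch below applies both to a hypothetical topic distribution: the first rule takes the ceiling of five times the topic weight (as in `read_lda_topics`), and the second ranks the topics by weight and scales the rank to the 1 to 5 range (as in `read_lda_rankings`). The function names and the weights are assumptions for illustration.

```python
import math


def rating_from_weight(weight):
    """Rule of read_lda_topics: ceiling of five times the topic weight."""
    return math.ceil(weight * 5)


def ratings_from_ranks(topic_weights):
    """Rule of read_lda_rankings: rank topics by weight, then scale ranks to 1..5."""
    ordered = sorted(topic_weights, key=topic_weights.get)  # ascending weight
    n = len(ordered)
    return {
        topic: math.ceil((rank / n) * 5)
        for rank, topic in enumerate(ordered, start=1)
    }


# Hypothetical topic weights for one opportunity description.
weights = {0: 0.05, 3: 0.17, 7: 0.78}
by_weight = {topic: rating_from_weight(w) for topic, w in weights.items()}
by_rank = ratings_from_ranks(weights)
print(by_weight)
print(by_rank)
```

With these weights, the first rule rates two of the three topics as 1, while the rank-based rule spreads the ratings over the scale.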

### Limitations and bias

While building this model we faced several obstacles. Some were overcome, but others, by the nature of the project, could not be fully solved.
Databases containing profiles of potential users of the planned prototype are not available. 
For this reason, simulations were needed to represent the interests of these users, so that the recommendation system could be modeled. 
Clusters of latent interests were simulated, based on the topics present in the texts describing financial products. Because we built the dataset ourselves, no user has interacted with it yet. We therefore have no 
realistic ratings, which makes the results less reliable.

Later on, we used a database of scraped LinkedIn profiles. 
The problem is that the set of profiles LinkedIn shows is biased, so the profiles that appeared were geographically close, or related to the user's organization and email.

## Training data
To train the Latent Dirichlet Allocation (LDA) model, a database of researcher profiles scraped from LinkedIn was used.
## Training procedure

## Evaluation results

## Checkpoints

- Example
```python

data=Data()
data.show_available_databases()
data.read_data('ml_100k')
method=Method(data.df)  
method.show_methods()
method.run('surprise.KNNWithMeans')
predictions_df=method.predictions_df
evaluator=Evaluator(predictions_df)
evaluator.show_evaluators()
evaluator.run('surprise.mse')
```
The avaliable database are:
0: ml_100k

1: ml_1m

2: jester

3: lda_topics

4: lda_rankings

5: uniform

The avaliable methods are:

0: surprise.NormalPredictor

1: surprise.BaselineOnly

2: surprise.KNNBasic

3: surprise.KNNWithMeans

4: surprise.KNNWithZScore

5: surprise.KNNBaseline

6: surprise.SVD

7: surprise.SVDpp

8: surprise.NMF

9: surprise.SlopeOne

10: surprise.CoClustering

Computing the msd similarity matrix...

Done computing similarity matrix.

The avaliable evaluators are:

0: surprise.rmse

1: surprise.mse

2: surprise.mae

3: surprise.fcp

MSE: 0.9146

    
Next is the code that builds the table with the accuracy metrics for all rating prediction models built into the surprise package. The function is expected to return a pandas dataframe (11x4), corresponding to the 11 models and the 4 accuracy metrics.
    
```python

def model_table(label):

    import tqdm
    
    table = pd.DataFrame()
    
    data=Data()
    data.read_data(label)
    
    method=Method(data.df)
    
    
    for m in method.available_methods:
        print(m)
        method.run(m)
        predictions_df=method.predictions_df
        evaluator=Evaluator(predictions_df)
        
        metrics = []
        
        for e in evaluator.available_evaluators:
            evaluator.run(e)
            metrics.append(evaluator.acc)
            
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        table = pd.concat([table, pd.DataFrame([dict(zip(evaluator.available_evaluators,metrics))])],ignore_index=True)
        
    table.index = [x[9:] for x in method.available_methods]
    table.columns = [x[9:].upper() for x in evaluator.available_evaluators]
            
    return table


import sys, os

sys.stdout = open(os.devnull, 'w') # Silence the prints

uniform = model_table('uniform')  
#topics = model_table('lda_topics')
ranking = model_table('lda_rankings')

sys.stdout = sys.__stdout__ # Re-enable the prints

```

- Usage Example

This section explains how the recommendation is made for the user.

```python

import gradio as gr
import random
import pandas as pd

opo = pd.read_csv('oportunidades_results.csv', lineterminator='\n')
# opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]
simulation = pd.read_csv('simulation2.csv')
userID = max(simulation['userID']) + 1

# This function builds the string displayed to the user in the app,
# showing the opportunity's title, link and summary.

def build_display_text(opo_n):
    
    title = opo.loc[opo_n]['opo_titulo']
    link = opo.loc[opo_n]['link']
    summary = opo.loc[opo_n]['facebook-bart-large-cnn_results']

    display_text = f"**{title}**\n\nURL:\n{link}\n\nSUMMARY:\n{summary}"

    return display_text
```

Next, 4 random opportunities are drawn.

```python
opo_n_one = random.randrange(len(opo))
opo_n_two = random.randrange(len(opo))
opo_n_three = random.randrange(len(opo))
opo_n_four = random.randrange(len(opo))

evaluated = []
```
The next function, `predict_next`, accepts an option and a rating.

```python
def predict_next(option, nota):
    global userID
    global opo_n_one
    global opo_n_two
    global opo_n_three
    global opo_n_four
    global evaluated
    global opo
    global simulation
```
Here the index of the rated opportunity in our database is retrieved.
```python

    selected = [opo_n_one, opo_n_two, opo_n_three, opo_n_four][int(option)-1]
```

Here a new version of the simulation database is created: it takes the previous simulation and appends a new row with the ID of the user, the rated item and the rating. The selected opportunity is also recorded as evaluated.

```python
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    simulation = pd.concat([simulation, pd.DataFrame([{'userID': userID, 'itemID': selected, 'rating': nota}])], ignore_index=True)
    evaluated.append(selected)
    
    from surprise import Reader
    reader = Reader(rating_scale=(1, 5))

    from surprise import Dataset
    data = Dataset.load_from_df(simulation[['userID', 'itemID', 'rating']], reader)
    trainset = data.build_full_trainset()

    from surprise import SVDpp
    svdpp = SVDpp()
    svdpp.fit(trainset)

    items = list()
    est = list()

    for i in range(len(opo)):
        if i not in evaluated:
            items.append(i)
            est.append(svdpp.predict(userID, i).est)

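    # Rank the unseen items by estimated rating. Pairing each estimate with its
    # item keeps tied estimates as distinct items (est.index would repeat the first one).
    ranked = sorted(zip(est, items), reverse=True)
    opo_n_one, opo_n_two, opo_n_three, opo_n_four = [item for _, item in ranked[:4]]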

    return build_display_text(opo_n_one), build_display_text(opo_n_two), build_display_text(opo_n_three), build_display_text(opo_n_four)
```
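The selection step at the end of `predict_next` (keep the four unseen items with the highest estimated rating) can be sketched on its own. In this minimal example, the estimates are hypothetical numbers rather than SVD++ output, and `top_k_unseen` is an illustrative helper name.

```python
import heapq


def top_k_unseen(estimates, evaluated, k=4):
    """Return the k item ids with the highest estimate, skipping evaluated items."""
    candidates = [
        (est, item) for item, est in estimates.items() if item not in evaluated
    ]
    # nlargest compares (estimate, item) tuples, so tied estimates stay distinct.
    return [item for _, item in heapq.nlargest(k, candidates)]


# Hypothetical estimated ratings per item id; item 1 was already rated.
estimates = {0: 3.2, 1: 4.8, 2: 4.8, 3: 1.0, 4: 4.1, 5: 2.5}
picked = top_k_unseen(estimates, evaluated={1})
print(picked)
```

Item 1 has the highest estimate but is skipped because the user has already evaluated it.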

Here is the Gradio integration, which builds the app.

```python

with gr.Blocks() as demo:
    with gr.Row():
        one_opo = gr.Textbox(build_display_text(opo_n_one), label='Oportunidade 1')
        two_opo = gr.Textbox(build_display_text(opo_n_two), label='Oportunidade 2')

    with gr.Row():
        three_opo = gr.Textbox(build_display_text(opo_n_three), label='Oportunidade 3')
        four_opo = gr.Textbox(build_display_text(opo_n_four), label='Oportunidade 4')

    with gr.Row():
        option = gr.Radio(['1', '2', '3', '4'], label='Opção', value = '1')

    with gr.Row():
        nota = gr.Slider(1,5,step=1,label="Nota 1")

    with gr.Row():
        confirm = gr.Button("Confirmar")

        confirm.click(fn=predict_next,
               inputs=[option, nota],
               outputs=[one_opo, two_opo, three_opo, four_opo])

if __name__ == "__main__":
    demo.launch()
```

## Benchmarks

```python

# LDA-GENERATED DATASET
ranking
```
|                 | RMSE     | MSE      | MAE      | FCP      |
|-----------------|----------|----------|----------|----------|
| NormalPredictor | 1.820737 | 3.315084 | 1.475522 | 0.514134 |
| BaselineOnly    | 1.072843 | 1.150992 | 0.890233 | 0.556560 |
| KNNBasic        | 1.232248 | 1.518436 | 0.936799 | 0.648604 |
| KNNWithMeans    | 1.124166 | 1.263750 | 0.808329 | 0.597148 |
| KNNWithZScore   | 1.056550 | 1.116299 | 0.750004 | 0.669651 |
| KNNBaseline     | 1.134660 | 1.287454 | 0.825161 | 0.614270 |
| SVD             | 0.977468 | 0.955444 | 0.757485 | 0.723829 |
| SVDpp           | 0.843065 | 0.710758 | 0.670516 | 0.671737 |
| NMF             | 1.122684 | 1.260420 | 0.722101 | 0.688728 |
| SlopeOne        | 1.073552 | 1.152514 | 0.747142 | 0.651937 |
| CoClustering    | 1.293383 | 1.672838 | 1.007951 | 0.494174 |

```python

# BENCHMARK DATASET
uniform
```
|                 | RMSE     | MSE      | MAE      | FCP      |
|-----------------|----------|----------|----------|----------|
| NormalPredictor | 1.508925 | 2.276854 | 1.226758 | 0.503723 |
| BaselineOnly    | 1.153331 | 1.330172 | 1.022732 | 0.506818 |
| KNNBasic        | 1.205058 | 1.452165 | 1.026591 | 0.501168 |
| KNNWithMeans    | 1.202024 | 1.444862 | 1.028149 | 0.503527 |
| KNNWithZScore   | 1.216041 | 1.478756 | 1.041070 | 0.501582 |
| KNNBaseline     | 1.225609 | 1.502117 | 1.048107 | 0.498198 |
| SVD             | 1.176273 | 1.383619 | 1.013285 | 0.502067 |
| SVDpp           | 1.192619 | 1.422340 | 1.018717 | 0.500909 |
| NMF             | 1.338216 | 1.790821 | 1.120604 | 0.492944 |
| SlopeOne        | 1.224219 | 1.498713 | 1.047170 | 0.494298 |
| CoClustering    | 1.223020 | 1.495778 | 1.033699 | 0.518509 |

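As a small reading aid for the two tables, the sketch below copies their RMSE columns into dictionaries and picks the model with the lowest value in each (for RMSE, lower is better). The helper name `best_model` is an assumption; the numbers are taken from the tables above.

```python
# RMSE column of the LDA-generated dataset table.
lda_rmse = {
    'NormalPredictor': 1.820737, 'BaselineOnly': 1.072843, 'KNNBasic': 1.232248,
    'KNNWithMeans': 1.124166, 'KNNWithZScore': 1.056550, 'KNNBaseline': 1.134660,
    'SVD': 0.977468, 'SVDpp': 0.843065, 'NMF': 1.122684,
    'SlopeOne': 1.073552, 'CoClustering': 1.293383,
}

# RMSE column of the uniform benchmark dataset table.
uniform_rmse = {
    'NormalPredictor': 1.508925, 'BaselineOnly': 1.153331, 'KNNBasic': 1.205058,
    'KNNWithMeans': 1.202024, 'KNNWithZScore': 1.216041, 'KNNBaseline': 1.225609,
    'SVD': 1.176273, 'SVDpp': 1.192619, 'NMF': 1.338216,
    'SlopeOne': 1.224219, 'CoClustering': 1.223020,
}


def best_model(scores):
    """Return the model name with the lowest error."""
    return min(scores, key=scores.get)


print(best_model(lda_rmse), best_model(uniform_rmse))
```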

### BibTeX entry and citation info
```bibtex
@unpublished{recommend22,
author       ={Jo\~{a}o Gabriel de Moraes Souza and Daniel Oliveira Cajueiro and Johnathan de O. Milagres and Vin\'{i}cius de Oliveira Watanabe and V\'{i}tor Bandeira Borges and Victor Rafael Celestino},
title        ={A comprehensive review of recommendation systems: method, data, evaluation and coding},
}
```