unb-lamfo-nlp-mcti
/

NLP-Recommendation-MCTI


Jmilagres committed on Dec 16, 2022

Commit

8485cfe


1 Parent(s): ed0f76a

Update README.md

Files changed (1)

README.md +11 -4

README.md CHANGED

@@ -14,7 +14,7 @@ thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_mode
 ![MCTIimg](https://antigo.mctic.gov.br/mctic/export/sites/institucional/institucional/entidadesVinculadas/conselhos/pag-old/RODAPE_MCTI.png)
-# MCTI Text Classification Task (uncased) DRAFT
 Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.
@@ -221,10 +221,17 @@ This first LDA based dataset builds a model with K = `n_users` topics. LDA topic
 ```
 ### Limitations and bias
 In building this model we faced several obstacles. We overcame most of them, but some, by the nature of the project, could not be fully resolved.
-Because we built the dataset ourselves, no users have interacted with it yet, so we do not have
-realistic ratings. We therefore had to simulate them, which makes the results less credible.
-In this part of the project we also used a database of scraped LinkedIn profiles.
 The problem is that the profiles LinkedIn shows are biased: the profiles that appeared were geographically close to the user or related to the user's organization and email.
 ## Training data

 ![MCTIimg](https://antigo.mctic.gov.br/mctic/export/sites/institucional/institucional/entidadesVinculadas/conselhos/pag-old/RODAPE_MCTI.png)
+# MCTI Recommendation Task (uncased) DRAFT
 Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.
 ```
 ### Limitations and bias
 In building this model we faced several obstacles. We overcame most of them, but some, by the nature of the project, could not be fully resolved.
+Databases containing profiles of possible users of the planned prototype are not available.
+For this reason, it was necessary to carry out simulations in order to represent the interests of these users, so that the recommendation system could be modeled.
+We simulated clusters of latent interests based on the topics present in the texts describing financial products.
+Because we built the dataset ourselves, no users have interacted with it yet, so we do not have
+realistic ratings, which makes the results less credible.
+Later on, we used a database of scraped LinkedIn profiles.
 The problem is that the profiles LinkedIn shows are biased: the profiles that appeared were geographically close to the user or related to the user's organization and email.
 ## Training data