suggestion
Hello,
I'm taking a big data course and need to build a project for it.
The project MUST include these technologies: Spark, Spark SQL, the Scala language, streaming, Kafka, storage in MongoDB, and a web dashboard for insights. So I need to pull the data and stream it into Kafka in real time.
I have looked at the Amazon reviews dataset. It's really huge, which makes it very suitable for the project. Would you recommend using it for sentiment analysis in this type of project?
Or should I do something other than sentiment analysis?
Yes, you can stream the reviews into Kafka, process them in real time with Spark and Scala for sentiment analysis, and store the results in MongoDB. I removed the 'neutral' comments, since most of them didn't seem neutral to me, but it's better to check that yourself. Be careful about noise in the text. You might also consider topic modeling.
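A minimal pure-Scala sketch of the two preprocessing points above (dropping neutral reviews and reducing text noise). It assumes each review carries a star rating and treats 3 stars as "neutral", 4 to 5 as positive and 1 to 2 as negative; that mapping, the `Review` case class and the `ReviewPrep` helper names are hypothetical and not taken from the dataset or from the reply. The cleaning step only lowercases, strips HTML tags and URLs, removes characters other than letters and apostrophes, and collapses whitespace; real review text may need more than this. In a Spark job the same `clean` and `label` functions could be applied to each record, for example wrapped in a UDF, but that wiring is not shown here.

```scala
// Hypothetical record shape; field names are assumptions, not the real dataset schema.
final case class Review(text: String, rating: Int)

object ReviewPrep {
  private val HtmlTag   = "<[^>]+>".r
  private val Url       = "https?://\\S+".r
  private val NonLetter = "[^a-z'\\s]".r
  private val Spaces    = "\\s+".r

  /** Lowercase, strip HTML tags and URLs, drop non-letters (keeping apostrophes), collapse whitespace. */
  def clean(text: String): String = {
    val lower       = text.toLowerCase
    val noTags      = HtmlTag.replaceAllIn(lower, " ")
    val noUrls      = Url.replaceAllIn(noTags, " ")
    val lettersOnly = NonLetter.replaceAllIn(noUrls, " ")
    Spaces.replaceAllIn(lettersOnly, " ").trim
  }

  /** Assumed mapping: 4-5 stars positive, 1-2 negative, 3 treated as neutral (None, filtered out). */
  def label(rating: Int): Option[String] =
    if (rating >= 4) Some("positive")
    else if (rating <= 2) Some("negative")
    else None

  /** Keep only labelled (non-neutral) reviews whose cleaned text is non-empty. */
  def prepare(reviews: Seq[Review]): Seq[(String, String)] =
    reviews
      .flatMap(r => label(r.rating).map(l => (clean(r.text), l)))
      .filter { case (text, _) => text.nonEmpty }
}

object ReviewPrepDemo {
  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Review("Loved it!!", 5),
      Review("It's ok", 3),
      Review("Broke after a day :(", 1)
    )
    // The 3-star review is dropped; the other two are cleaned and labelled.
    ReviewPrep.prepare(sample).foreach(println)
  }
}
```

Keeping this logic in plain functions makes it easy to check on a few hand-picked reviews before plugging it into the streaming pipeline.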
Thanks for your reply. Actually, I'm still at the beginning of the project: I'm trying to pull the data, and then I'll take your notes into consideration. Regards.