java - Twitter sentiment analysis using Naive Bayes in apache spark

Question

I am trying to do basic Twitter sentiment analysis using Apache Spark.

The page below describes the Naive Bayes implementation in Apache Spark MLlib, which looks like a good candidate for this problem: http://spark.apache.org/docs/1.0.0/mllib-naive-bayes.html

In the Java example there, the training and test sets are given as:

JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set

I don't have any clue what kind of data they hold, but I can tell that they are not plain English text.

I have a list of tweets, say:

"I love my country."
"Great day at office."
"Google Chrome sucks!"

How do I use the Naive Bayes function to process this text?

Any insights on this would be helpful.

score 2 · Accepted Answer

A LabeledPoint has the form (double, Vector): the first parameter is the label and the second is a Vector of features built from a double[] (for Naive Bayes, only non-negative real values). Your tweets are raw text, so they do not match this format as they stand. That means you have to find a way to convert your data to real values first. TF-IDF seems to be one way. You might be interested to read this example for better understanding.
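To make the conversion step concrete, here is a minimal sketch in plain Java of one possible approach: counting term frequencies into a fixed number of buckets with the hashing trick. The class TweetFeatures, the method termFrequencies, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are all hypothetical choices for illustration, not part of Spark. It computes only raw term counts, not the IDF weighting, so treat it as a starting point rather than a full TF-IDF pipeline.

```java
import java.util.Locale;

// Hypothetical helper (not a Spark class): turns raw tweet text into a
// fixed-length vector of non-negative term counts using the hashing trick.
final class TweetFeatures {

    private TweetFeatures() {}

    // Lowercases the text, splits it into word tokens, and counts each token
    // in the bucket chosen by its hash code. Different words may collide in
    // the same bucket; a larger numFeatures makes that less likely.
    static double[] termFrequencies(String text, int numFeatures) {
        if (numFeatures <= 0) {
            throw new IllegalArgumentException("numFeatures must be positive");
        }
        double[] counts = new double[numFeatures];
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty first token
            }
            // floorMod keeps the bucket index non-negative even when
            // hashCode() is negative.
            int bucket = Math.floorMod(token.hashCode(), numFeatures);
            counts[bucket] += 1.0;
        }
        return counts;
    }
}
```

With Spark on the classpath, each resulting array could then be wrapped as new LabeledPoint(label, Vectors.dense(counts)), where the label encoding (for example 1.0 for positive and 0.0 for negative) is something you choose and assign yourself from hand-labelled tweets. Later Spark releases also provide HashingTF and IDF classes in org.apache.spark.mllib.feature that cover the same ground, so check the documentation for your version before rolling your own.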
