Wednesday 26 October 2016

Twitter Data Sentiment Analysis Using Pig

Prerequisites for the Twitter Data + Pig + Sentiment Analysis project:

hadoop-2.6.0
pig-0.15.0
java-1.7

NOTE: Make sure all of the above components are installed.

Twitter Data + Pig + Sentiment Analysis Project Download Links:

`hadoop-2.6.0.tar.gz` ==> link
`pig-0.15.0.tar.gz` ==> link
`sentimentanalysis-pig.jar` ==> link
`tweets` ==> link

-----------------------------------------------------------------------------

1. Create a `sentimentanalysis` folder on your machine

command: mkdir ~/sentimentanalysis

2. Download the sample tweets (or collect Twitter data using Flume) for the sentiment analysis and copy them to the '~/sentimentanalysis' folder

Note: Download sample tweets link

Example: Sample Tweets

i am learning hadoop course
i am good in hadoop
i am learning hadoop
i am not feeling well
why we need bigdata
i am not happy with rdbms
ravi is not working today
india got the world cup
learn hadoop from kalyan blog
learn spark from kalyan blog


3. Verify the file using the cat command

command: cat ~/sentimentanalysis/tweets

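4. Start Hadoop using the command below

command: start-all.sh

In Hadoop 2.x, start-all.sh is deprecated but still works; it runs start-dfs.sh and start-yarn.sh, which you can also call directly.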

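5. Verify that Hadoop is running using the "jps" command

command: jps

On a single-node setup the output should list the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager processes.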

6. Open the HDFS NameNode web UI in a browser using the URL below

http://localhost:50070/dfshealth.jsp

7. Load the sample tweets into HDFS

hadoop fs -mkdir -p /kalyan/sentimentanalysis/pig/input

hadoop fs -put ~/sentimentanalysis/tweets /kalyan/sentimentanalysis/pig/input

8. Start Pig in either local mode (`pig -x local`) or mapreduce mode. The steps below read from HDFS, so use mapreduce mode:

command: pig -x mapreduce

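9. Load the sample tweets into the Pig `tweets` bag

tweets = load '/kalyan/sentimentanalysis/pig/input' AS (tweet : chararray);

Because no USING clause is given, Pig reads the input with PigStorage, which splits each line on tab characters. The sample tweets contain no tabs, so each line becomes a single `tweet` field.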

10. Display the data in the Pig `tweets` bag

dump tweets;

11. Download the `sentimentanalysis-pig.jar` file and copy it to the '~/sentimentanalysis' folder

Note: Download sentimentanalysis-pig.jar link

12. Load the `sentimentanalysis-pig.jar` into HDFS

hadoop fs -put ~/sentimentanalysis/sentimentanalysis-pig.jar /kalyan/sentimentanalysis/pig

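13. Register the `sentimentanalysis-pig.jar` file with Pig using the command below, which adds it to Pig's classpath

REGISTER <PATH OF THE JAR FILE>;

REGISTER hdfs://localhost:8020/kalyan/sentimentanalysis/pig/sentimentanalysis-pig.jar;

Adjust `localhost:8020` to match the `fs.defaultFS` value in your `core-site.xml`.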

14. Define the sentiment function in Pig

DEFINE <function name> <UDF class name with package>();

DEFINE sentiment com.orienit.kalyan.sentimentanalysis.pig.udf.SentimentUdf();

15. Analyse the tweets by applying the sentiment function to each tweet, using the command below

sentimenttweets1 = FOREACH tweets GENERATE tweet, sentiment(tweet) as sentiment;
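This post does not show what `SentimentUdf` does internally (Pig eval UDFs like it are Java classes extending `org.apache.pig.EvalFunc`); judging from step 18, it returns 1, 0 or -1 per tweet. The Python sketch below shows one common way such a score could be computed, assuming a simple word-list (lexicon) approach with basic negation handling. The word lists and the negation rule are hypothetical and are not taken from the jar.

```python
# Hypothetical lexicon-based scorer; NOT the actual SentimentUdf in sentimentanalysis-pig.jar.
# The word lists below are illustrative assumptions.
POSITIVE = {"good", "happy", "well", "great", "love"}
NEGATIVE = {"bad", "sad", "poor", "hate", "angry"}
NEGATORS = {"not", "no", "never"}


def sentiment(tweet: str) -> int:
    """Score one tweet as 1 (positive), 0 (neutral) or -1 (negative)."""
    score = 0
    negate = False
    for word in tweet.lower().split():
        if word in NEGATORS:
            # Remember the negation so it flips the next sentiment-bearing word
            negate = True
            continue
        polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if polarity != 0 and negate:
            polarity = -polarity
            negate = False
        score += polarity
    # Collapse the raw total to the three values used by the CASE expression in step 18
    return (score > 0) - (score < 0)


for t in ["i am good in hadoop", "i am not happy with rdbms", "why we need bigdata"]:
    print(t, sentiment(t))
```

With these assumed word lists, "i am good in hadoop" scores 1, "i am not happy with rdbms" scores -1 and "why we need bigdata" scores 0; the real UDF may score the sample tweets differently.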

16. Display the data in the `sentimenttweets1` bag

dump sentimenttweets1;

17. Store the `sentimenttweets1` result in an HDFS folder

STORE sentimenttweets1 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets1';
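With no USING clause, STORE writes through PigStorage: the output folder holds part files with one tab-separated line per tuple, which you can view with `hadoop fs -cat /kalyan/sentimentanalysis/pig/sentimenttweets1/part-*`. The Python sketch below parses such lines back into (tweet, score) pairs, assuming the text has already been copied out of HDFS; the sample input is made up for illustration, not real job output.

```python
def parse_pigstorage(text: str) -> list:
    """Parse (tweet, score) lines written by STORE with the default PigStorage."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        # Split on the last tab only: the tweet comes first, the score last
        tweet, score = line.rsplit("\t", 1)
        # PigStorage writes a null field as an empty string
        rows.append((tweet, int(score) if score else None))
    return rows


# Illustrative input, not actual job output
sample = "i am good in hadoop\t1\ni am not happy with rdbms\t-1\n"
print(parse_pigstorage(sample))
```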

18. Convert the numeric sentiment scores in the `sentimenttweets1` bag into labels using a CASE expression

sentimenttweets2 = FOREACH sentimenttweets1 GENERATE tweet, (
CASE
WHEN sentiment == 1 THEN 'positive'
WHEN sentiment == 0 THEN 'neutral'
WHEN sentiment == -1 THEN 'negative'
END
);
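The CASE expression above is a plain lookup from score to label. The Python sketch below applies the same mapping to (tweet, score) pairs like those in `sentimenttweets1`; it only illustrates the logic and is not something Pig runs. The sketch returns None for any score that none of the three WHEN branches covers, since the Pig statement has no ELSE branch for that case.

```python
# Same mapping as the CASE expression in step 18
LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def label_tweets(rows):
    """Turn (tweet, score) pairs into (tweet, label) pairs, like sentimenttweets2."""
    # dict.get returns None for a score with no matching WHEN branch
    return [(tweet, LABELS.get(score)) for tweet, score in rows]


print(label_tweets([("i am good in hadoop", 1), ("why we need bigdata", 0)]))
```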

19. Store the `sentimenttweets2` result in an HDFS folder

STORE sentimenttweets2 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets2';