Prerequisites for the Twitter Data + Pig + Sentiment Analysis Project:
hadoop-2.6.0
pig-0.15.0
java-1.7
NOTE: Make sure that all of the above components are installed.
Twitter Data + Pig + Sentiment Analysis Project Download Links:
`hadoop-2.6.0.tar.gz` ==> link
`pig-0.15.0.tar.gz` ==> link
`sentimentanalysis-pig.jar` ==> link
`tweets` ==> link
-----------------------------------------------------------------------------
1. Create a `sentimentanalysis` folder on your machine
command: mkdir ~/sentimentanalysis
2. Download the sample tweets (or collect Twitter data using Flume) for sentiment analysis and copy them to the '~/sentimentanalysis' folder
Note: the sample tweets are available from the `tweets` link above
Example: Sample Tweets
i am learning hadoop course
i am good in hadoop
i am learning hadoop
i am not feeling well
why we need bigdata
i am not happy with rdbms
ravi is not working today
india got the world cup
learn hadoop from kalyan blog
learn spark from kalyan blog
3. Verify the file using the cat command
command: cat ~/sentimentanalysis/tweets
4. Start Hadoop using the command below
command: start-all.sh
5. Verify that the Hadoop daemons are running using the "jps" command
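Rather than reading the jps listing by eye, the check can be scripted. This is a minimal sketch, assuming a single-node Hadoop 2.x install where start-all.sh (which runs start-dfs.sh and start-yarn.sh) should bring up NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. `missing_daemons` is a hypothetical helper that reads jps output on standard input, so it can also be tried on saved output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which daemons expected after start-all.sh
# are absent from jps output read on stdin.
# Assumes a single-node (pseudo-distributed) Hadoop 2.x setup with YARN.
missing_daemons() {
  local expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
  local running missing=0 d
  # The second column of each jps line is the Java main class name.
  running=$(awk '{print $2}')
  for d in $expected; do
    # -x requires a whole-line match, so "NameNode" does not match
    # "SecondaryNameNode".
    if ! grep -qx "$d" <<< "$running"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | missing_daemons && echo "all daemons running"`. The function exits non-zero when any expected daemon is missing, so it can also gate a larger script.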
6. Open the NameNode web UI in a browser at the URL below
http://localhost:50070/
7. Load the sample tweets into HDFS
hadoop fs -mkdir -p /kalyan/sentimentanalysis/pig/input
hadoop fs -put ~/sentimentanalysis/tweets /kalyan/sentimentanalysis/pig/input
8. Start Pig in either local mode or MapReduce mode; the steps below use MapReduce mode because they read from HDFS
command: pig -x mapreduce
9. Load the sample tweets into the Pig `tweets` bag
tweets = load '/kalyan/sentimentanalysis/pig/input' AS (tweet : chararray);
10. Display the data in the Pig `tweets` bag
dump tweets;
11. Download the `sentimentanalysis-pig.jar` file and copy it to the '~/sentimentanalysis' folder
Note: the jar is available from the `sentimentanalysis-pig.jar` link above
12. Load the `sentimentanalysis-pig.jar` into HDFS
hadoop fs -put ~/sentimentanalysis/sentimentanalysis-pig.jar /kalyan/sentimentanalysis/pig
13. Add `sentimentanalysis-pig.jar` to the Pig classpath using the REGISTER command below (the host and port in the HDFS URI must match `fs.defaultFS` in core-site.xml)
REGISTER <PATH OF THE JAR FILE>;
REGISTER hdfs://localhost:8020/kalyan/sentimentanalysis/pig/sentimentanalysis-pig.jar;
14. Define the sentiment function in pig
DEFINE <function name> <UDF CLASS NAME WITH PACKAGE>();
DEFINE sentiment com.orienit.kalyan.sentimentanalysis.pig.udf.SentimentUdf();
15. Analyse the tweets with the sentiment function using the command below
sentimenttweets1 = FOREACH tweets GENERATE tweet, sentiment(tweet) as sentiment;
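The source does not describe how `SentimentUdf` scores a tweet. To get a feel for the kind of output step 15 produces, here is a rough local approximation of a simple lexicon-based scorer. It is a sketch only: the word lists, the "not" handling, and the `score_tweets` name are all hypothetical assumptions, not the logic inside `sentimentanalysis-pig.jar`.

```shell
#!/usr/bin/env bash
# Hypothetical local approximation of a lexicon-based sentiment scorer.
# The word lists are illustrative assumptions only; they are NOT the
# lexicon or algorithm used by SentimentUdf in sentimentanalysis-pig.jar.
score_tweets() {
  awk '
    BEGIN {
      # Hypothetical polarity lexicon (assumed for illustration).
      n = split("good happy well great love", pos, " ")
      for (i = 1; i <= n; i++) polarity[pos[i]] = 1
      n = split("bad sad poor hate slow", neg, " ")
      for (i = 1; i <= n; i++) polarity[neg[i]] = -1
    }
    {
      total = 0
      sign = 1
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        # Crude negation: "not" flips every later word in the tweet.
        if (w == "not") { sign = -1; continue }
        if (w in polarity) total += sign * polarity[w]
      }
      # Reduce the total to its sign: the -1 / 0 / 1 values step 18 expects.
      score = (total > 0) ? 1 : ((total < 0) ? -1 : 0)
      print $0 "\t" score
    }'
}
```

Usage: `score_tweets < ~/sentimentanalysis/tweets` prints each tweet followed by a tab and its score, the same shape as the tab-separated files that step 17 stores.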
16. Display the data in `sentimenttweets1` bag in pig
dump sentimenttweets1;
17. Store the `sentimenttweets1` result in an HDFS folder
STORE sentimenttweets1 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets1';
18. Label each tweet in the `sentimenttweets1` bag as positive, neutral, or negative using a CASE expression
sentimenttweets2 = FOREACH sentimenttweets1 GENERATE tweet, (
  CASE
    WHEN sentiment == 1 THEN 'positive'
    WHEN sentiment == 0 THEN 'neutral'
    WHEN sentiment == -1 THEN 'negative'
  END
) AS sentiment_label;
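To check results outside Pig, the step 18 CASE expression can be mirrored with a small awk filter. This sketch assumes its input is the tab-separated "tweet<TAB>score" text that PigStorage (Pig's default storage function) writes for step 17; `label_tweets` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical local equivalent of the step 18 CASE expression, applied to
# tab-separated "tweet<TAB>score" lines such as the files written in step 17.
label_tweets() {
  awk -F '\t' -v OFS='\t' '
    {
      # Mirror the CASE branches; any other score value gets an empty
      # label, much as a Pig CASE yields null when no WHEN matches.
      if ($2 == 1)       label = "positive"
      else if ($2 == 0)  label = "neutral"
      else if ($2 == -1) label = "negative"
      else               label = ""
      print $1, label
    }'
}
```

Usage: `hadoop fs -cat '/kalyan/sentimentanalysis/pig/sentimenttweets1/part-*' | label_tweets`. The glob is quoted so that HDFS, not the local shell, expands it.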
19. Store the `sentimenttweets2` result in an HDFS folder
STORE sentimenttweets2 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets2';