Pre-Requisites of Flume Project:
hadoop-2.6.0
flume-1.6.0
mongodb-3.2.7
java-1.7
NOTE: Make sure all of the above components are installed.
Flume Project Download Links:
`hadoop-2.6.0.tar.gz` ==> link
`apache-flume-1.6.0-bin.tar.gz` ==> link
`mongodb-linux-x86_64-ubuntu1404-3.2.7.tgz` ==> link
`kalyan-twitter-hdfs-mongo-agent.conf` ==> link
`kalyan-flume-project-0.1.jar` ==> link
`mongodb-driver-core-3.3.0.jar` ==> link
`mongo-java-driver-3.3.0.jar` ==> link
-----------------------------------------------------------------------------
1. Create the "kalyan-twitter-hdfs-mongo-agent.conf" file with the following content:
agent.sources = Twitter
agent.channels = MemChannel1 MemChannel2
agent.sinks = HDFS MongoDB
agent.sources.Twitter.type = com.orienit.kalyan.flume.source.KalyanTwitterSource
agent.sources.Twitter.channels = MemChannel1 MemChannel2
agent.sources.Twitter.consumerKey = ********
agent.sources.Twitter.consumerSecret = ********
agent.sources.Twitter.accessToken = ********
agent.sources.Twitter.accessTokenSecret = ********
agent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
agent.sinks.HDFS.type = hdfs
agent.sinks.HDFS.channel = MemChannel1
agent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets
agent.sinks.HDFS.hdfs.fileType = DataStream
agent.sinks.HDFS.hdfs.writeFormat = Text
agent.sinks.HDFS.hdfs.batchSize = 100
agent.sinks.HDFS.hdfs.rollSize = 0
agent.sinks.HDFS.hdfs.rollCount = 100
agent.sinks.HDFS.hdfs.useLocalTimeStamp = true
agent.sinks.MongoDB.type = com.orienit.kalyan.flume.sink.KalyanMongoSink
agent.sinks.MongoDB.hostNames = localhost
agent.sinks.MongoDB.database = flume
agent.sinks.MongoDB.collection = twitter
agent.sinks.MongoDB.batchSize = 10
agent.sinks.MongoDB.channel = MemChannel2
agent.channels.MemChannel1.type = memory
agent.channels.MemChannel1.capacity = 1000
agent.channels.MemChannel1.transactionCapacity = 100
agent.channels.MemChannel2.type = memory
agent.channels.MemChannel2.capacity = 1000
agent.channels.MemChannel2.transactionCapacity = 100
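The configuration above fans one source out to two memory channels, each drained by its own sink. A typo in a channel name is easy to make, so a quick wiring check before starting the agent can help. The following is a minimal sketch, not part of Flume itself: `parse_properties` and `check_wiring` are hypothetical helper names, the parser only handles simple `key = value` lines, and the fallback value of 100 for `capacity` and `transactionCapacity` is assumed to match the memory channel defaults.

```python
"""Sanity-check the channel wiring of a Flume agent properties file."""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="agent"):
    """Return a list of problems found; an empty list means the wiring looks OK."""
    problems = []
    declared = set(props.get(f"{agent}.channels", "").split())

    # A source lists one or more channels under the plural key 'channels'.
    for source in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in declared:
                problems.append(f"source {source}: undeclared channel {ch}")

    # A sink reads from exactly one channel under the singular key 'channel'.
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink}: no channel configured")
        elif ch not in declared:
            problems.append(f"sink {sink}: undeclared channel {ch}")

    # A channel's per-transaction limit should not exceed its total capacity.
    for ch in sorted(declared):
        capacity = int(props.get(f"{agent}.channels.{ch}.capacity", 100))
        tx = int(props.get(f"{agent}.channels.{ch}.transactionCapacity", 100))
        if tx > capacity:
            problems.append(
                f"channel {ch}: transactionCapacity {tx} > capacity {capacity}"
            )
    return problems


# A trimmed copy of the wiring-related lines from the configuration above.
SAMPLE = """
agent.sources = Twitter
agent.channels = MemChannel1 MemChannel2
agent.sinks = HDFS MongoDB
agent.sources.Twitter.channels = MemChannel1 MemChannel2
agent.sinks.HDFS.channel = MemChannel1
agent.sinks.MongoDB.channel = MemChannel2
agent.channels.MemChannel1.capacity = 1000
agent.channels.MemChannel1.transactionCapacity = 100
agent.channels.MemChannel2.capacity = 1000
agent.channels.MemChannel2.transactionCapacity = 100
"""

print(check_wiring(parse_properties(SAMPLE)))  # prints []
```

To check the real file, read it with `open(path).read()` and pass the text to `parse_properties`.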
2. Copy the "kalyan-twitter-hdfs-mongo-agent.conf" file into the "$FLUME_HOME/conf" folder
3. Copy the "kalyan-flume-project-0.1.jar", "mongodb-driver-core-3.3.0.jar" and "mongo-java-driver-3.3.0.jar" files into the "$FLUME_HOME/lib" folder
4. Execute the following command to extract data from Twitter into HDFS and MongoDB using Flume:
$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-twitter-hdfs-mongo-agent.conf -Dflume.root.logger=DEBUG,console
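To make the role of each flag in the command above explicit, here is a small sketch that assembles the same argument list piece by piece. `build_flume_command` is a hypothetical helper, and `/opt/flume` is only a placeholder for wherever `$FLUME_HOME` points on your machine.

```python
import os
import shlex


def build_flume_command(flume_home,
                        conf_name="kalyan-twitter-hdfs-mongo-agent.conf",
                        agent_name="agent",
                        log_level="DEBUG"):
    """Assemble the flume-ng argument list used in step 4."""
    conf_dir = os.path.join(flume_home, "conf")
    return [
        os.path.join(flume_home, "bin", "flume-ng"),
        "agent",                                   # run a Flume agent
        "-n", agent_name,                          # must match the 'agent.' prefix in the conf file
        "--conf", conf_dir,                        # Flume's configuration directory
        "-f", os.path.join(conf_dir, conf_name),   # the agent definition from step 1
        f"-Dflume.root.logger={log_level},console",  # send log output to the terminal
    ]


cmd = build_flume_command("/opt/flume")  # placeholder path, adjust to your install
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the agent from Python. Note that the value after `-n` has to be the same name that prefixes every property in the configuration file, which is `agent` here.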
5. Verify the data in the console
6. Verify the data in HDFS and MongoDB
7. Start the MongoDB server (mongod) if it is not already running
8. Start the MongoDB client with the `mongo` command
9. Verify the list of databases in MongoDB with the `show dbs` command
10. Run the following operations in MongoDB to verify the data:
// list of databases
show dbs
// use flume database
use flume
// list of collections
show collections
// find the count of documents in 'twitter' collection
db.twitter.count()
// display list of documents in 'twitter' collection
db.twitter.find()
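The same count-and-inspect check can be scripted instead of typed into the mongo shell. The sketch below assumes the third-party PyMongo driver, which is not part of the original setup, a release of it that still supports a MongoDB 3.2 server, and the default port 27017 on localhost. `summarize` and `connect` are hypothetical helper names.

```python
def summarize(collection, sample_size=3):
    """Return (document count, first few documents) for a collection.

    'collection' is expected to behave like a pymongo Collection.
    """
    total = collection.count_documents({})                # like db.twitter.count()
    sample = list(collection.find().limit(sample_size))   # like db.twitter.find()
    return total, sample


def connect(uri="mongodb://localhost:27017", database="flume", name="twitter"):
    """Open the collection that the MongoDB sink writes to (needs pymongo)."""
    # Imported lazily so that summarize() has no hard dependency on pymongo.
    from pymongo import MongoClient
    return MongoClient(uri)[database][name]
```

With the server from step 7 running, `total, sample = summarize(connect())` gives the number of stored tweets and a few sample documents. A count that keeps growing between calls suggests the sink is still writing.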