Wednesday, 28 September 2016

How To Stream Twitter Data Into MongoDB Using Apache Flume

Prerequisites for the Flume Project:

NOTE: Make sure that all of the required components are installed.

Flume Project Download Links:
`hadoop-2.6.0.tar.gz` ==> link
`apache-flume-1.6.0-bin.tar.gz` ==> link

`mongodb-linux-x86_64-ubuntu1404-3.2.7.tgz` ==> link
`kalyan-twitter-mongo-agent.conf` ==> link
`kalyan-flume-project-0.1.jar` ==> link

`mongodb-driver-core-3.3.0.jar` ==> link
`mongo-java-driver-3.3.0.jar` ==> link


1. Create the "kalyan-twitter-mongo-agent.conf" file with the following content:

agent.sources = Twitter
agent.channels = MemChannel
agent.sinks = MongoDB

agent.sources.Twitter.type = com.orienit.kalyan.flume.source.KalyanTwitterSource
agent.sources.Twitter.channels = MemChannel
agent.sources.Twitter.consumerKey = ********
agent.sources.Twitter.consumerSecret = ********
agent.sources.Twitter.accessToken = ********
agent.sources.Twitter.accessTokenSecret = ********
agent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

agent.sinks.MongoDB.type = com.orienit.kalyan.flume.sink.KalyanMongoSink

agent.sinks.MongoDB.hostNames = localhost
agent.sinks.MongoDB.database = flume
agent.sinks.MongoDB.collection = twitter
agent.sinks.MongoDB.batchSize = 10
agent.sinks.MongoDB.channel = MemChannel

agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100
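Two mistakes are easy to make in a file like this: leaving `agent.sinks` empty, and forgetting to bind the sink to a channel. Flume expects each sink named in `agent.sinks` to be attached to exactly one channel through its singular `channel` key (sources use the plural `channels`). The shell function below, `check_sink_channels`, is a hypothetical pre-flight sketch of that rule, not part of Flume; it assumes simple one-line `key = value` properties.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a Flume agent properties file (not part of Flume).
# Verifies that every sink listed in <agent>.sinks is bound, via its
# "channel" property, to a channel listed in <agent>.channels.
check_sink_channels() {
  local conf="$1" agent="${2:-agent}"
  local channels sinks sink ch

  # Values of the top-level "<agent>.channels = ..." and "<agent>.sinks = ..." lines
  channels=$(sed -n "s/^${agent}\.channels[[:space:]]*=[[:space:]]*//p" "$conf")
  sinks=$(sed -n "s/^${agent}\.sinks[[:space:]]*=[[:space:]]*//p" "$conf")

  # An empty sink list means no events are ever written out
  if [ -z "${sinks//[[:space:]]/}" ]; then
    echo "no sinks declared in ${agent}.sinks" >&2
    return 1
  fi

  for sink in $sinks; do
    # A sink takes exactly one channel, set with the singular "channel" key
    ch=$(sed -n "s/^${agent}\.sinks\.${sink}\.channel[[:space:]]*=[[:space:]]*//p" "$conf" | tr -d '[:space:]')
    if [ -z "$ch" ]; then
      echo "sink '$sink' has no ${agent}.sinks.${sink}.channel property" >&2
      return 1
    fi
    # The referenced channel must appear in the declared channel list
    case " $channels " in
      *" $ch "*) ;;
      *) echo "sink '$sink' uses undeclared channel '$ch'" >&2; return 1 ;;
    esac
  done
  echo "OK: all sinks are bound to declared channels"
}
```

For example, `check_sink_channels "$FLUME_HOME/conf/kalyan-twitter-mongo-agent.conf" agent` could be run before step 4; the second argument must match the name passed to `flume-ng` with `-n`.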

2. Copy the "kalyan-twitter-mongo-agent.conf" file into the "$FLUME_HOME/conf" folder.

3. Copy the "kalyan-flume-project-0.1.jar", "mongodb-driver-core-3.3.0.jar" and "mongo-java-driver-3.3.0.jar" files into the "$FLUME_HOME/lib" folder.
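Steps 2 and 3 can be scripted. The sketch below wraps both copies in a hypothetical helper, `install_flume_mongo_files`, assuming the configuration file and the three jars sit in the current directory and that `conf/` and `lib/` already exist under the Flume installation. The project jar is assumed to provide the `KalyanTwitterSource` and `KalyanMongoSink` classes referenced in the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper covering steps 2 and 3.
# Assumes the .conf file and the three jars are in the current directory.
install_flume_mongo_files() {
  # Target Flume installation: first argument, else $FLUME_HOME
  local flume_home="${1:-${FLUME_HOME:-}}"

  # Fail early if this does not look like a Flume installation
  if [ ! -d "$flume_home/conf" ] || [ ! -d "$flume_home/lib" ]; then
    echo "expected conf/ and lib/ under '$flume_home'" >&2
    return 1
  fi

  # Step 2: the agent configuration goes into conf/
  cp kalyan-twitter-mongo-agent.conf "$flume_home/conf/" || return 1

  # Step 3: the project jar (assumed to hold the Kalyan source/sink classes)
  # and the MongoDB Java drivers go into lib/
  cp kalyan-flume-project-0.1.jar \
     mongodb-driver-core-3.3.0.jar \
     mongo-java-driver-3.3.0.jar \
     "$flume_home/lib/" || return 1
}
```

Usage: `install_flume_mongo_files "$FLUME_HOME"`, run from the folder that holds the downloaded files.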

4. Run the following command to extract data from Twitter into MongoDB using Flume:

$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-twitter-mongo-agent.conf -Dflume.root.logger=DEBUG,console

5. Verify the data in the console.

6. Verify the data in MongoDB.

7. Start the MongoDB server using the `mongod` command.

8. Start the MongoDB client using the `mongo` command.

9. List the databases in MongoDB using the `show dbs` command.

10. Verify the data with the following commands in the MongoDB client:

// list of databases
show dbs

// use flume database
use flume

// list of collections
show collections

// find the count of documents in 'twitter' collection
db.twitter.count()

// display list of documents in 'twitter' collection
db.twitter.find()
