Pre-Requisites of Flume Project:
hadoop-2.6.0
flume-1.6.0
hbase-0.98.4
java-1.7
Project Compatibility:
1. hadoop-2.6.0 + hbase-0.98.4 + flume-1.6.0
2. hadoop-2.7.2 + hbase-1.1.2 + flume-1.7.0
NOTE: Make sure to install all of the above components.
Flume Project Download Links:
`hadoop-2.6.0.tar.gz` ==> link
`apache-flume-1.6.0-bin.tar.gz` ==> link
`kalyan-regex-hbase-agent.conf` ==> link
`kalyan-flume-project-0.1.jar` ==> link
`bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar` ==> link
-----------------------------------------------------------------------------
1. Create a `kalyan-regex-hbase-agent.conf` file with the below content:
agent.sources = EXEC
agent.channels = MemChannel
agent.sinks = HBASE
agent.sources.EXEC.type = exec
agent.sources.EXEC.command = tail -F /tmp/users.csv
agent.sources.EXEC.channels = MemChannel
agent.sinks.HBASE.type = hbase
agent.sinks.HBASE.table = users1
agent.sinks.HBASE.columnFamily = cf
agent.sinks.HBASE.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.HBASE.serializer.regex = ^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$
agent.sinks.HBASE.serializer.colNames=userid,username,password,email,country,state,city,dt
agent.sinks.HBASE.channel = MemChannel
agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100
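The heart of this config is the `RegexHbaseEventSerializer`: each capture group in the regex becomes one column (named by `colNames`) under the `cf` column family. A rough Python sketch of that mapping (regex and column names copied from the config above; the sample line is made up for illustration):

```python
import re

# Same regex and column names as in kalyan-regex-hbase-agent.conf;
# each of the 8 capture groups becomes one HBase column under 'cf'.
REGEX = r"^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$"
COL_NAMES = ["userid", "username", "password", "email",
             "country", "state", "city", "dt"]

def parse_line(line):
    """Map one CSV line to {column: value}, roughly as the serializer would."""
    m = re.match(REGEX, line)
    if m is None:
        return None  # lines that do not match produce no columns
    return dict(zip(COL_NAMES, m.groups()))

print(parse_line("1,anil,pwd1,anil@example.com,india,ts,hyd,2016-01-01"))
```

Note that a line with fewer (or more) than 8 comma-separated fields fails the match entirely, which is why the generated CSV must stick to this exact layout.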
2. Copy the `kalyan-regex-hbase-agent.conf` file into the `$FLUME_HOME/conf` folder.
3. Copy the `kalyan-flume-project-0.1.jar` and `bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar` files into the `$FLUME_HOME/lib` folder.
4. To generate a large amount of sample CSV data, follow this article.
5. Execute the below command to generate sample CSV data with 100 lines. Increase this number to get more data.
java -cp $FLUME_HOME/lib/bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.csv \
-d ',' \
-n 100 \
-s 1
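If the jar is not handy, an equivalent generator can be sketched in a few lines of Python. This is not the actual `GenerateUsers` tool; the column layout (userid, username, password, email, country, state, city, dt) comes from the serializer config above, and all values are made up:

```python
import csv
import random

def generate_users(path, n=100, delimiter=",", seed=1):
    """Write n sample user rows in the 8-column layout that the
    agent's regex expects. All values are synthetic placeholders."""
    random.seed(seed)
    countries = ["india", "usa", "uk"]
    with open(path, "w", newline="") as f:
        w = csv.writer(f, delimiter=delimiter)
        for i in range(1, n + 1):
            w.writerow([i, f"user{i}", f"pwd{i}",
                        f"user{i}@example.com",
                        random.choice(countries),
                        f"state{i % 5}", f"city{i % 7}",
                        "2016-01-%02d" % (i % 28 + 1)])

generate_users("/tmp/users.csv", n=100)
```

Because the Exec source tails `/tmp/users.csv`, re-running the generator (or appending more rows) is what drives new events into Flume.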
6. Verify the sample CSV data in the console, using the below command:
cat /tmp/users.csv
7. To work with the Flume + HBase integration, follow the below steps:
1. Start HBase using the 'start-hbase.sh' command.
2. Verify whether HBase is running with the 'jps' command.
3. Connect to HBase using the 'hbase shell' command.
4. List out all the tables in HBase using the 'list' command.
5. Create the HBase table named 'users1' with column family 'cf' using the create 'users1', 'cf' command.
6. Read the data from the HBase table 'users1' using the scan 'users1' command.
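Taken together, the session for the steps above looks roughly like this (table and column family names from the agent config; the `jps` output will vary by setup):

```
$ start-hbase.sh
$ jps                      # look for HMaster (and HRegionServer)
$ hbase shell
hbase> list
hbase> create 'users1', 'cf'
hbase> scan 'users1'       # empty until Flume starts delivering events
```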
8. Execute the below command to extract data from the CSV file into HBase using Flume:
$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-regex-hbase-agent.conf -Dflume.root.logger=DEBUG,console
9. Verify the data in console
10. Verify the data in HBase
Execute the below commands to get the data from the HBase table 'users1':
count 'users1'
scan 'users1'