
Sunday, 19 February 2017

Phoenix Kafka Integration {Kalyan Contribution to Apache}



Hi All,

Phoenix Kafka is a new integration. You can follow the below link

http://phoenix.apache.org/kafka.html


This is a new feature that is needed nowadays: pushing real-time streaming data into an OLTP system like Phoenix.


You can see similar use cases in the below links from my blog.

Flume Real Time Projects




Kafka Real Time Projects




How to do `Production-Level Tracking` for any issues, like below




Please share this with your friends & colleagues to know more about these kinds of use cases.



Thursday, 17 November 2016

Kalyan Spark Streaming + Phoenix + Flume + Play Project

=====================================================
          Kalyan Spark + Phoenix + Flume + Play Project
=====================================================

1. Generate Sample Users

2. Generate Sample Product Logs

3. Using the `Flume` transfer the `product.log` file changes to `Phoenix`

4. Use `Spark Streaming` to listen to `Flume` data


5. Save the streaming data into `Phoenix` for analytics

6. Provide real-time analysis of `Phoenix` data using UI tools

7. You can still add your own features

8. Visualize data in UI


======================================================
                  Execute the below operations
======================================================

1. Execute the below `Phoenix Operations`

Start Hadoop, HBase & Phoenix

DROP TABLE users;

CREATE TABLE users("userid" bigint PRIMARY KEY, "username" varchar, "password" varchar, "email" varchar, "country" varchar, "state" varchar, "city" varchar, "dt" varchar);

DROP TABLE productlog;

CREATE TABLE IF NOT EXISTS productlog("userid" bigint not null, "username" varchar, "email" varchar, "product" varchar not null, "transaction" varchar, "country" varchar, "state" varchar, "city" varchar, "dt" varchar not null CONSTRAINT pk PRIMARY KEY ("userid", "product", "dt"));
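To sanity-check the DDL, you can upsert one row into each table from `sqlline.py` and read it back. The values below are illustrative only, not data from the project:

```
UPSERT INTO users VALUES (1, 'user1', 'pass1', 'user1@example.com', 'India', 'TS', 'Hyderabad', '2016-11-17');
SELECT * FROM users;

UPSERT INTO productlog VALUES (1, 'user1', 'user1@example.com', 'product1', 'purchase', 'India', 'TS', 'Hyderabad', '2016-11-17');
SELECT * FROM productlog;
```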


2. Generate Sample Users by running `GenerateUsers.scala` code



3. Execute the below `Flume Operations`

Create `kalyan-spark-project` folder in `$FLUME_HOME` folder

Copy `exec-avro.conf` file into `$FLUME_HOME/kalyan-spark-project` folder

Execute the below command to start the Flume agent

$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/kalyan-spark-project/exec-avro.conf -Dflume.root.logger=DEBUG,console
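The contents of `exec-avro.conf` are not shown in this post; a minimal sketch of what it might look like (assuming an exec source tailing a `product.log` file at a hypothetical path, and an Avro sink on `localhost:1234` to match the Spark Streaming arguments used in step 5) is:

```
agent.sources = EXEC
agent.channels = MemChannel
agent.sinks = AVRO

# Exec source tailing the product log (path is an assumption)
agent.sources.EXEC.type = exec
agent.sources.EXEC.command = tail -F /tmp/product.log
agent.sources.EXEC.channels = MemChannel

# Avro sink; the Spark Streaming Flume receiver listens on this host/port
agent.sinks.AVRO.type = avro
agent.sinks.AVRO.hostname = localhost
agent.sinks.AVRO.port = 1234
agent.sinks.AVRO.channel = MemChannel

agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100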



4. Generate Sample Logs by running `GenerateProductLog.scala`



5. Execute the below `Spark Operations`

Stream Sample Logs from the Flume Sink using `Spark Streaming` by running `KalyanSparkFlumeStreaming.scala` and passing the below arguments

2 localhost 1234 false



Stream Sample Logs from the Kafka Producer using `Spark Streaming` by running `KalyanSparkKafkaStreaming.scala` and passing the below arguments

2 topic1,topic2 localhost:9092 false



6. Use Zeppelin for UI



=============================================================
                Kalyan Spark + Phoenix + Flume + Play Project : Visualization
=============================================================

1. Open a new Terminal

2. Change the directory using the below command

cd /home/orienit/spark/workspace/kalyan-spark-streaming-phoenix-project

3. Run the project using the below command

activator run



4. Open the browser with the below link : Home Page

http://localhost:9000/



5. Open the browser with the below link : Users Page

http://localhost:9000/kalyan/users



6. Open the browser with the below link : Different Graphs

http://localhost:9000/kalyan/home



7. Open the browser with the below link : Products Page 1

http://localhost:9000/kalyan/graphs/products



8. Open the browser with the below link : Products Page 2

http://localhost:9000/kalyan/graphs/products




============================================================
    Kalyan Spark + Phoenix + Flume + Play Project : Zeppelin Visualization
============================================================


1. Open a new Terminal

2. Run the Zeppelin project using the below command

zeppelin-daemon.sh start

3. Open the browser using the below link

http://localhost:8080/

4. Create a new Notebook named `Kalyan Spark Streaming Phoenix Project`

5. Follow the below screenshot



6. Run the paragraphs








7. Execute the below command

%sql

select * from productlog



8. Execute the below command

%sql

select userid, count(*) as cnt from productlog group by userid



9. Check the below graphs

Thursday, 6 October 2016

How To Stream JSON Data Into Phoenix Using Apache Flume

Pre-Requisites of Flume Project:

hadoop-2.6.0
flume-1.6.0
hbase-1.1.2
phoenix-4.7.0
java-1.7

NOTE: Make sure to install all the above components


Flume Project Download Links:


`hadoop-2.6.0.tar.gz` ==> link
`apache-flume-1.6.0-bin.tar.gz` ==> link
`hbase-1.1.2-bin.tar.gz` ==> link
`phoenix-4.7.0-HBase-1.1-bin.tar.gz` ==> link
`kalyan-json-phoenix-agent.conf` ==> link
`bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar` ==> link
`phoenix-flume-4.7.0-HBase-1.1.jar` ==> link
`json-path-2.2.0.jar` ==> link
`commons-io-2.4.jar` ==> link

-----------------------------------------------------------------------------

1. Create the "kalyan-json-phoenix-agent.conf" file with the below content

agent.sources = EXEC
agent.channels = MemChannel
agent.sinks = PHOENIX

agent.sources.EXEC.type = exec
agent.sources.EXEC.command = tail -F /tmp/users.json
agent.sources.EXEC.channels = MemChannel

agent.sinks.PHOENIX.type = org.apache.phoenix.flume.sink.PhoenixSink
agent.sinks.PHOENIX.batchSize = 10
agent.sinks.PHOENIX.zookeeperQuorum = localhost
agent.sinks.PHOENIX.table = users2
agent.sinks.PHOENIX.ddl = CREATE TABLE IF NOT EXISTS users2 (userid BIGINT NOT NULL, username VARCHAR, password VARCHAR, email VARCHAR, country VARCHAR, state VARCHAR, city VARCHAR, dt VARCHAR NOT NULL CONSTRAINT PK PRIMARY KEY (userid, dt))
agent.sinks.PHOENIX.serializer = json
agent.sinks.PHOENIX.serializer.columnsMapping = {"userid":"userid", "username":"username", "password":"password", "email":"email", "country":"country", "state":"state", "city":"city", "dt":"dt"}
agent.sinks.PHOENIX.serializer.partialSchema = true
agent.sinks.PHOENIX.serializer.columns = userid,username,password,email,country,state,city,dt
agent.sinks.PHOENIX.channel = MemChannel

agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100

2. Copy the "kalyan-json-phoenix-agent.conf" file into the "$FLUME_HOME/conf" folder

3. Copy the "phoenix-flume-4.7.0-HBase-1.1.jar", "json-path-2.2.0.jar", "commons-io-2.4.jar" and "bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar" files into the "$FLUME_HOME/lib" folder

4. To generate a large amount of sample JSON data, follow this article.

5. Execute the below command to generate sample JSON data with 100 lines. Increase this number to get more data.

java -cp $FLUME_HOME/lib/bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.json \
-n 100 \
-s 1
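Each line of `/tmp/users.json` should be a single JSON record shaped like the `users2` columns. The record below is purely illustrative (the actual output of `GenerateUsers` may differ); it shows a quick way to confirm a line is valid JSON before Flume tails it:

```shell
# Illustrative record only -- not actual GenerateUsers output
line='{"userid":1,"username":"user1","password":"pass1","email":"user1@example.com","country":"India","state":"TS","city":"Hyderabad","dt":"2016-10-06"}'

# Confirm the line parses as JSON before tailing it into Flume
echo "$line" | python3 -m json.tool > /dev/null && echo "valid JSON"
```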





6. Verify the sample JSON data in the console, using the below command

cat /tmp/users.json





7. To work with the Flume + Phoenix integration, follow the below steps


i. Start HBase using the 'start-hbase.sh' command



ii. Verify whether HBase is running or not with the 'jps' command




iii. Start Phoenix using the 'sqlline.py localhost' command



iv. List out all the tables in Phoenix using the '!tables' command




8. Execute the below command to `Extract data from JSON into Phoenix using Flume`

$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-json-phoenix-agent.conf -Dflume.root.logger=DEBUG,console




9. Verify the data in the console




10. Verify the data in Phoenix

Execute the below commands to get the data from the Phoenix table 'users2'

!tables

select count(*) from users2;

select * from users2;






How To Stream CSV Data Into Phoenix Using Apache Flume

Pre-Requisites of Flume Project:

hadoop-2.6.0
flume-1.6.0
hbase-1.1.2
phoenix-4.7.0
java-1.7

NOTE: Make sure to install all the above components

Flume Project Download Links:

`hadoop-2.6.0.tar.gz` ==> link
`apache-flume-1.6.0-bin.tar.gz` ==> link
`hbase-1.1.2-bin.tar.gz` ==> link
`phoenix-4.7.0-HBase-1.1-bin.tar.gz` ==> link
`kalyan-regex-phoenix-agent.conf` ==> link
`bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar` ==> link
`phoenix-flume-4.7.0-HBase-1.1.jar` ==> link
`json-path-2.2.0.jar` ==> link
`commons-io-2.4.jar` ==> link

-----------------------------------------------------------------------------

1. Create the "kalyan-regex-phoenix-agent.conf" file with the below content

agent.sources = EXEC
agent.channels = MemChannel
agent.sinks = PHOENIX

agent.sources.EXEC.type = exec
agent.sources.EXEC.command = tail -F /tmp/users.csv
agent.sources.EXEC.channels = MemChannel

agent.sinks.PHOENIX.type = org.apache.phoenix.flume.sink.PhoenixSink
agent.sinks.PHOENIX.batchSize = 10
agent.sinks.PHOENIX.zookeeperQuorum = localhost
agent.sinks.PHOENIX.table = users1
agent.sinks.PHOENIX.ddl = CREATE TABLE IF NOT EXISTS users1 (userid BIGINT NOT NULL, username VARCHAR, password VARCHAR, email VARCHAR, country VARCHAR, state VARCHAR, city VARCHAR, dt VARCHAR NOT NULL CONSTRAINT PK PRIMARY KEY (userid, dt))
agent.sinks.PHOENIX.serializer = regex
agent.sinks.PHOENIX.serializer.regex = ^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$
agent.sinks.PHOENIX.serializer.columns=userid,username,password,email,country,state,city,dt
agent.sinks.PHOENIX.channel = MemChannel

agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100
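You can verify that the serializer regex above actually matches your data from the shell before starting the agent (the sample record is illustrative, not real project data):

```shell
# The same 8-group regex used by the Phoenix sink serializer above
regex='^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$'

# An illustrative CSV record with the 8 expected fields
line='1,user1,pass1,user1@example.com,India,TS,Hyderabad,2016-10-06'

echo "$line" | grep -Eq "$regex" && echo "regex matches"
```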

2. Copy the "kalyan-regex-phoenix-agent.conf" file into the "$FLUME_HOME/conf" folder

3. Copy the "phoenix-flume-4.7.0-HBase-1.1.jar", "json-path-2.2.0.jar", "commons-io-2.4.jar" and "bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar" files into the "$FLUME_HOME/lib" folder

4. To generate a large amount of sample CSV data, follow this article.

5. Execute the below command to generate sample CSV data with 100 lines. Increase this number to get more data.

java -cp $FLUME_HOME/lib/bigdata-examples-0.0.1-SNAPSHOT-dependency-jars.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.csv \
-d ',' \
-n 100 \
-s 1





6. Verify the sample CSV data in the console, using the below command

cat /tmp/users.csv





7. To work with the Flume + Phoenix integration, follow the below steps

i. Start HBase using the 'start-hbase.sh' command



ii. Verify whether HBase is running or not with the 'jps' command




iii. Start Phoenix using the 'sqlline.py localhost' command




iv. List out all the tables in Phoenix using the '!tables' command




8. Execute the below command to `Extract data from CSV into Phoenix using Flume`

$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-regex-phoenix-agent.conf -Dflume.root.logger=DEBUG,console

9. Verify the data in the console

10. Verify the data in Phoenix

Execute the below commands to get the data from the Phoenix table 'users1'

!tables

select count(*) from users1;

select * from users1;







