Friday, 2 December 2016

How to generate a large amount of sample data with simple techniques for Big Data Projects

Kalyan Big Data Projects

Run the commands below to generate a large amount of sample data.


Create the 'kalyan_bigdata_projects' folder in the user's home directory (i.e. /home/orienit)


Command: mkdir /home/orienit/kalyan_bigdata_projects





Copy the 'kalyan-bigdata-examples.jar' file into the '/home/orienit/kalyan_bigdata_projects' folder
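
The two setup steps above can be sketched as one small shell function. This is only a sketch: `setup_project` and its arguments are hypothetical names, and the place the jar is copied from is an assumption, since it depends on where you saved the file.

```shell
# Hypothetical helper: create the project folder and copy the jar into it.
# $1 = path to the jar you already have (assumed location, adjust as needed)
# $2 = project folder (defaults to the folder used in this post)
setup_project() {
  jar_source="$1"
  project_dir="${2:-/home/orienit/kalyan_bigdata_projects}"

  # -p creates the folder if needed and does not fail if it already exists
  mkdir -p "$project_dir" || return 1
  cp "$jar_source" "$project_dir/" || return 1

  # List the folder so you can confirm the jar is in place
  ls "$project_dir"
}
```

For example, `setup_project ~/Downloads/kalyan-bigdata-examples.jar` would work if the jar was saved to `~/Downloads` (a hypothetical location).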




We are going to cover the following use cases:


Use Case1: Generating Sample Server Logs with a simple command
Use Case2: Generating Sample Users in JSON format with a simple command
Use Case3: Generating Sample Users in CSV format with a simple command
Use Case4: Generating Sample Users in TSV format with a simple command
Use Case5: Generating Sample Users in DELIMITED format with a simple command
Use Case6: Generating Sample Product Log in JSON format with a simple command
Use Case7: Generating Sample Product Log in CSV format with a simple command
Use Case8: Generating Sample Product Log in TSV format with a simple command
Use Case9: Generating Sample Product Log in DELIMITED format with a simple command


Use Case1: Generating Sample Server Logs with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateServerLog \
-f /tmp/serverlog.txt \
-n 100 \
-s 10 \
-d 2016/01/01 \
-w 5





Read SERVER LOG data
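
One simple way to read the generated file is to count its lines and print the first few. The sketch below uses only standard tools (`grep`, `head`). `preview_file` is a hypothetical helper name, and the idea that each line holds one log entry is an assumption about the generator's output.

```shell
# Hypothetical helper: show how many lines a generated file has and print
# its first few lines.
# $1 = file to inspect, $2 = number of lines to show (default 5)
preview_file() {
  file="$1"
  lines="${2:-5}"

  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi

  # grep -c '' counts every line, including a last line without a newline
  echo "lines: $(grep -c '' "$file")"
  head -n "$lines" "$file"
}
```

For the command above, `preview_file /tmp/serverlog.txt 10` would print the line count followed by the first ten lines.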




Use Case: Generating Sample Users with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateUsers




We can pass the following arguments to the above command:


-d => field delimiter (tab, comma, semicolon, etc.)
-f => output file path
-n => number of users; the maximum is 10000
-s => starting user id; the default is 1
-w => waiting time in milliseconds; the default is 100
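
The options above can be combined in a small wrapper so the long `java -cp ...` line is typed only once. This is a sketch, not part of the jar: `generate_users`, `JAR` and `DRY_RUN` are hypothetical names. Leaving out the delimiter mirrors the JSON use case in this post, which passes no `-d`.

```shell
# Jar path used throughout this post; override JAR if yours lives elsewhere.
JAR="${JAR:-/home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar}"

# Hypothetical wrapper around GenerateUsers.
# Usage: generate_users <output-file> <count> <start-id> [delimiter]
# With DRY_RUN=1 (the default here) the command is printed, not executed.
generate_users() {
  out="$1"; count="$2"; start="$3"; delim="${4:-}"

  # The post states that -n accepts at most 10000 users
  if [ "$count" -gt 10000 ]; then
    echo "count must be 10000 or less" >&2
    return 1
  fi

  # Rebuild the positional parameters as the command to run
  set -- java -cp "$JAR" com.orienit.kalyan.examples.GenerateUsers \
    -f "$out" -n "$count" -s "$start"

  # Only pass -d when a delimiter was supplied
  if [ -n "$delim" ]; then
    set -- "$@" -d "$delim"
  fi

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the arguments separated by spaces (shell quoting is not shown)
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}
```

Calling `generate_users /tmp/users.csv 10 1 ','` prints the command for review. Set `DRY_RUN=0` first to actually run it.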


Use Case2: Generating Sample Users in JSON format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.json \
-n 10 \
-s 1




Read JSON data
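
Since the command above asks for 10 users with `-n 10`, a quick sanity check is to count the records in the output file. The sketch below assumes the generator writes one record per line, which this post does not state explicitly. `check_record_count` is a hypothetical helper name.

```shell
# Hypothetical check: compare the number of non-empty lines in a file with
# the record count requested via -n.
# ASSUMPTION: the generator writes one record per line.
check_record_count() {
  file="$1"
  expected="$2"

  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi

  # grep -c . counts lines that contain at least one character;
  # "|| true" keeps going when the count is 0 (grep then exits non-zero)
  actual=$(grep -c . "$file" || true)

  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual records"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}
```

For the JSON example, the call would be `check_record_count /tmp/users.json 10`.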






Use Case3: Generating Sample Users in CSV format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.csv \
-d ',' \
-n 10 \
-s 1





Read CSV data





Use Case4: Generating Sample Users in TSV format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.tsv \
-d '\t' \
-n 10 \
-s 1



Read TSV data






Use Case5: Generating Sample Users in DELIMITED format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateUsers \
-f /tmp/users.txt \
-d '#' \
-n 10 \
-s 1




Read any DELIMITED data
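
To check that a delimited file splits cleanly on the delimiter passed with `-d`, you can count the fields per line with `awk`. This is a sketch: `field_counts` is a hypothetical helper name, and it does a plain split that does not understand quoted fields containing the delimiter.

```shell
# Hypothetical check: report how many lines have each number of fields
# when split on the given delimiter.
# $1 = file, $2 = delimiter as passed to -d (for example ',' or '#')
field_counts() {
  file="$1"
  delim="$2"

  awk -F"$delim" '
    { counts[NF]++ }                      # tally lines by field count
    END {
      for (n in counts)
        print n " fields: " counts[n] " lines"
    }
  ' "$file"
}
```

For the example above, run `field_counts /tmp/users.txt '#'`. If every line has the same number of fields, the output is a single line. The same call works for the CSV and TSV files with `','` and `'\t'`. Note that awk treats a delimiter longer than one character as a regular expression.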




Use Case: Generating Sample Product Log with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateProductLog




We can pass the following arguments to the above command:


-d => field delimiter (tab, comma, semicolon, etc.)
-f => output file path
-l => number of logs; the maximum is 100000
-n => number of users; the maximum is 10000
-w => waiting time in milliseconds; the default is 100


Use Case6: Generating Sample Product Log in JSON format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateProductLog \
-f /tmp/productlog.json \
-n 10 \
-l 20




Read JSON data





Use Case7: Generating Sample Product Log in CSV format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateProductLog \
-f /tmp/productlog.csv \
-d ',' \
-n 10 \
-l 20




Read CSV data





Use Case8: Generating Sample Product Log in TSV format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateProductLog \
-f /tmp/productlog.tsv \
-d '\t' \
-n 10 \
-l 20





Read TSV data





Use Case9: Generating Sample Product Log in DELIMITED format with a simple command


java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \
com.orienit.kalyan.examples.GenerateProductLog \
-f /tmp/productlog.txt \
-d '#' \
-n 10 \
-l 20





Read any DELIMITED data





