tag:blogger.com,1999:blog-21828335704221753842024-03-09T18:46:15.693-08:00Kalyan Hadoop and Spark Training in Hyderabad Learn Big Data From Basics... @ Kalyan @Mr.Kalyan, Apache Contributor, Cloudera CCA175 Certified Consultant, 8+ years of Big Data exp, IIT Kharagpur, Gold Medalist.<br>
<br>
This blog is mainly meant for learning Big Data from the basics:<br>
1. Development practices<br>
2. Administration practices<br>
3. Interview Questions<br>
4. Big Data integrations<br>
5. Advanced Technologies in Big Data<br>
6. Become stronger in Big Data<br>
<br>
Call for Spark & Hadoop Training in Hyderabad, ORIENIT @ 040 65142345 , 9703202345
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.comBlogger76125tag:blogger.com,1999:blog-2182833570422175384.post-7839649262349593232017-07-06T05:27:00.000-07:002017-07-07T06:16:37.283-07:00HADOOP COURSE MATERIAL BY KALYAN @ ORIENIT<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfRpb36kxO8piYEjLtiqt4bDPjKvwes-ixbINatEKarO2xiE2V-DGxV8znSSlFt6_CDcZmPNBYneqEOWn_z0FMelHgR5uF1nSiOWwwK_XzRVJKsfN_3AAZzBn03zl-ONrActwzOBKPGxZS/s1600/Kalyan+Hadoop+Course+Material+%2540+ORIENIT.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfRpb36kxO8piYEjLtiqt4bDPjKvwes-ixbINatEKarO2xiE2V-DGxV8znSSlFt6_CDcZmPNBYneqEOWn_z0FMelHgR5uF1nSiOWwwK_XzRVJKsfN_3AAZzBn03zl-ONrActwzOBKPGxZS/s640/Kalyan+Hadoop+Course+Material+%2540+ORIENIT.jpg" width="451" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="color: red; font-size: x-large;"><b>DOWNLOAD HADOOP COURSE MATERIAL BY KALYAN @ ORIENIT</b></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;"><a href="https://drive.google.com/uc?id=0B-haea19mwkmeS1fanlrWXhpMGM&export=download">https://drive.google.com/uc?id=0B-haea19mwkmeS1fanlrWXhpMGM&export=download</a></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;">(or)</span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-large;"><a href="https://drive.google.com/open?id=0B-haea19mwkmeS1fanlrWXhpMGM">https://drive.google.com/open?id=0B-haea19mwkmeS1fanlrWXhpMGM</a></span></div>
<br /></div>
How to Perform Incremental Load in Sqoop (published 2017-06-09T05:00:00.001-07:00, updated 2017-06-09T05:04:05.817-07:00)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: inherit; font-size: large;"><b><span style="color: red;">Importing Incremental Data</span></b></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: red;"><b><br /></b></span>You can also perform incremental imports using Sqoop. An incremental import brings in only the rows added to a table since the previous import. It requires the --incremental, --check-column, and --last-value options.<br /><br /><span style="color: #0b5394;"><b>incremental</b> - Sets the mode Sqoop uses to determine which rows are new. Legal values are append and lastmodified.<br /><br /><b>check-column</b> - Names the column that is examined to determine the candidate rows.<br /><br /><b>last-value</b> - The maximum value of the check column from the previous import.</span><br /><br /><br /><b><span style="color: red;">Example:</span></b></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: red;"><b><br /></b></span><span style="color: #274e13;">sqoop import \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--connect jdbc:mysql://localhost:3306/kalyan \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--username root \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--table sample \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">-m 1 \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--incremental append \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--check-column id \</span></span><br />
<span style="font-family: inherit; font-size: large;"><span style="color: #274e13;">--last-value 1000</span></span><br />
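<span style="font-family: inherit; font-size: large;">The selection rule behind append mode can be sketched in a few lines of plain Scala. This is a toy model, not Sqoop code: the Row class and the table contents are made up for illustration, and only the id check column and the last value of 1000 come from the example above.</span><br />

```scala
// Toy model of "--incremental append": only the row-selection rule, no database access.
case class Row(id: Int, name: String)

// Hypothetical contents of the `sample` table at import time.
val table = List(Row(999, "a"), Row(1000, "b"), Row(1001, "c"), Row(1002, "d"))

// Value passed as --last-value (the largest id already imported).
val lastValue = 1000

// Append mode selects rows whose check column is greater than the last value.
val newRows = table.filter(row => row.id > lastValue)

// The largest id just imported is the --last-value to use on the next run.
val nextLastValue = if (newRows.isEmpty) lastValue else newRows.map(_.id).max

println(newRows)
println(nextLastValue)
```

<span style="font-family: inherit; font-size: large;">Passing the new maximum as --last-value on the next run avoids importing the same rows twice.</span><br />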
</div>
How to Enable Transactions in Hive (published 2017-06-09T04:36:00.002-07:00, updated 2017-06-09T04:36:41.916-07:00)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: inherit; font-size: large;">Hive supports transactions once the following configuration parameters are set:<br /><br /><span style="color: purple;">hive.support.concurrency - true<br /><br />hive.enforce.bucketing - true<br /><br />hive.exec.dynamic.partition.mode - nonstrict<br /><br />hive.txn.manager - org.apache.hadoop.hive.ql.lockmgr.DbTxnManager<br /><br />hive.compactor.initiator.on - true on one instance of the Thrift metastore service<br /><br />hive.compactor.worker.threads - 10 on an instance of the Thrift metastore service</span><br /><br /><br />A transactional table must be bucketed, stored as ORC, and marked as transactional. Use this table format, replacing x with the number of buckets:<br /><br />CREATE TABLE mytable (<br /> c1 int,<br /> c2 string,<br /> c3 string<br />)<br />CLUSTERED BY (c1) INTO x BUCKETS<br />STORED AS orc<br />TBLPROPERTIES('transactional' = 'true');</span><div>
<span style="font-family: inherit; font-size: large;"><br /></span></div>
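<span style="font-family: inherit; font-size: large;">As a rough sketch, the first four parameters can be rendered as SET statements for pasting into a Hive session. This assumes they are accepted at session level in your Hive version. The two compactor parameters are left out because they apply to the metastore service, not to a client session. The value names below are only illustrative.</span><br />

```scala
// Session-level settings from the list above, as key -> value pairs.
val clientSettings = List(
  "hive.support.concurrency" -> "true",
  "hive.enforce.bucketing" -> "true",
  "hive.exec.dynamic.partition.mode" -> "nonstrict",
  "hive.txn.manager" -> "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"
)

// Render each pair as a statement the Hive CLI or Beeline would accept.
val setStatements = clientSettings.map { case (key, value) => s"SET $key=$value;" }

setStatements.foreach(println)
```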
</div>
What Is ACID and Why Use It? (published 2017-06-09T04:29:00.001-07:00, updated 2017-06-09T04:30:15.629-07:00)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="color: red; font-family: inherit; font-size: x-large;">ACID stands for four traits of database transactions:</span><br />
<span style="color: red; font-family: inherit; font-size: x-large;"><br /></span>
<span style="font-family: inherit; font-size: large;"><b>Atomicity</b> - An operation either succeeds completely or fails; operations do not leave </span><span style="font-family: inherit; font-size: large;">incomplete data in the system.</span><br />
<br />
<span style="font-family: inherit; font-size: large;"><b>Consistency</b> - Once an operation completes, the results of that operation are visible </span><span style="font-family: inherit; font-size: large;">to every subsequent operation.</span><br />
<br />
<span style="font-family: inherit; font-size: large;"><b>Isolation</b> - Operations completed by one user do not cause unexpected side effects </span><span style="font-family: inherit; font-size: large;">for other users.</span><br />
<br />
<span style="font-family: inherit; font-size: large;"><b>Durability</b> - Once an operation is complete, it will be preserved even if the machine </span><span style="font-family: inherit; font-size: large;">or system experiences a failure.</span><br />
<div>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><span style="font-family: inherit;">These </span>behaviours<span style="font-family: inherit;"> are mandatory to ensure transaction functionality.</span></span><br />
<br />
<span style="font-family: inherit; font-size: large;">If your operations are ACID compliant, the system will ensure your processing is protected against any failures.</span><br />
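<span style="font-family: inherit; font-size: large;">Atomicity is the easiest of the four traits to show in code. The sketch below is a toy in plain Scala, not how Hive or any database implements it: the transfer function and the account names are invented for illustration. Both balance changes are returned together or not at all.</span><br />

```scala
// All-or-nothing update: either both balances change, or the caller gets an error
// and the original map is left untouched.
def transfer(accounts: Map[String, Int], from: String, to: String,
             amount: Int): Either[String, Map[String, Int]] = {
  if (!accounts.contains(from) || !accounts.contains(to)) {
    Left("unknown account")
  } else if (accounts(from) >= amount) {
    // Both updates are built on new maps and handed back as one result.
    val debited = accounts.updated(from, accounts(from) - amount)
    Right(debited.updated(to, debited(to) + amount))
  } else {
    Left("insufficient funds")
  }
}

val before = Map("alice" -> 100, "bob" -> 20)

val ok = transfer(before, "alice", "bob", 30)       // succeeds completely
val failed = transfer(before, "alice", "bob", 500)  // fails, nothing is changed

println(ok)
println(failed)
println(before)
```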
</div>
</div>
SPARK BASICS Practice on 02 Apr 2017 (published 2017-04-07T16:49:00.003-07:00, updated 2017-04-07T16:49:39.935-07:00)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">`Spark` is meant for `In-Memory Distributed Computing`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`Spark` provides 4 libraries:</span><br />
<span style="font-size: large;">1. Spark SQL</span><br />
<span style="font-size: large;">2. Spark Streaming</span><br />
<span style="font-size: large;">3. Spark MLlib</span><br />
<span style="font-size: large;">4. Spark GraphX</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`Spark Context` is the Entry point for any `Spark Operations`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`Resilient Distributed DataSets` => RDD</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD features:</span><br />
<span style="font-size: large;">-------------------</span><br />
<span style="font-size: large;">1. immutability</span><br />
<span style="font-size: large;">2. lazy evaluation</span><br />
<span style="font-size: large;">3. cacheable</span><br />
<span style="font-size: large;">4. type inference</span><br />
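<span style="font-size: large;">Lazy evaluation and immutability can be observed directly. The sketch below assumes a Spark 2.x spark-shell session where `sc` is predefined. The accumulator name and variable names are illustrative. An accumulator updated inside a transformation can over-count if tasks are retried, so treat the counter as a demonstration aid only.</span><br />

```scala
// Assumes spark-shell (Spark 2.x), where `sc` is already available.
val calls = sc.longAccumulator("map calls")   // counts how often the map function runs

val source = sc.parallelize(List(1, 2, 3, 4, 5))

// Transformation: only recorded in the lineage, nothing is executed yet.
val plusOne = source.map { x => calls.add(1); x + 1 }

val callsBeforeAction = calls.value.longValue   // no job has been triggered so far

// Action: the job runs now and the map function is applied to each element.
val result = plusOne.collect()

val callsAfterAction = calls.value.longValue    // one call per element once the job has run

// Immutability: `source` still holds the original values.
val original = source.collect()
```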
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD operations:</span><br />
<span style="font-size: large;">-----------------</span><br />
<span style="font-size: large;">1. Transformations</span><br />
<span style="font-size: large;">&lt;old rdd&gt; ----> &lt;new rdd&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Actions</span><br />
<span style="font-size: large;">&lt;rdd&gt; ---> &lt;result&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Examples on RDD:</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">list <- {1,2,3,4,5}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. Transformations:</span><br />
<span style="font-size: large;">---------------------</span><br />
<span style="font-size: large;">Ex1:</span><br />
<span style="font-size: large;">-----</span><br />
<span style="font-size: large;">f(x) <- {x + 1}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(list) <- {2,3,4,5,6}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Ex2:</span><br />
<span style="font-size: large;">-----</span><br />
<span style="font-size: large;">f(x) <- {x * x}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(list) <- {1,4,9,16,25}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Actions:</span><br />
<span style="font-size: large;">---------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sum(list) -> 15</span><br />
<span style="font-size: large;">min(list) -> 1</span><br />
<span style="font-size: large;">max(list) -> 5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">How to Start Spark:</span><br />
<span style="font-size: large;">-----------------------------</span><br />
<span style="font-size: large;">scala <span class="Apple-tab-span" style="white-space: pre;"> </span>=><span class="Apple-tab-span" style="white-space: pre;"> </span>spark-shell</span><br />
<span style="font-size: large;">python <span class="Apple-tab-span" style="white-space: pre;"> </span>=><span class="Apple-tab-span" style="white-space: pre;"> </span>pyspark</span><br />
<span style="font-size: large;">R <span class="Apple-tab-span" style="white-space: pre;"> </span>=><span class="Apple-tab-span" style="white-space: pre;"> </span>sparkR</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark-1.x:</span><br />
<span style="font-size: large;">--------------------------------------</span><br />
<span style="font-size: large;">Spark context available as 'sc'</span><br />
<span style="font-size: large;">Spark Sql Context available as 'sqlContext'</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark-2.x:</span><br />
<span style="font-size: large;">--------------------------------------</span><br />
<span style="font-size: large;">Spark context available as 'sc'</span><br />
<span style="font-size: large;">Spark session available as 'spark'</span><br />
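<span style="font-size: large;">Outside the shell nothing is predefined, so a Spark 2.x application builds its own session. A minimal sketch follows. The application name and the local master are placeholder values chosen for this example. Inside spark-shell, getOrCreate simply returns the existing session.</span><br />

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) the session; appName and master are placeholder values.
val session = SparkSession.builder()
  .appName("entry-point-demo")
  .master("local[*]")
  .getOrCreate()

// The SparkContext hangs off the session, so RDD code keeps working in Spark 2.x.
val context = session.sparkContext

val total = context.parallelize(List(1, 2, 3, 4, 5)).reduce(_ + _)
println(total)
```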
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">How to Create an RDD:</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">We can create an RDD in 2 ways:</span><br />
<span style="font-size: large;">1. from collections (List, Seq, Set, ....)</span><br />
<span style="font-size: large;">2. from data sets (text, csv, tsv, json, ...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. from collections:</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Syntax:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;">val rdd = sc.parallelize(&lt;collection object&gt;, &lt;no.of partitions&gt;)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. from datasets:</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.textFile(file, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Syntax:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;">val rdd = sc.textFile(&lt;file path&gt;, &lt;no.of partitions&gt;)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res0: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res1: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">file: String = file:///home/orienit/work/input/demoinput</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = file:///home/orienit/work/input/demoinput MapPartitionsRDD[3] at textFile at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res2: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(file, 1)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = file:///home/orienit/work/input/demoinput MapPartitionsRDD[5] at textFile at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res3: Int = 1</span><br />
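<span style="font-size: large;">getNumPartitions only reports a count. To see which elements landed in which partition, glom() can help. The sketch assumes spark-shell with `sc` available. The variable names are illustrative. The defaults printed at the end depend on the machine, which is why the counts above (4 and 2) may differ on another setup.</span><br />

```scala
// Assumes spark-shell, where `sc` is predefined.
val numbers = sc.parallelize(List(1, 2, 3, 4, 5), 2)

// glom() turns every partition into one array, so collect() exposes the layout.
val layout = numbers.glom().collect()
layout.foreach(part => println(part.mkString("[", ", ", "]")))

// Defaults used when no partition count is passed explicitly.
println(sc.defaultParallelism)     // used by sc.parallelize(list)
println(sc.defaultMinPartitions)   // minimum used by sc.textFile(file)
```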
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Examples on RDD</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd1 = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd2 = rdd1.map(x => x + 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd3 = rdd1.map(x => x * x)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd1 = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;">rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd2 = rdd1.map(x => x + 1)</span><br />
<span style="font-size: large;">rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[7] at map at <console>:28</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.collect</span><br />
<span style="font-size: large;">res4: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.collect</span><br />
<span style="font-size: large;">res5: Array[Int] = Array(2, 3, 4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd3 = rdd1.map(x => x * x)</span><br />
<span style="font-size: large;">rdd3: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[8] at map at <console>:28</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd3.collect</span><br />
<span style="font-size: large;">res6: Array[Int] = Array(1, 4, 9, 16, 25)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">rdd1.min</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">rdd1.max</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">rdd1.sum</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">scala> rdd1.min</span><br />
<span style="font-size: large;">res7: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.max</span><br />
<span style="font-size: large;">res8: Int = 5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.sum</span><br />
<span style="font-size: large;">res9: Double = 15.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.collect</span><br />
<span style="font-size: large;">res10: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.count</span><br />
<span style="font-size: large;">res11: Long = 5</span><br />
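<span style="font-size: large;">A few more actions are worth trying on the same data. Note that sum returned a Double above. reduce keeps the element type, so it is one way to get an Int total. The sketch assumes spark-shell with `sc` available, and the variable names are illustrative.</span><br />

```scala
// Assumes spark-shell, where `sc` is predefined.
val numbers = sc.parallelize(List(1, 2, 3, 4, 5), 2)

val total = numbers.reduce((a, b) => a + b)   // stays an Int, unlike sum
val firstTwo = numbers.take(2)                // first two elements, as an Array
val head = numbers.first()                    // first element only

// A transformation followed by an action: keep the even numbers, then fetch them.
val evens = numbers.filter(x => x % 2 == 0).collect()
```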
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Word Count in Spark using Scala</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/spark-op"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordsRdd = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val tuplesRdd = wordsRdd.map(word => (word, 1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordCountRdd = tuplesRdd.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">wordCountRdd.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Optimize the Code (chain the transformations):</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/spark-op"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordCountRdd = fileRdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">wordCountRdd.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">input: String = file:///home/orienit/work/input/demoinput</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "file:///home/orienit/work/output/spark-op"</span><br />
<span style="font-size: large;">output: String = file:///home/orienit/work/output/spark-op</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">fileRdd: org.apache.spark.rdd.RDD[String] = file:///home/orienit/work/input/demoinput MapPartitionsRDD[11] at textFile at <console>:26</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.collect</span><br />
<span style="font-size: large;">res12: Array[String] = Array(I am going, to hyd, I am learning, hadoop course)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordsRdd = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;">wordsRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[12] at flatMap at <console>:28</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordsRdd.collect</span><br />
<span style="font-size: large;">res13: Array[String] = Array(I, am, going, to, hyd, I, am, learning, hadoop, course)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val tuplesRdd = wordsRdd.map(word => (word, 1))</span><br />
<span style="font-size: large;">tuplesRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[13] at map at <console>:30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> tuplesRdd.collect</span><br />
<span style="font-size: large;">res14: Array[(String, Int)] = Array((I,1), (am,1), (going,1), (to,1), (hyd,1), (I,1), (am,1), (learning,1), (hadoop,1), (course,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordCountRdd = tuplesRdd.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;">wordCountRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[14] at reduceByKey at <console>:32</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordCountRdd.collect</span><br />
<span style="font-size: large;">res15: Array[(String, Int)] = Array((learning,1), (hadoop,1), (am,2), (hyd,1), (I,2), (to,1), (going,1), (course,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordCountRdd.saveAsTextFile(output)</span><br />
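<span style="font-size: large;">A common follow-up is to rank the words by count. The sketch below assumes spark-shell with `sc` available and builds the input in memory from the four demoinput lines shown above, so it writes no files. Splitting on a whitespace pattern and dropping empty tokens are additions made for this example. Also keep in mind that saveAsTextFile fails if the output directory already exists.</span><br />

```scala
// Assumes spark-shell, where `sc` is predefined.
val lines = sc.parallelize(
  List("I am going", "to hyd", "I am learning", "hadoop course"), 1)

val counts = lines
  .flatMap(line => line.split("\\s+"))   // split on any run of whitespace
  .filter(word => word.nonEmpty)         // drop empty tokens caused by leading spaces
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Highest counts first; ties are broken by the word so the order is stable.
val ranked = counts.sortBy { case (word, n) => (-n, word) }.collect()

ranked.foreach(println)
```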
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Grep Job in Spark using Scala:</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/spark-grep-op"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val grepRdd = fileRdd.filter(line => line.contains("am"))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">grepRdd.saveAsTextFile(output)</span><br />
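<span style="font-size: large;">contains("am") matches any line with that substring, including words such as "sample". If whole-word matching is wanted, a regular expression with word boundaries is one option. The sketch assumes spark-shell with `sc` available. The third input line is made up to show the difference.</span><br />

```scala
// Assumes spark-shell, where `sc` is predefined.
val lines = sc.parallelize(List("I am going", "to hyd", "a sample line"), 1)

// Substring match: also picks up "sample".
val substringHits = lines.filter(line => line.contains("am")).collect()

// Whole-word match: \b marks a word boundary on both sides of "am".
val wordPattern = "\\bam\\b".r
val wordHits = lines.filter(line => wordPattern.findFirstIn(line).isDefined).collect()
```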
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Sed Job in Spark using Scala:</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/spark-sed-op"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val sedRdd = fileRdd.map(line => line.replaceAll("am", "xyz"))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sedRdd.saveAsTextFile(output)</span><br />
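<span style="font-size: large;">One detail to watch in the sed job: replaceAll treats its first argument as a regular expression. For "am" that makes no difference, but patterns with characters like "." behave differently. The plain-Scala sketch below uses a made-up sample string to compare the options.</span><br />

```scala
import java.util.regex.Pattern

val sample = "a.m."

// "." is a regex wildcard here, so every character is replaced.
val viaRegex = sample.replaceAll(".", "x")

// replace works on literal text, so only the dots are replaced.
val viaLiteral = sample.replace(".", "x")

// Pattern.quote makes replaceAll treat the pattern literally as well.
val viaQuoted = sample.replaceAll(Pattern.quote("."), "x")

println(viaRegex)
println(viaLiteral)
println(viaQuoted)
```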
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Spark SQL:</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">hive> select * from kalyan.student;</span><br />
<span style="font-size: large;">scala> spark.sql("select * from kalyan.student").show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">hive> select year, count(*) from kalyan.student group by year;</span><br />
<span style="font-size: large;">scala> spark.sql("select year, count(*) from kalyan.student group by year").show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">Data Frames in Spark</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">val hiveDf = spark.sql("select * from kalyan.student")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">hiveDf.show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">hiveDf.registerTempTable("hivetbl")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val prop = new java.util.Properties</span><br />
<span style="font-size: large;">prop.setProperty("driver","com.mysql.jdbc.Driver")</span><br />
<span style="font-size: large;">prop.setProperty("user","root")</span><br />
<span style="font-size: large;">prop.setProperty("password","hadoop")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val jdbcDf = spark.read.jdbc("jdbc:mysql://localhost:3306/kalyan", "student", prop)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">jdbcDf.show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">jdbcDf.registerTempTable("jdbctbl")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">scala> val hiveDf = spark.sql("select * from kalyan.student")</span><br />
<span style="font-size: large;">hiveDf: org.apache.spark.sql.DataFrame = [name: string, id: int ... 2 more fields]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> hiveDf.show</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| arun| 1| cse| 1|</span><br />
<span style="font-size: large;">| sunil| 2| cse| 1|</span><br />
<span style="font-size: large;">| raj| 3| cse| 1|</span><br />
<span style="font-size: large;">|naveen| 4| cse| 1|</span><br />
<span style="font-size: large;">| venki| 5| cse| 2|</span><br />
<span style="font-size: large;">|prasad| 6| cse| 2|</span><br />
<span style="font-size: large;">| sudha| 7| cse| 2|</span><br />
<span style="font-size: large;">| ravi| 1| mech| 1|</span><br />
<span style="font-size: large;">| raju| 2| mech| 1|</span><br />
<span style="font-size: large;">| roja| 3| mech| 1|</span><br />
<span style="font-size: large;">| anil| 4| mech| 2|</span><br />
<span style="font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="font-size: large;">|anvith| 6| mech| 2|</span><br />
<span style="font-size: large;">| madhu| 7| mech| 2|</span><br />
<span style="font-size: large;">| arun| 1| it| 3|</span><br />
<span style="font-size: large;">| sunil| 2| it| 3|</span><br />
<span style="font-size: large;">| raj| 3| it| 3|</span><br />
<span style="font-size: large;">|naveen| 4| it| 3|</span><br />
<span style="font-size: large;">| venki| 5| it| 4|</span><br />
<span style="font-size: large;">|prasad| 6| it| 4|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">only showing top 20 rows</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val jdbcDf = spark.read.jdbc("jdbc:mysql://localhost:3306/kalyan", "student", prop)</span><br />
<span style="font-size: large;">jdbcDf: org.apache.spark.sql.DataFrame = [name: string, id: int ... 2 more fields]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> jdbcDf.show</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">|venkat| 2| spark|2016|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> hiveDf.registerTempTable("hivetbl")</span><br />
<span style="font-size: large;">warning: there was one deprecation warning; re-run with -deprecation for details</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> jdbcDf.registerTempTable("jdbctbl")</span><br />
<span style="font-size: large;">warning: there was one deprecation warning; re-run with -deprecation for details</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Note: the warning appears because `registerTempTable` is deprecated since Spark 2.0; `createOrReplaceTempView` is its replacement, e.g. hiveDf.createOrReplaceTempView("hivetbl")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">spark.sql("select hivetbl.*, jdbctbl.* from hivetbl join jdbctbl on hivetbl.name = jdbctbl.name").show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> spark.sql("select hivetbl.*, jdbctbl.* from hivetbl join jdbctbl on hivetbl.name = jdbctbl.name").show</span><br />
<span style="font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 4| ece| 4| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">| anil| 4| mech| 2| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 6| ece| 4|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">|anvith| 6| mech| 2|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| 3| it| 3| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| raj| 3| cse| 1| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil| 2| it| 3| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">| sunil| 2| cse| 1| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
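<span style="font-size: large;">import org.apache.spark.sql.cassandra._</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val casDf = sqlContext.read.cassandraFormat("student", "kalyan").load()</span><br />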
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">casDf.show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val casDf = sqlContext.read.cassandraFormat("student", "kalyan").load()</span><br />
<span style="font-size: large;">casDf: org.apache.spark.sql.DataFrame = [name: string, course: string ... 2 more fields]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> casDf.show</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;">| name|course| id|year|</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;">| anil| spark| 1|2016|</span><br />
<span style="font-size: large;">| raj| spark| 3|2016|</span><br />
<span style="font-size: large;">|anvith|hadoop| 5|2015|</span><br />
<span style="font-size: large;">| dev|hadoop| 6|2015|</span><br />
<span style="font-size: large;">| sunil|hadoop| 4|2015|</span><br />
<span style="font-size: large;">|venkat| spark| 2|2016|</span><br />
<span style="font-size: large;">| kk|hadoop| 7|2015|</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">INSERT INTO kalyan.student(name, id, course, year) VALUES ('rajesh', 8, 'hadoop', 2017);</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> casDf.show</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;">| name|course| id|year|</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;">| anil| spark| 1|2016|</span><br />
<span style="font-size: large;">| raj| spark| 3|2016|</span><br />
<span style="font-size: large;">|anvith|hadoop| 5|2015|</span><br />
<span style="font-size: large;">| dev|hadoop| 6|2015|</span><br />
<span style="font-size: large;">| sunil|hadoop| 4|2015|</span><br />
<span style="font-size: large;">|venkat| spark| 2|2016|</span><br />
<span style="font-size: large;">| kk|hadoop| 7|2015|</span><br />
<span style="font-size: large;">|rajesh|hadoop| 8|2017|</span><br />
<span style="font-size: large;">+------+------+---+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">cqlsh:kalyan> select year, count(*) from kalyan.student group by year;</span><br />
<span style="font-size: large;">SyntaxException: line 1:42 missing EOF at 'group' (...(*) from kalyan.student [group] by...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The CQL in this Cassandra version rejects `group by`, so the same aggregation is run through Spark SQL instead.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">casDf.registerTempTable("castbl")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">spark.sql("select year, count(*) from castbl group by year").show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> casDf.registerTempTable("castbl")</span><br />
<span style="font-size: large;">warning: there was one deprecation warning; re-run with -deprecation for details</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> spark.sql("select year, count(*) from castbl group by year").show </span><br />
<span style="font-size: large;">+----+--------+</span><br />
<span style="font-size: large;">|year|count(1)|</span><br />
<span style="font-size: large;">+----+--------+</span><br />
<span style="font-size: large;">|2015| 4|</span><br />
<span style="font-size: large;">|2016| 3|</span><br />
<span style="font-size: large;">|2017| 1|</span><br />
<span style="font-size: large;">+----+--------+</span><br />
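<span style="font-size: large;"><br /></span>
<span style="font-size: large;">To see what the `group by year` query computes, here is a minimal sketch on a plain Scala collection. It reuses the eight rows printed by casDf.show above; the `Student` case class and `countByYear` are hypothetical names used only for illustration, and this is not how Spark executes the query.</span><br />

```scala
// Hypothetical case class mirroring the columns of kalyan.student.
case class Student(name: String, course: String, id: Int, year: Int)

// The eight rows shown by casDf.show after the insert.
val students = Seq(
  Student("anil", "spark", 1, 2016),
  Student("raj", "spark", 3, 2016),
  Student("anvith", "hadoop", 5, 2015),
  Student("dev", "hadoop", 6, 2015),
  Student("sunil", "hadoop", 4, 2015),
  Student("venkat", "spark", 2, 2016),
  Student("kk", "hadoop", 7, 2015),
  Student("rajesh", "hadoop", 8, 2017)
)

// group by year, then count the rows in each group.
val countByYear: Map[Int, Int] =
  students.groupBy(_.year).map { case (year, rows) => year -> rows.size }

countByYear.toSeq.sortBy(_._1).foreach { case (year, n) => println(s"$year $n") }
```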
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">spark.sql("select castbl.*, jdbctbl.* from castbl join jdbctbl on castbl.name = jdbctbl.name").show</span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> spark.sql("select castbl.*, jdbctbl.* from castbl join jdbctbl on castbl.name = jdbctbl.name").show</span><br />
<span style="font-size: large;">+------+------+---+----+------+---+------+----+</span><br />
<span style="font-size: large;">| name|course| id|year| name| id|course|year|</span><br />
<span style="font-size: large;">+------+------+---+----+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| spark| 1|2016| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith|hadoop| 5|2015|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev|hadoop| 6|2015| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| spark| 3|2016| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil|hadoop| 4|2015| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">|venkat| spark| 2|2016|venkat| 2| spark|2016|</span><br />
<span style="font-size: large;">+------+------+---+----+------+---+------+----+</span><br />
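<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The inner join on `name` can be sketched with a for-comprehension over two plain Scala collections. The rows are copied from the casDf and jdbcDf output above; `StudentRow`, `casRows`, `jdbcRows` and `joined` are hypothetical names, and the nested loop is only an illustration, not Spark's join strategy.</span><br />

```scala
// Hypothetical case class for one row of either table.
case class StudentRow(name: String, id: Int, course: String, year: Int)

// Rows from the Cassandra table (as printed by casDf.show).
val casRows = Seq(
  StudentRow("anil", 1, "spark", 2016),
  StudentRow("raj", 3, "spark", 2016),
  StudentRow("anvith", 5, "hadoop", 2015),
  StudentRow("dev", 6, "hadoop", 2015),
  StudentRow("sunil", 4, "hadoop", 2015),
  StudentRow("venkat", 2, "spark", 2016),
  StudentRow("kk", 7, "hadoop", 2015),
  StudentRow("rajesh", 8, "hadoop", 2017)
)

// Rows from the MySQL table (as printed by jdbcDf.show).
val jdbcRows = Seq(
  StudentRow("anil", 1, "spark", 2016),
  StudentRow("anvith", 5, "hadoop", 2015),
  StudentRow("dev", 6, "hadoop", 2015),
  StudentRow("raj", 3, "spark", 2016),
  StudentRow("sunil", 4, "hadoop", 2015),
  StudentRow("venkat", 2, "spark", 2016)
)

// Inner join: keep every pair of rows whose names are equal.
val joined: Seq[(StudentRow, StudentRow)] =
  for {
    c <- casRows
    j <- jdbcRows
    if c.name == j.name
  } yield (c, j)

joined.foreach { case (c, j) => println(s"$c | $j") }
```

<span style="font-size: large;">`kk` and `rajesh` exist only in the Cassandra rows, so they drop out of the inner join.</span><br />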
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">scala> casDf.toJSON.collect.foreach(println)</span><br />
<span style="font-size: large;">{"name":"anil","course":"spark","id":1,"year":2016}</span><br />
<span style="font-size: large;">{"name":"raj","course":"spark","id":3,"year":2016}</span><br />
<span style="font-size: large;">{"name":"anvith","course":"hadoop","id":5,"year":2015}</span><br />
<span style="font-size: large;">{"name":"dev","course":"hadoop","id":6,"year":2015}</span><br />
<span style="font-size: large;">{"name":"sunil","course":"hadoop","id":4,"year":2015}</span><br />
<span style="font-size: large;">{"name":"venkat","course":"spark","id":2,"year":2016}</span><br />
<span style="font-size: large;">{"name":"kk","course":"hadoop","id":7,"year":2015}</span><br />
<span style="font-size: large;">{"name":"rajesh","course":"hadoop","id":8,"year":2017}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> hiveDf.toJSON.collect.foreach(println)</span><br />
<span style="font-size: large;">{"name":"arun","id":1,"course":"cse","year":1}</span><br />
<span style="font-size: large;">{"name":"sunil","id":2,"course":"cse","year":1}</span><br />
<span style="font-size: large;">{"name":"raj","id":3,"course":"cse","year":1}</span><br />
<span style="font-size: large;">{"name":"naveen","id":4,"course":"cse","year":1}</span><br />
<span style="font-size: large;">{"name":"venki","id":5,"course":"cse","year":2}</span><br />
<span style="font-size: large;">{"name":"prasad","id":6,"course":"cse","year":2}</span><br />
<span style="font-size: large;">{"name":"sudha","id":7,"course":"cse","year":2}</span><br />
<span style="font-size: large;">{"name":"ravi","id":1,"course":"mech","year":1}</span><br />
<span style="font-size: large;">{"name":"raju","id":2,"course":"mech","year":1}</span><br />
<span style="font-size: large;">{"name":"roja","id":3,"course":"mech","year":1}</span><br />
<span style="font-size: large;">{"name":"anil","id":4,"course":"mech","year":2}</span><br />
<span style="font-size: large;">{"name":"rani","id":5,"course":"mech","year":2}</span><br />
<span style="font-size: large;">{"name":"anvith","id":6,"course":"mech","year":2}</span><br />
<span style="font-size: large;">{"name":"madhu","id":7,"course":"mech","year":2}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> jdbcDf.toJSON.collect.foreach(println)</span><br />
<span style="font-size: large;">{"name":"anil","id":1,"course":"spark","year":2016}</span><br />
<span style="font-size: large;">{"name":"anvith","id":5,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"dev","id":6,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"raj","id":3,"course":"spark","year":2016}</span><br />
<span style="font-size: large;">{"name":"sunil","id":4,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"venkat","id":2,"course":"spark","year":2016}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------</span><br />
<br /></div>
SCALA BASICS Practice on 01 Apr 2017 (published 2017-04-07)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">`Scala` means `Scalable Language`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`Scala` is a functional and object-oriented programming language</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">In Java:</span><br />
<span style="font-size: large;">------------</span><br />
<span style="font-size: large;">1. Primitive data types (int, float, double, long ...)</span><br />
<span style="font-size: large;">2. Wrapper Classes (Integer, Float, Double, Long ...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Wrapper classes implement `Serializable`, so their objects support `Serialization and Deserialization`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Java Syntax:</span><br />
<span style="font-size: large;">-----------------</span><br />
<span style="font-size: large;">&lt;data type&gt; &lt;variable name&gt; = &lt;data&gt; ;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">In Scala:</span><br />
<span style="font-size: large;">-------------------</span><br />
<span style="font-size: large;">Everything is an `Object`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala Syntax:</span><br />
<span style="font-size: large;">-----------------</span><br />
<span style="font-size: large;">val &lt;variable name&gt; : &lt;data type&gt; = &lt;data&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var &lt;variable name&gt; : &lt;data type&gt; = &lt;data&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val => value => immutable (the reference can't be reassigned)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var => variable => mutable (the reference can be reassigned)</span><br />
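<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A minimal sketch of what "can't change the reference" means: a `val` fixes the reference, not the object behind it. The names `names` and `counter` are made up for this example.</span><br />

```scala
import scala.collection.mutable.ArrayBuffer

// val: the reference is fixed, but the object it points to may still be mutable.
val names = ArrayBuffer("kalyan")
names += "xyz"                  // allowed: mutates the buffer, not the reference
// names = ArrayBuffer("abc")   // would not compile: reassignment to val

// var: the reference itself can be reassigned.
var counter = 1
counter = counter + 1

println(names)
println(counter)
```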
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">orienit@kalyan:~$ scala</span><br />
<span style="font-size: large;">Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_66-internal).</span><br />
<span style="font-size: large;">Type in expressions for evaluation. Or try :help.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`Scala` provides a `REPL`.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`REPL` means Read-Eval-Print Loop</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A `REPL` is also available in the `Python` and `R` programming languages</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val name : String = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> name = "xyz"</span><br />
<span style="font-size: large;">&lt;console&gt;:12: error: reassignment to val</span><br />
<span style="font-size: large;"> name = "xyz"</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var name : String = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> name = "xyz"</span><br />
<span style="font-size: large;">name: String = xyz</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">`Scala` provides `Type Inference`</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">The compiler infers the `data type` from the `data` assigned to the variable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val name : String = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val name = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1.5</span><br />
<span style="font-size: large;">id: Double = 1.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1l</span><br />
<span style="font-size: large;">id: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1f</span><br />
<span style="font-size: large;">id: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1l</span><br />
<span style="font-size: large;">id: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1f</span><br />
<span style="font-size: large;">id: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1d</span><br />
<span style="font-size: large;">id: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id : Long = 1</span><br />
<span style="font-size: large;">id: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id : Float = 1</span><br />
<span style="font-size: large;">id: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id : Double = 1</span><br />
<span style="font-size: large;">id: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toInt</span><br />
<span style="font-size: large;">res4: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toDouble</span><br />
<span style="font-size: large;">res5: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toFloat</span><br />
<span style="font-size: large;">res6: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toLong</span><br />
<span style="font-size: large;">res7: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toChar</span><br />
<span style="font-size: large;">res8: Char = ?</span><br />
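<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The `?` printed for a.toChar is most likely the terminal's rendering of character code 1, which is a non-printable control character. A minimal sketch, assuming `a` is 1 as in the conversions above (the other names are hypothetical):</span><br />

```scala
val a = 1

// Code point 1 is a control character, so it has no visible glyph.
val unprintable: Char = a.toChar

// Code point 65 is the letter 'A'.
val letter: Char = 65.toChar

// To get the digit character '1', offset from '0' instead.
val digit: Char = ('0' + a).toChar

println(unprintable.isControl)
println(letter)
println(digit)
```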
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">`Scala` provides `Operator Overloading` similar to `C++` (operators are ordinary methods)</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val b = 2</span><br />
<span style="font-size: large;">b: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a + b</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a.+(b)</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a.-(b)</span><br />
<span style="font-size: large;">c: Int = -1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a.*(b)</span><br />
<span style="font-size: large;">c: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a + b <===> a.+(b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a - b <===> a.-(b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a * b <===> a.*(b)</span><br />
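<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Because operators are methods, a class can define its own. A minimal sketch with a hypothetical `Point` class (not from the session above) showing that the infix and the method-call forms are the same call:</span><br />

```scala
// Hypothetical class that defines +, - and * as ordinary methods.
case class Point(x: Int, y: Int) {
  def +(other: Point): Point = Point(x + other.x, y + other.y)
  def -(other: Point): Point = Point(x - other.x, y - other.y)
  def *(factor: Int): Point = Point(x * factor, y * factor)
}

val p = Point(1, 2)
val q = Point(3, 4)

val sum1 = p + q     // infix form
val sum2 = p.+(q)    // method-call form, same result
val diff = p - q
val scaled = p * 2

println(sum1)
println(diff)
println(scaled)
```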
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a < b</span><br />
<span style="font-size: large;">res0: Boolean = true</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a <= b</span><br />
<span style="font-size: large;">res1: Boolean = true</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a >= b</span><br />
<span style="font-size: large;">res2: Boolean = false</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a > b</span><br />
<span style="font-size: large;">res3: Boolean = false</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.</span><br />
<span style="font-size: large;">!= >> isInfinite min toInt </span><br />
<span style="font-size: large;">% >>> isInfinity round toLong </span><br />
<span style="font-size: large;">& ^ isNaN self toOctalString </span><br />
<span style="font-size: large;">* abs isNegInfinity shortValue toRadians </span><br />
<span style="font-size: large;">+ byteValue isPosInfinity signum toShort </span><br />
<span style="font-size: large;">- ceil isValidByte to unary_+ </span><br />
<span style="font-size: large;">/ compare isValidChar toBinaryString unary_- </span><br />
<span style="font-size: large;">< compareTo isValidInt toByte unary_~ </span><br />
<span style="font-size: large;"><< doubleValue isValidLong toChar underlying </span><br />
<span style="font-size: large;"><= floatValue isValidShort toDegrees until </span><br />
<span style="font-size: large;">== floor isWhole toDouble | </span><br />
<span style="font-size: large;">> getClass longValue toFloat </span><br />
<span style="font-size: large;">>= intValue max toHexString </span><br />
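<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A few of the methods from the completion list in use. This is a small sketch with made-up values; the variable names are only for illustration.</span><br />

```scala
val a = 1
val b = 2

val larger = a max b                 // infix call of the max method
val smaller = a.min(b)               // method-call form of min
val inclusive = (a to 5).toList      // range that includes the end value
val exclusive = (a until 5).toList   // range that excludes the end value
val binary = 5.toBinaryString        // binary digits as a String
val hex = 255.toHexString            // hexadecimal digits as a String
val magnitude = (-4).abs             // absolute value

println(inclusive)
println(exclusive)
```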
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">if, if else, if else if expressions in Scala</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { body1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { body1 } else { body2 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">} else { </span><br />
<span style="font-size: large;">body2 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Wrong in `Scala Prompt` (REPL): keep `else` and its opening brace on the same line as the preceding `}`</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">} </span><br />
<span style="font-size: large;">else { </span><br />
<span style="font-size: large;">body2 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">} else </span><br />
<span style="font-size: large;">{ </span><br />
<span style="font-size: large;">body2 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">if(exp1) { body1 } else if(exp2) { body2 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">} else if(exp2) { </span><br />
<span style="font-size: large;">body2 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) { </span><br />
<span style="font-size: large;">body1 </span><br />
<span style="font-size: large;">} else if(exp2) { </span><br />
<span style="font-size: large;">body2 </span><br />
<span style="font-size: large;">} else if(exp3) { </span><br />
<span style="font-size: large;">body3 </span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">val a = 10</span><br />
<span style="font-size: large;">val b = 20</span><br />
<span style="font-size: large;">val c = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 10</span><br />
<span style="font-size: large;">a: Int = 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val b = 20</span><br />
<span style="font-size: large;">b: Int = 20</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = 30</span><br />
<span style="font-size: large;">c: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a > b) println("nothing")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a < b) println("i am there")</span><br />
<span style="font-size: large;">i am there</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a < b) println("i am there") else println("nothing")</span><br />
<span style="font-size: large;">i am there</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a > b) println("i am there") else println("nothing")</span><br />
<span style="font-size: large;">nothing</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a > b) println("i am there") else if(a < b) println("nothing") </span><br />
<span style="font-size: large;">nothing</span><br />
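<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A minimal sketch, not part of the original session: in Scala `if` is an expression, so its result can be assigned to a val. The names `bigger` and `label` are made up for illustration, and the braces are laid out so the code can be typed line by line at the prompt.</span><br />

```scala
// Sketch: `if` yields a value, so it can appear on the right of `=`.
val a = 10
val b = 20

// Both branches are Int, so `bigger` is inferred as Int.
val bigger = if (a > b) a else b

// An else-if chain yields a value too. `} else` stays on one line
// so the REPL does not end the expression early.
val label = if (a > b) {
  "a is bigger"
} else if (a < b) {
  "b is bigger"
} else {
  "equal"
}

println(bigger)
println(label)
```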
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Arrays in Java:</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><data type>[] <variable name> = {};</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><data type>[] <variable name> = new <data type>[size];</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">String[] names = {"anil", "raj", "venkat"};</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">or</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">String[] names = new String[3];</span><br />
<span style="font-size: large;">names[0] = "anil";</span><br />
<span style="font-size: large;">names[1] = "raj";</span><br />
<span style="font-size: large;">names[2] = "venkat";</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Arrays in Scala:</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">val <variable name> : Array[<data type>] = Array[<data type>](...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val <variable name> : Array[<data type>] = new Array[<data type>](size)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val names : Array[String] = Array[String]("anil", "raj", "venkat")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">or</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val names : Array[String] = new Array[String](3)</span><br />
<span style="font-size: large;">names(0) = "anil"</span><br />
<span style="font-size: large;">names(1) = "raj"</span><br />
<span style="font-size: large;">names(2) = "venkat"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val names : Array[String] = Array[String]("anil", "raj", "venkat")</span><br />
<span style="font-size: large;">names: Array[String] = Array(anil, raj, venkat)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(0)</span><br />
<span style="font-size: large;">res14: String = anil</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(1)</span><br />
<span style="font-size: large;">res15: String = raj</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(2)</span><br />
<span style="font-size: large;">res16: String = venkat</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val names : Array[String] = new Array[String](3)</span><br />
<span style="font-size: large;">names: Array[String] = Array(null, null, null)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(0) = "anil"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(1) = "raj"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(2) = "venkat"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names</span><br />
<span style="font-size: large;">res20: Array[String] = Array(anil, raj, venkat)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(0)</span><br />
<span style="font-size: large;">res21: String = anil</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(1)</span><br />
<span style="font-size: large;">res22: String = raj</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(2)</span><br />
<span style="font-size: large;">res23: String = venkat</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val names = Array[String]("anil", "raj", "venkat")</span><br />
<span style="font-size: large;">names: Array[String] = Array(anil, raj, venkat)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val names = Array("anil", "raj", "venkat")</span><br />
<span style="font-size: large;">names: Array[String] = Array(anil, raj, venkat)</span><br />
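<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A small sketch of what the examples above imply: an array has a fixed length, but its elements can be reassigned even when it is held in a `val`. The replacement value "ravi" is a made-up example.</span><br />

```scala
// Sketch: fixed length, mutable elements.
val names = new Array[String](3)   // starts as Array(null, null, null)
names(0) = "anil"
names(1) = "raj"
names(2) = "venkat"

// names(i) = v is shorthand for names.update(i, v),
// and names(i) is shorthand for names.apply(i).
names.update(1, "ravi")

println(names.length)
println(names.mkString(", "))
```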
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val nums = Array(1,2,3,4,5,6,7,8,9,10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val nums = Array(1,2,3,4,5,6,7,8,9,10)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- nums) println(x)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">8</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- nums) if(x %2 == 0) println(x)</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;">8</span><br />
<span style="font-size: large;">10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- nums if(x %2 == 0) ) println(x)</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;">8</span><br />
<span style="font-size: large;">10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- nums) if(x %2 == 0) println("Even Number: " + x) else println("Odd Number: " + x)</span><br />
<span style="font-size: large;">Odd Number: 1</span><br />
<span style="font-size: large;">Even Number: 2</span><br />
<span style="font-size: large;">Odd Number: 3</span><br />
<span style="font-size: large;">Even Number: 4</span><br />
<span style="font-size: large;">Odd Number: 5</span><br />
<span style="font-size: large;">Even Number: 6</span><br />
<span style="font-size: large;">Odd Number: 7</span><br />
<span style="font-size: large;">Even Number: 8</span><br />
<span style="font-size: large;">Odd Number: 9</span><br />
<span style="font-size: large;">Even Number: 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">for(x <- nums) </span><br />
<span style="font-size: large;">if(x %2 == 0) {</span><br />
<span style="font-size: large;">println("Even Number: " + x) </span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;">println("Odd Number: " + x)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- nums) </span><br />
<span style="font-size: large;"> | if(x %2 == 0) {</span><br />
<span style="font-size: large;"> | println("Even Number: " + x) </span><br />
<span style="font-size: large;"> | } else {</span><br />
<span style="font-size: large;"> | println("Odd Number: " + x)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">Odd Number: 1</span><br />
<span style="font-size: large;">Even Number: 2</span><br />
<span style="font-size: large;">Odd Number: 3</span><br />
<span style="font-size: large;">Even Number: 4</span><br />
<span style="font-size: large;">Odd Number: 5</span><br />
<span style="font-size: large;">Even Number: 6</span><br />
<span style="font-size: large;">Odd Number: 7</span><br />
<span style="font-size: large;">Even Number: 8</span><br />
<span style="font-size: large;">Odd Number: 9</span><br />
<span style="font-size: large;">Even Number: 10</span><br />
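<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The loops above only print. As a hedged sketch, `yield` collects the results into a new array instead; `evens` and `labels` are illustrative names, not from the original notes.</span><br />

```scala
val nums = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// A guard (`if`) inside the generator filters the elements;
// `yield` builds a new Array[Int] from the ones that pass.
val evens = for (x <- nums if x % 2 == 0) yield x

// The block after `yield` can transform each element.
val labels = for (x <- nums) yield {
  if (x % 2 == 0) "Even Number: " + x else "Odd Number: " + x
}

println(evens.mkString(", "))
println(labels(0))
```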
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">String Interpolation:</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">val name = "kalyan"</span><br />
<span style="font-size: large;">val course = "spark"</span><br />
<span style="font-size: large;">val percentage = 80.5</span><br />
<span style="font-size: large;">val count = 100</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp1 = "name is " + name + ", course is " + course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp2 = "name is $name, course is $course"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp3 = s"name is $name, course is $course"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp4 = s"name is $name, percentage is $percentage"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp5 = s"name is $name, percentage is $percentage%.3f"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp6 = f"name is $name, percentage is $percentage%.3f"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp7 = s"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp8 = raw"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val name = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val course = "spark"</span><br />
<span style="font-size: large;">course: String = spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val percentage = 80.5</span><br />
<span style="font-size: large;">percentage: Double = 80.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val count = 100</span><br />
<span style="font-size: large;">count: Int = 100</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp1 = "name is " + name + ", course is " + course</span><br />
<span style="font-size: large;">exp1: String = name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp2 = "name is $name, course is $course"</span><br />
<span style="font-size: large;">exp2: String = name is $name, course is $course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp3 = s"name is $name, course is $course"</span><br />
<span style="font-size: large;">exp3: String = name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp4 = s"name is $name, percentage is $percentage"</span><br />
<span style="font-size: large;">exp4: String = name is kalyan, percentage is 80.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp5 = s"name is $name, percentage is $percentage%.3f"</span><br />
<span style="font-size: large;">exp5: String = name is kalyan, percentage is 80.5%.3f</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp6 = f"name is $name, percentage is $percentage%.3f"</span><br />
<span style="font-size: large;">exp6: String = name is kalyan, percentage is 80.500</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp7 = s"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">exp7: String =</span><br />
<span style="font-size: large;">name is kalyan</span><br />
<span style="font-size: large;">course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp8 = raw"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">exp8: String = name is kalyan\ncourse is spark</span><br />
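<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A few more interpolation forms, as a sketch under the same variable definitions; `exp9` to `exp11` simply continue the numbering above and are not from the original session.</span><br />

```scala
val name = "kalyan"
val count = 100

// ${...} embeds any expression, not just a variable name.
val exp9 = s"name is ${name.toUpperCase}, next count is ${count + 1}"

// With the f interpolator, a printf-style format follows the variable.
val exp10 = f"count is $count%05d"

// $$ produces a literal dollar sign.
val exp11 = s"fee is $$$count"

println(exp9)
println(exp10)
println(exp11)
```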
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Collections in `Scala`</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Scala has two kinds of collections:</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. immutable collections (scala.collection.immutable)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. mutable collections (scala.collection.mutable)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
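<span style="font-size: large;">Type the package name followed by a dot and press Tab to list its members (the exact list depends on the Scala version):</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> import scala.collection.immutable.</span><br />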
<span style="font-size: large;">:: LongMap SortedMap </span><br />
<span style="font-size: large;">AbstractMap LongMapEntryIterator SortedSet </span><br />
<span style="font-size: large;">BitSet LongMapIterator Stack </span><br />
<span style="font-size: large;">DefaultMap LongMapKeyIterator Stream </span><br />
<span style="font-size: large;">HashMap LongMapUtils StreamIterator </span><br />
<span style="font-size: large;">HashSet LongMapValueIterator StreamView </span><br />
<span style="font-size: large;">IndexedSeq Map StreamViewLike </span><br />
<span style="font-size: large;">IntMap MapLike StringLike </span><br />
<span style="font-size: large;">IntMapEntryIterator MapProxy StringOps </span><br />
<span style="font-size: large;">IntMapIterator Nil Traversable </span><br />
<span style="font-size: large;">IntMapKeyIterator NumericRange TreeMap </span><br />
<span style="font-size: large;">IntMapUtils Page TreeSet </span><br />
<span style="font-size: large;">IntMapValueIterator PagedSeq TrieIterator </span><br />
<span style="font-size: large;">Iterable Queue Vector </span><br />
<span style="font-size: large;">LinearSeq Range VectorBuilder </span><br />
<span style="font-size: large;">List RedBlackTree VectorIterator </span><br />
<span style="font-size: large;">ListMap Seq VectorPointer </span><br />
<span style="font-size: large;">ListSerializeEnd Set WrappedString </span><br />
<span style="font-size: large;">ListSet SetProxy </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
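<span style="font-size: large;">The same Tab completion works for the mutable package:</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> import scala.collection.mutable.</span><br />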
<span style="font-size: large;">AVLIterator LinkedListLike </span><br />
<span style="font-size: large;">AVLTree ListBuffer </span><br />
<span style="font-size: large;">AbstractBuffer ListMap </span><br />
<span style="font-size: large;">AbstractIterable LongMap </span><br />
<span style="font-size: large;">AbstractMap Map </span><br />
<span style="font-size: large;">AbstractSeq MapBuilder </span><br />
<span style="font-size: large;">AbstractSet MapLike </span><br />
<span style="font-size: large;">AnyRefMap MapProxy </span><br />
<span style="font-size: large;">ArrayBuffer MultiMap </span><br />
<span style="font-size: large;">ArrayBuilder MutableList </span><br />
<span style="font-size: large;">ArrayLike ObservableBuffer </span><br />
<span style="font-size: large;">ArrayOps ObservableMap </span><br />
<span style="font-size: large;">ArraySeq ObservableSet </span><br />
<span style="font-size: large;">ArrayStack OpenHashMap </span><br />
<span style="font-size: large;">BitSet PriorityQueue </span><br />
<span style="font-size: large;">Buffer PriorityQueueProxy </span><br />
<span style="font-size: large;">BufferLike Publisher </span><br />
<span style="font-size: large;">BufferProxy Queue </span><br />
<span style="font-size: large;">Builder QueueProxy </span><br />
<span style="font-size: large;">Cloneable ResizableArray </span><br />
<span style="font-size: large;">DefaultEntry RevertibleHistory </span><br />
<span style="font-size: large;">DefaultMapModel Seq </span><br />
<span style="font-size: large;">DoubleLinkedList SeqLike </span><br />
<span style="font-size: large;">DoubleLinkedListLike Set </span><br />
<span style="font-size: large;">FlatHashTable SetBuilder </span><br />
<span style="font-size: large;">GrowingBuilder SetLike </span><br />
<span style="font-size: large;">HashEntry SetProxy </span><br />
<span style="font-size: large;">HashMap SortedSet </span><br />
<span style="font-size: large;">HashSet Stack </span><br />
<span style="font-size: large;">HashTable StackProxy </span><br />
<span style="font-size: large;">History StringBuilder </span><br />
<span style="font-size: large;">ImmutableMapAdaptor Subscriber </span><br />
<span style="font-size: large;">ImmutableSetAdaptor SynchronizedBuffer </span><br />
<span style="font-size: large;">IndexedSeq SynchronizedMap </span><br />
<span style="font-size: large;">IndexedSeqLike SynchronizedPriorityQueue </span><br />
<span style="font-size: large;">IndexedSeqOptimized SynchronizedQueue </span><br />
<span style="font-size: large;">IndexedSeqView SynchronizedSet </span><br />
<span style="font-size: large;">Iterable SynchronizedStack </span><br />
<span style="font-size: large;">LazyBuilder Traversable </span><br />
<span style="font-size: large;">Leaf TreeSet </span><br />
<span style="font-size: large;">LinearSeq Undoable </span><br />
<span style="font-size: large;">LinkedEntry UnrolledBuffer </span><br />
<span style="font-size: large;">LinkedHashMap WeakHashMap </span><br />
<span style="font-size: large;">LinkedHashSet WrappedArray </span><br />
<span style="font-size: large;">LinkedList WrappedArrayBuilder </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list = List[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val seq = Seq[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val set = Set[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val vec = Vector[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val str = Stream[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val st = Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val qu = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> val st = Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><console>:11: error: not found: value Stack</span><br />
<span style="font-size: large;"> val st = Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;">scala> val qu = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><console>:11: error: not found: value Queue</span><br />
<span style="font-size: large;"> val qu = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
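<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Unlike List, Seq, Set, Vector and Stream, the names Stack and Queue are not available by default. Import them or use the full package name:</span><br />
<span style="font-size: large;"><br /></span>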
<span style="font-size: large;">val st1 = scala.collection.immutable.Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val qu1 = scala.collection.immutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val st2 = scala.collection.mutable.Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val qu2 = scala.collection.mutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val st1 = scala.collection.immutable.Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">st1: scala.collection.immutable.Stack[Int] = Stack(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val qu1 = scala.collection.immutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">qu1: scala.collection.immutable.Queue[Int] = Queue(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val st2 = scala.collection.mutable.Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">st2: scala.collection.mutable.Stack[Int] = Stack(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val qu2 = scala.collection.mutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">qu2: scala.collection.mutable.Queue[Int] = Queue(1, 2, 3, 4, 5)</span><br />
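<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A minimal sketch of how the two Queue flavours differ in use, assuming a short import of the packages instead of the full names. `qu1b`, `front`, `rest` and `first` are illustrative names.</span><br />

```scala
import scala.collection.immutable
import scala.collection.mutable

// Immutable queue: enqueue returns a NEW queue; qu1 itself is unchanged.
val qu1 = immutable.Queue(1, 2, 3)
val qu1b = qu1.enqueue(4)

// dequeue returns the front element together with the remaining queue.
val (front, rest) = qu1b.dequeue

// Mutable queue: enqueue and dequeue change qu2 in place.
val qu2 = mutable.Queue(1, 2, 3)
qu2.enqueue(4)
val first = qu2.dequeue()

println(qu1)
println(rest)
println(qu2)
```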
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val seq = Seq[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">seq: Seq[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val set = Set[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">set: scala.collection.immutable.Set[Int] = Set(5, 1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val vec = Vector[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">vec: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val str = Stream[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">str: scala.collection.immutable.Stream[Int] = Stream(1, ?)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">or</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list = 1 :: 2 :: 3 :: 4 :: 5 :: Nil</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">or</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list = 1 :: (2 :: (3 :: (4 :: (5 :: Nil))))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List(1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = 1 :: 2 :: 3 :: 4 :: 5 :: Nil</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = 1 :: (2 :: (3 :: (4 :: (5 :: Nil))))</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List(4,5,6)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list :+ 7</span><br />
<span style="font-size: large;">res0: List[Int] = List(4, 5, 6, 7)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 3 +: list</span><br />
<span style="font-size: large;">res1: List[Int] = List(3, 4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var list = List(4,5,6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">list = 3 +: list</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">list = list :+ 7</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">list = 2 +: list :+ 8</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> var list = List(4,5,6)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 3 +: list</span><br />
<span style="font-size: large;">list: List[Int] = List(3, 4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = list :+ 7</span><br />
<span style="font-size: large;">list: List[Int] = List(3, 4, 5, 6, 7)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 2 +: list :+ 8</span><br />
<span style="font-size: large;">list: List[Int] = List(2, 3, 4, 5, 6, 7, 8)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list1 = List(1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list2 = List(6,7,8,9,10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list1 = List(1,2,3,4,5)</span><br />
<span style="font-size: large;">list1: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list2 = List(6,7,8,9,10)</span><br />
<span style="font-size: large;">list2: List[Int] = List(6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list3 = list1 +: list2</span><br />
<span style="font-size: large;">list3: List[Any] = List(List(1, 2, 3, 4, 5), 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list3 = list2 +: list1</span><br />
<span style="font-size: large;">list3: List[Any] = List(List(6, 7, 8, 9, 10), 1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list3 = list1 :+ list2</span><br />
<span style="font-size: large;">list3: List[Any] = List(1, 2, 3, 4, 5, List(6, 7, 8, 9, 10))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list3 = list1 ::: list2</span><br />
<span style="font-size: large;">list3: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
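<span style="font-size: large;">Why the results above differ: `+:` and `:+` add exactly one element, so passing a whole list nests it and widens the type to List[Any], while `:::` joins two lists element by element. A minimal sketch of the same operators follows. The value names are illustrative and not from the session above, and `++` (general collection concatenation) was not shown above.</span><br />

```scala
// Illustrative values; not part of the original REPL session.
val list1 = List(1, 2, 3)
val list2 = List(4, 5, 6)

// `::` and `+:` prepend ONE element; `:+` appends ONE element.
val prepended = 0 :: list1       // List(0, 1, 2, 3)
val appended  = list1 :+ 4       // List(1, 2, 3, 4)

// Treating a whole list as "one element" nests it; the type widens to List[Any].
val nested = list1 :+ list2      // List(1, 2, 3, List(4, 5, 6))

// `:::` (List only) and `++` (any collection) concatenate element by element.
val joined1 = list1 ::: list2    // List(1, 2, 3, 4, 5, 6)
val joined2 = list1 ++ list2     // List(1, 2, 3, 4, 5, 6)
```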
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Functions in Scala:</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">1. anonymous functions</span><br />
<span style="font-size: large;">2. named functions</span><br />
<span style="font-size: large;">3. curried functions</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. anonymous functions</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">(x : Int, y : Int) => x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val add = (x : Int, y : Int) => x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1,2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> (x : Int, y : Int) => x + y</span><br />
<span style="font-size: large;">res2: (Int, Int) => Int = &lt;function2&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val add = (x : Int, y : Int) => x + y</span><br />
<span style="font-size: large;">add: (Int, Int) => Int = &lt;function2&gt;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1,2)</span><br />
<span style="font-size: large;">res3: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
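<span style="font-size: large;">Anonymous functions are most often passed straight to higher-order methods instead of being bound to a name. A small sketch, using the standard List methods map, filter and reduce (the value names are illustrative):</span><br />

```scala
val numbers = List(1, 2, 3, 4, 5)

// Full function literal with an explicit parameter name.
val doubled = numbers.map(x => x * 2)       // List(2, 4, 6, 8, 10)

// Placeholder syntax: each `_` stands for the next parameter.
val evens = numbers.filter(_ % 2 == 0)      // List(2, 4)
val total = numbers.reduce(_ + _)           // 15
```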
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. named functions</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int, y : Int) = x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1,2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int, y : Int) = x + y</span><br />
<span style="font-size: large;">add: (x: Int, y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1,2)</span><br />
<span style="font-size: large;">res4: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. curried functions</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int)(y : Int) = x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1)(2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int)(y : Int) = x + y</span><br />
<span style="font-size: large;">add: (x: Int)(y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)(2)</span><br />
<span style="font-size: large;">res5: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
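<span style="font-size: large;">One reason to curry: supplying only the first argument list gives back a function of the remaining one. A minimal sketch reusing the curried `add` from above (`addTen` and `shifted` are illustrative names):</span><br />

```scala
def add(x: Int)(y: Int): Int = x + y

// With an expected function type, the partially applied method becomes a function.
val addTen: Int => Int = add(10)

val fifteen = addTen(5)                     // 15

// The same partial application can be passed directly to a higher-order method.
val shifted = List(1, 2, 3).map(add(10))    // List(11, 12, 13)
```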
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Why `named functions` matter: default and named arguments</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int, y : Int) = x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int = 1, y : Int = 2) = x + y</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int = 1, y : Int = 2) = x + y</span><br />
<span style="font-size: large;">add: (x: Int, y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add()</span><br />
<span style="font-size: large;">res6: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res7: Int = 12</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10,20)</span><br />
<span style="font-size: large;">res8: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(x = 10, y = 20)</span><br />
<span style="font-size: large;">res9: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(y = 20, x = 10)</span><br />
<span style="font-size: large;">res10: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
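<span style="font-size: large;">Defaults and named arguments can also be mixed, which lets a caller override just one parameter. A sketch with a hypothetical `connect` helper (the name, parameters and default values are made up for illustration):</span><br />

```scala
// Hypothetical helper, used only to illustrate default and named arguments.
def connect(host: String = "localhost", port: Int = 8080, secure: Boolean = false): String = {
  val scheme = if (secure) "https" else "http"
  s"$scheme://$host:$port"
}

val allDefaults = connect()                               // "http://localhost:8080"
val onlyPort    = connect(port = 9090)                    // "http://localhost:9090"
val mixed       = connect("example.com", secure = true)   // "https://example.com:8080"
```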
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">factorial of n:</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) = {</span><br />
<span style="font-size: large;"> if(n == 1) 1</span><br />
<span style="font-size: large;"> else n * factorial(n-1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) = {</span><br />
<span style="font-size: large;"> | if(n == 1) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n-1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">&lt;console&gt;:13: error: recursive method factorial needs result type</span><br />
<span style="font-size: large;"> else n * factorial(n-1)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> if(n == 1) 1</span><br />
<span style="font-size: large;"> else n * factorial(n-1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | if(n == 1) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n-1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res11: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(4)</span><br />
<span style="font-size: large;">res12: Int = 24</span><br />
<span style="font-size: large;"><br /></span>
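<span style="font-size: large;">The version above stops only when n reaches exactly 1, so factorial(0) or a negative argument recurses until the stack overflows, and an Int result overflows from 13! onward. A more defensive sketch, assuming a tail-recursive helper and a BigInt result are acceptable (`factorialSafe` and `loop` are illustrative names):</span><br />

```scala
import scala.annotation.tailrec

// Sketch of a more defensive factorial; not the version from the session above.
def factorialSafe(n: Int): BigInt = {
  require(n >= 0, "factorial is undefined for negative numbers")

  // @tailrec makes the compiler verify that the recursion is compiled to a loop.
  @tailrec
  def loop(i: Int, acc: BigInt): BigInt =
    if (i <= 1) acc
    else loop(i - 1, acc * i)

  loop(n, BigInt(1))
}

val f0  = factorialSafe(0)    // 1
val f5  = factorialSafe(5)    // 120
val f20 = factorialSafe(20)   // 2432902008176640000
```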
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">Object Oriented Programming in Scala</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">trait</span><br />
<span style="font-size: large;">abstract class</span><br />
<span style="font-size: large;">class</span><br />
<span style="font-size: large;">object</span><br />
<span style="font-size: large;">case class</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">case class A (a :Int, b : String) {</span><br />
<span style="font-size: large;"> override def toString() : String = s"a is $a, b is $b"</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;">scala> case class A (a :Int, b : String) {</span><br />
<span style="font-size: large;"> | override def toString() : String = s"a is $a, b is $b"</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">defined class A</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> A(1, "kalyan")</span><br />
<span style="font-size: large;">res13: A = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> new A(1, "kalyan")</span><br />
<span style="font-size: large;">res14: A = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a1 = A(1, "kalyan")</span><br />
<span style="font-size: large;">a1: A = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a2 = new A(1, "kalyan")</span><br />
<span style="font-size: large;">a2: A = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">--------------------------------------------------</span><br />
<span style="font-size: large;">class B (a :Int, b : String) {</span><br />
<span style="font-size: large;"> override def toString() : String = s"a is $a, b is $b"</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> class B (a :Int, b : String) {</span><br />
<span style="font-size: large;"> | override def toString() : String = s"a is $a, b is $b"</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">defined class B</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> B(1, "kalyan")</span><br />
<span style="font-size: large;">&lt;console&gt;:12: error: not found: value B</span><br />
<span style="font-size: large;"> B(1, "kalyan")</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> new B(1, "kalyan")</span><br />
<span style="font-size: large;">res16: B = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val b1 = new B(1, "kalyan")</span><br />
<span style="font-size: large;">b1: B = a is 1, b is kalyan</span><br />
<span style="font-size: large;"><br /></span>
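<span style="font-size: large;">The `not found: value B` error above (Scala 2 behaviour) appears because only a case class gets a generated companion `apply`. A case class also gets value equality, `copy` and pattern matching. A sketch with illustrative types `Point` and `Label`; in the REPL, a class and its hand-written companion need to be entered together (for example in :paste mode):</span><br />

```scala
// Illustrative types; not part of the original session.
case class Point(x: Int, y: Int)

val p1 = Point(1, 2)               // generated companion `apply`, so `new` is optional
val p2 = Point(1, 2)
val sameValues = p1 == p2          // true: case classes compare by field values
val p3 = p1.copy(y = 5)            // Point(1, 5)

// Pattern matching works through the generated `unapply`.
val description = p3 match {
  case Point(0, 0) => "origin"
  case Point(x, y) => s"x=$x, y=$y"
}

// A plain class gets the same call syntax only if we write the companion ourselves.
class Label(val text: String)
object Label {
  def apply(text: String): Label = new Label(text)
}

val l1 = Label("kalyan")                             // resolves to Label.apply
val sameLabel = Label("kalyan") == Label("kalyan")   // false: reference equality
```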
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------</span></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com26tag:blogger.com,1999:blog-2182833570422175384.post-87285831064741936882017-03-17T16:41:00.002-07:002017-03-17T16:47:57.446-07:00Scala interview questions<div dir="ltr" style="text-align: left;" trbidi="on">
<h4 id="1-what-is-the-difference-between-a-var-a-val-and-def" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
1. What is the difference between a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">var</code>, a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">val</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">def</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#1-what-is-the-difference-between-a-var-a-val-and-def" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">var</code> is a variable. It’s a mutable reference to a value. Since it’s mutable, its value may change through the program lifetime. Keep in mind that the variable type cannot change in Scala. You may say that a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">var</code> behaves similarly to Java variables.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">val</code> is a value. It’s an immutable reference: once assigned, it cannot be reassigned. It’s similar to constants in other languages.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">def</code> declares a method (loosely, a function). Its body is evaluated each time it is called.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">var</span> <span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// x is of type Int. If you force it to be of type Any then this example would work
</span><span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// accepted by the language/compiler
</span><span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="s" style="box-sizing: inherit;">"error"</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// not accepted by the compiler
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">y</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span>
<span class="n" style="box-sizing: inherit;">y</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// would produce an error 'error: reassignment to val'
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">fun</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">name</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="s" style="box-sizing: inherit;">"Hey! My name is: "</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">name</span>
<span class="n" style="box-sizing: inherit;">fun</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"Scala"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// "Hey! My name is: Scala"
</span><span class="n" style="box-sizing: inherit;">fun</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"Java"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// "Hey! My name is: Java"
</span></code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<strong style="box-sizing: inherit;">Bonus:</strong> what’s a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">lazy val</code>? It’s almost like a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">val</code>, but its value is only computed the first time it is needed and is then cached. It’s especially useful for avoiding heavy computations whose result may never be used (when <a href="https://en.wikipedia.org/wiki/Short-circuit_evaluation" style="box-sizing: inherit; color: seagreen;">short-circuit</a> evaluation skips it, for instance).</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">lazy</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"computing x"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">y</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"computing y"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">10</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="n" style="box-sizing: inherit;">y</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">y</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// x was still not computed, "computing x" was not yet printed
</span><span class="n" style="box-sizing: inherit;">x</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">x</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// x is required, x is going to be computed, a "computing x" message will be printed *once*
</span></code></pre>
</div>
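<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
To see the three evaluation strategies side by side, here is a small sketch that records each evaluation in a log. All names (<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">log</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">eager</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">deferred</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">onDemand</code>) are illustrative.</div>

```scala
// Records every time one of the bodies below is actually evaluated.
var log = Vector.empty[String]

val eager         = { log :+= "val"; 1 }        // evaluated once, right here
lazy val deferred = { log :+= "lazy val"; 2 }   // evaluated once, on first use
def onDemand      = { log :+= "def"; 3 }        // evaluated on every call

// At this point only "val" has been logged.
val total = deferred + deferred + onDemand + onDemand + eager

// total == 11
// log   == Vector("val", "lazy val", "def", "def")
```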
<h4 id="2-what-is-the-difference-between-a-trait-and-an-abstract-class" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
2. What is the difference between a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">trait</code> and an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">abstract class</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#2-what-is-the-difference-between-a-trait-and-an-abstract-class" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The first difference is that a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code> can only extend one other <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code>, but an unlimited number of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">traits</code>.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
While <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">traits</code> only support <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">type parameters</code> in Scala 2 (Scala 3 also allows trait parameters), <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">abstract classes</code> can have <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">constructor parameters</code>.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Also, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">abstract classes</code> are interoperable with Java, while <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">traits</code> were traditionally only interoperable with Java if they did not contain any implementation. Since Scala 2.12, traits compile to Java 8 interfaces with default methods, which makes traits with concrete methods easier to use from Java.</div>
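<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A minimal sketch of the first two differences: one abstract superclass with a constructor parameter, plus several traits mixed in. The hierarchy and all names in it are made up for illustration.</div>

```scala
// Illustrative hierarchy; none of these names come from the original post.
abstract class Animal(val name: String) {        // constructor parameter
  def sound: String                              // abstract member
  def describe: String = s"$name says $sound"
}

trait Swimmer { def swim: String = "swims" }     // trait with an implementation
trait Runner  { def run: String = "runs" }

// Exactly one superclass, but any number of traits.
class Dog(name: String) extends Animal(name) with Swimmer with Runner {
  def sound: String = "woof"
}

val dog = new Dog("Rex")
val described = dog.describe                     // "Rex says woof"
val skills = List(dog.swim, dog.run)             // List("swims", "runs")
```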
<h4 id="3-what-is-the-difference-between-an-object-and-a-class" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
3. What is the difference between an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object</code> and a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#3-what-is-the-difference-between-an-object-and-a-class" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
An <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object</code> is a singleton instance of a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code>. It does not need to be instantiated by the developer.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
If an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object</code> has the same name as a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code> and is defined in the same source file, the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object</code> is called a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">companion object</code>.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">class</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">number</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">text</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">classMethod</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">???</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">object</span> <span class="nc" style="box-sizing: inherit;">MyObject</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">objectMethod</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">???</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">new</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"text"</span><span class="o" style="box-sizing: inherit; color: #989daa;">).</span><span class="n" style="box-sizing: inherit;">classMethod</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span>
<span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">classMethod</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// won't compile
</span><span class="nc" style="box-sizing: inherit;">MyObject</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">objectMethod</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// you don't need to create an instance to call the method
</span></code></pre>
</div>
<h4 id="4-what-is-a-case-class" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
4. What is a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">case class</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#4-what-is-a-case-class" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">case class</code> is syntactic sugar for a class that is immutable by default and decomposable through pattern matching (because the compiler generates <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">apply</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">unapply</code> methods for it). Being decomposable means that a pattern match can extract its constructor parameters.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Case classes come with a companion object which holds the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">apply</code> method. This makes it possible to instantiate a case class without the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">new</code> keyword. They also come with some helper methods like <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">.copy</code>, which eases the creation of a slightly changed copy of the original.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Finally, case classes are compared by structural equality instead of by reference, i.e., they come with an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">equals</code> method which compares two instances by their values/fields, instead of comparing just the references.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Case classes are especially useful as DTOs (data transfer objects).</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">case</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">class</span> <span class="nc" style="box-sizing: inherit;">MyCaseClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">number</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">text</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">others</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">])</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">dto</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="nc" style="box-sizing: inherit;">MyCaseClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"text"</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="nc" style="box-sizing: inherit;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">empty</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">dto</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">copy</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">number</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// will produce an instance equal to the original, with number = 5 instead of 3
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">dto2</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="nc" style="box-sizing: inherit;">MyCaseClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"text"</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="nc" style="box-sizing: inherit;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">empty</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">dto</span> <span class="o" style="box-sizing: inherit; color: #989daa;">==</span> <span class="n" style="box-sizing: inherit;">dto2</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// will return true even if different references
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">class</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">number</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">text</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">others</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">])</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">c1</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">new</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"txt"</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="nc" style="box-sizing: inherit;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">empty</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">c2</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">new</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"txt"</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="nc" style="box-sizing: inherit;">List</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">empty</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">c1</span> <span class="o" style="box-sizing: inherit; color: #989daa;">==</span> <span class="n" style="box-sizing: inherit;">c2</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// will return false because they are different references
</span></code></pre>
</div>
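<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The snippet above shows copying and equality, but not the decomposition part. Here is a minimal sketch of it, reusing the same case class shape. The wrapper object <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">CaseClassMatching</code> and the helper <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">describe</code> are names I made up for the illustration.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object CaseClassMatching {
  // Same shape as the case class used above.
  case class MyCaseClass(number: Int, text: String, others: List[Int])

  // Hypothetical helper: the generated unapply lets each pattern
  // bind (or test) the constructor parameters directly.
  def describe(dto: MyCaseClass): String = dto match {
    case MyCaseClass(0, _, _)          => "number is zero"
    case MyCaseClass(n, text, Nil)     => s"number=$n, text=$text, no others"
    case MyCaseClass(n, _, first :: _) => s"number=$n, first other=$first"
  }

  def main(args: Array[String]): Unit = {
    println(describe(MyCaseClass(3, "text", List.empty))) // number=3, text=text, no others
    println(describe(MyCaseClass(3, "text", List(7, 8)))) // number=3, first other=7
  }
}
</code></pre>
</div>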
<h4 id="5-what-is-the-difference-between-a-java-future-and-a-scala-future" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
5. What is the difference between a Java future and a Scala future?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#5-what-is-the-difference-between-a-java-future-and-a-scala-future" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
I had to google this one a little. I have never used Java futures, so I couldn't answer it myself.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Obviously, I was not the first to search for the differences between the two futures. I found a <a href="http://stackoverflow.com/a/31368177/4398050" style="box-sizing: inherit; color: seagreen;">really clean and simple answer on StackOverflow</a> which highlights that the Scala implementation is truly asynchronous and non-blocking, while with Java’s <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">java.util.concurrent.Future</code> you can’t get the future’s value without blocking.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Scala provides an API to manipulate the future as a monad or to attach callbacks that run on completion. Unless you decide to use <a href="http://docs.scala-lang.org/overviews/core/futures.html#blocking-outside-the-future" style="box-sizing: inherit; color: seagreen;"><code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Await</code></a>, you won’t block your program when using Scala futures.</div>
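<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
To make the two styles concrete, here is a small sketch using only the standard library. The names <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">FutureSketch</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">slowAnswer</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">doubledAnswer</code> are hypothetical, and I am assuming the global execution context is good enough for a demo.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureSketch {
  // Hypothetical slow computation standing in for real work.
  def slowAnswer(): Int = { Thread.sleep(100); 42 }

  // Monadic style: start the computation on another thread and
  // transform its result with map, without blocking the caller.
  def doubledAnswer(): Future[Int] = Future(slowAnswer()).map(_ * 2)

  def main(args: Array[String]): Unit = {
    val doubled = doubledAnswer()

    // Callback style: runs once the future completes; the caller does not wait.
    doubled.onComplete {
      case Success(value) => println(s"callback got $value")
      case Failure(error) => println(s"callback failed: ${error.getMessage}")
    }

    // Blocking is opt-in: Await is used here only so the demo does not exit early.
    println(Await.result(doubled, 5.seconds)) // 84
  }
}
</code></pre>
</div>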
<h4 id="6-what-is-the-difference-between-unapply-and-apply-when-would-you-use-them" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
6. What is the difference between <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">unapply</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">apply</code>, when would you use them?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#6-what-is-the-difference-between-unapply-and-apply-when-would-you-use-them" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">unapply</code> is a method that an object needs to implement in order to be an <a href="http://docs.scala-lang.org/tutorials/tour/extractor-objects.html" style="box-sizing: inherit; color: seagreen;">extractor</a>. Extractors are used in pattern matching to access an object’s constructor parameters. It does the opposite of a constructor: it takes an object apart instead of building one.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">apply</code> method is a special method that allows you to write <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">someObject(params)</code> instead of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">someObject.apply(params)</code>. This usage is common in <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">case classes</code>, which contain a companion object with the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">apply</code> method that allows the nice syntax to instantiate a new object without the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">new</code> keyword.</div>
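<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Both methods can also be written by hand on a plain object. The following is only a sketch: the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Email</code> object, the wrapper <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">ExtractorSketch</code> and the helper <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">describe</code> are made-up names, and splitting on <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">@</code> is a deliberately naive stand-in for real address parsing.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object ExtractorSketch {
  object Email {
    // apply builds a value: Email("jane", "example.com") is Email.apply("jane", "example.com")
    def apply(user: String, domain: String): String = s"$user@$domain"

    // unapply takes a value apart again; returning None means "no match"
    def unapply(address: String): Option[(String, String)] =
      address.split("@") match {
        case Array(user, domain) => Some((user, domain))
        case _                   => None
      }
  }

  // Hypothetical helper: the pattern Email(user, domain) calls Email.unapply
  def describe(address: String): String = address match {
    case Email(user, domain) => s"user=$user, domain=$domain"
    case _                   => "not an email address"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Email("jane", "example.com"))) // user=jane, domain=example.com
    println(describe("no-at-sign"))                 // not an email address
  }
}
</code></pre>
</div>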
<h4 id="7-what-is-a-companion-object" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
7. What is a companion object?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#7-what-is-a-companion-object" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
If an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object</code> has the same name as a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">class</code>, the object is called a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">companion object</code>. A companion object has access to the private members of the class, and the class also has access to the private members of the object. Compared with Java, companion objects hold the “static methods” of a class.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Note that the companion object has to be defined in the same source file as the class.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">class</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">number</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">text</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">private</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">classSecret</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">42</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">objectSecret</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="s" style="box-sizing: inherit;">"?"</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// MyClass.objectSecret is accessible because it's inside the class. External classes/objects can't access it
</span><span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">object</span> <span class="nc" style="box-sizing: inherit;">MyClass</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// it's a companion object because it has the same name
</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">private</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">objectSecret</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="s" style="box-sizing: inherit;">"42"</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">y</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">arg</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="n" style="box-sizing: inherit;">arg</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">classSecret</span> <span class="o" style="box-sizing: inherit; color: #989daa;">-</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// arg.classSecret is accessible because it's inside the companion object
</span><span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">objectSecret</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// won't compile
</span><span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">classSecret</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// won't compile
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">new</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(-</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"random"</span><span class="o" style="box-sizing: inherit; color: #989daa;">).</span><span class="n" style="box-sizing: inherit;">objectSecret</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// won't compile
</span><span class="k" style="box-sizing: inherit; color: #729fcf;">new</span> <span class="nc" style="box-sizing: inherit;">MyClass</span><span class="o" style="box-sizing: inherit; color: #989daa;">(-</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="s" style="box-sizing: inherit;">"random"</span><span class="o" style="box-sizing: inherit; color: #989daa;">).</span><span class="n" style="box-sizing: inherit;">classSecret</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// won't compile
</span></code></pre>
</div>
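<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A common way to use the “static methods” role is a factory living in the companion object. Here is a minimal sketch of that idea; the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Temperature</code> class, its <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">fromCelsius</code> factory and the wrapper <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">CompanionFactory</code> are hypothetical names chosen for the example.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object CompanionFactory {
  // The constructor is private, so only the class and its companion can call it.
  class Temperature private (val celsius: Double)

  object Temperature {
    // Plays the role of a Java static constant.
    val AbsoluteZero: Double = -273.15

    // Plays the role of a Java static factory method; it can call the
    // private constructor because it lives in the companion object.
    def fromCelsius(celsius: Double): Option[Temperature] =
      if (celsius >= AbsoluteZero) Some(new Temperature(celsius)) else None
  }

  def main(args: Array[String]): Unit = {
    println(Temperature.fromCelsius(21.5).map(_.celsius)) // Some(21.5)
    println(Temperature.fromCelsius(-300.0).isEmpty)      // true
    // new Temperature(1.0) // won't compile: the constructor is private
  }
}
</code></pre>
</div>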
<h4 id="8-what-is-the-difference-between-the-following-terms-and-types-in-scala-nil-null-none-nothing" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
8. What is the difference between the following terms and types in Scala: <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Nil</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Null</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">None</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Nothing</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#8-what-is-the-difference-between-the-following-terms-and-types-in-scala-nil-null-none-nothing" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
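<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">None</code> is the empty case of the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code> monad: it represents the absence of a value, as opposed to <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Some(value)</code>.</div>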
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Null</code> is a Scala type, and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> is its only instance. <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Null</code> is a subtype of all reference types, but not of value types, so reference types can be assigned <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> while value types (like <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Int</code> or <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Long</code>) can’t. The <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> value itself comes from Java.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Nothing</code> is another special Scala type. It’s a subtype of every other type, and it has no subtypes of its own. It has <em style="box-sizing: inherit;">zero</em> instances. It’s the return type of a method that never returns normally, for instance, a method that always throws an exception. <a href="http://james-iry.blogspot.pt/2009/08/getting-to-bottom-of-nothing-at-all.html" style="box-sizing: inherit; color: seagreen;">The reason Scala has a bottom type is tied to its ability to express variance in type parameters</a>.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
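Finally, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Nil</code> represents the empty <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">List</code>. <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Nil</code> is of type <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">List[Nothing]</code>, and because <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">List</code> is covariant it can be used wherever a list of any element type is expected.</div>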
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
All these types can create a sense of emptiness, right? Here’s a little help <a href="http://www.nickknowlson.com/blog/2013/03/31/representing-empty-in-scala/" style="box-sizing: inherit; color: seagreen;">understanding emptiness in Scala</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUMHPqLJKKldxD6jVr96bszj9Q3582zLNqBglPkWiKZdagFPZP01lGcgvn8hBgk6ni1rLJzE9rP-UkGejCXzrJ9hqmjOe_5LF5_PF6ms6BtF-ap3lJbILad-sJe9AHIdoIZrUxi4CqcS8G/s1600/scala_type_hierarchy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="424" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUMHPqLJKKldxD6jVr96bszj9Q3582zLNqBglPkWiKZdagFPZP01lGcgvn8hBgk6ni1rLJzE9rP-UkGejCXzrJ9hqmjOe_5LF5_PF6ms6BtF-ap3lJbILad-sJe9AHIdoIZrUxi4CqcS8G/s640/scala_type_hierarchy.png" width="640" /></a></div>
</div>
<h4 id="9-what-is-unit" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
9. What is <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Unit</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#9-what-is-unit" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Unit</code> is a type which represents the absence of value, just like Java <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">void</code>. It is a subtype of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">scala.AnyVal</code>. There is only one value of type Unit, represented by <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">()</code>, and it is not represented by any object in the underlying runtime system.</div>
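<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
As a small illustration (the names <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">UnitDemo</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">log</code> are hypothetical), a method that exists only for its side effect returns <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Unit</code>, and that result is the value <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">()</code>:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object UnitDemo {
  // A method called only for its side effect returns Unit.
  def log(message: String): Unit = println(message)

  // The single Unit value can be written out explicitly as ().
  val unit: Unit = ()

  // Calling `log` prints the message and yields that same value.
  val result: Unit = log("hello")
}
</code></pre>
</div>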
<h4 id="10-what-is-the-difference-between-a-call-by-value-and-call-by-name-parameter" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
10. What is the difference between a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-value</code> and a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-name</code> parameter?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#10-what-is-the-difference-between-a-call-by-value-and-call-by-name-parameter" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The difference between a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-value</code> and a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-name</code> parameter is that the former is computed once, before the function is called, while the latter is evaluated each time it is accessed inside the function.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Example: If we declare the following functions</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">func</span><span class="o" style="box-sizing: inherit; color: #989daa;">()</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"computing stuff...."</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">42</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// return something
</span><span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">callByValue</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">x</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"1st x: "</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">x</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"2nd x: "</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">x</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">callByName</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">x</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=></span> <span class="nc" style="box-sizing: inherit;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"1st x: "</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">x</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"2nd x: "</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">x</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
and now call them:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="n" style="box-sizing: inherit;">scala</span><span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="n" style="box-sizing: inherit;">callByValue</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">func</span><span class="o" style="box-sizing: inherit; color: #989daa;">())</span>
<span class="n" style="box-sizing: inherit;">computing</span> <span class="n" style="box-sizing: inherit;">stuff</span><span class="o" style="box-sizing: inherit; color: #989daa;">....</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="n" style="box-sizing: inherit;">st</span> <span class="n" style="box-sizing: inherit;">x</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="err" style="box-sizing: inherit;">42</span>
<span class="err" style="box-sizing: inherit;">2</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">nd</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">x:</span> <span class="err" style="box-sizing: inherit;">42</span>
<span class="kt" style="box-sizing: inherit; color: #e3e7df;">scala></span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">callByName</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">func</span><span class="o" style="box-sizing: inherit; color: #989daa;">())</span>
<span class="kt" style="box-sizing: inherit; color: #e3e7df;">computing</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">stuff....</span>
<span class="err" style="box-sizing: inherit;">1</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">st</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">x:</span> <span class="err" style="box-sizing: inherit;">42</span>
<span class="kt" style="box-sizing: inherit; color: #e3e7df;">computing</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">stuff....</span>
<span class="err" style="box-sizing: inherit;">2</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">nd</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">x:</span> <span class="err" style="box-sizing: inherit;">42</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
As can be seen, the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-name</code> example makes the computation <strong style="box-sizing: inherit;">only when needed</strong>, and <strong style="box-sizing: inherit;">every time</strong> the parameter is accessed, while the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">call-by-value</code> example <strong style="box-sizing: inherit;">only computes once</strong>, but it <strong style="box-sizing: inherit;">computes before invoking</strong> the function (<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">callByValue</code>).</div>
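<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Two consequences are worth sketching: a by-name argument that is never accessed is never evaluated, and caching it in a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">lazy val</code> (a common idiom, not part of the example above) limits it to a single evaluation. All names below are hypothetical:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object ByNameDemo {
  // Counts how many times `expensive` actually runs.
  var evaluations = 0

  def expensive(): Int = {
    evaluations += 1
    42
  }

  // `default` is by-name: it is evaluated only if the Option is empty.
  def orElse(value: Option[Int], default: =&gt; Int): Int =
    value match {
      case Some(v) =&gt; v
      case None    =&gt; default
    }

  // Caching a by-name argument in a lazy val evaluates it at most once,
  // even though it is read twice.
  def twice(x: =&gt; Int): Int = {
    lazy val cached = x
    cached + cached
  }
}
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Calling <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">orElse(Some(1), expensive())</code> leaves the counter at zero, and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">twice(expensive())</code> increments it only once.</div>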
<h4 id="11-define-uses-for-the-option-monad-and-good-practices-it-provides" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
11. Define uses for the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code> monad and good practices it provides.<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#11-define-uses-for-the-option-monad-and-good-practices-it-provides" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code> monad is the Scala solution to the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> problem from Java. While in Java the absence of a value is modeled through the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> value, in Scala its usage is discouraged in favor of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code>.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
With <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">null</code> values, a developer who was not expecting a missing value might call a method on a null reference and get a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">NullPointerException</code>. With <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code>, the possible absence is part of the type, so the developer always knows in which cases they may have to deal with a missing value.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">person</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Person</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="n" style="box-sizing: inherit;">getPersonByIdOnDatabaseUnsafe</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">id</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// returns null if no person for provided id
</span><span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">s</span><span class="s" style="box-sizing: inherit;">"This person age is ${person.age}"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// if null it will throw an exception
</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">personOpt</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Option</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">Person</span><span class="o" style="box-sizing: inherit; color: #989daa;">]</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="n" style="box-sizing: inherit;">getPersonByIdOnDatabaseSafe</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">id</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// returns an empty Option if no person for provided id
</span>
<span class="n" style="box-sizing: inherit;">personOpt</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">match</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">case</span> <span class="nc" style="box-sizing: inherit;">Some</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">p</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">s</span><span class="s" style="box-sizing: inherit;">"This person age is ${p.age}"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">case</span> <span class="nc" style="box-sizing: inherit;">None</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="n" style="box-sizing: inherit;">println</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="s" style="box-sizing: inherit;">"There is no person with that id"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
</code></pre>
</div>
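<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Besides pattern matching, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code> can be handled with combinators such as <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">map</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">getOrElse</code>. A minimal sketch, where <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Person</code>, the in-memory <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">people</code> map and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">findPerson</code> are hypothetical stand-ins for the database lookup above:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object OptionDemo {
  // Hypothetical model and data, standing in for a real database.
  final case class Person(name: String, age: Int)

  private val people = Map(4 -&gt; Person("Ana", 30))

  // Map#get already returns an Option: Some(person) or None.
  def findPerson(id: Int): Option[Person] = people.get(id)

  // map transforms the value if present; getOrElse supplies a fallback.
  def describeAge(id: Int): String =
    findPerson(id)
      .map(p =&gt; s"This person's age is ${p.age}")
      .getOrElse("There is no person with that id")
}
</code></pre>
</div>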
<h4 id="12-how-does-yield-work" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
12. How does <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">yield</code> work?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#12-how-does-yield-work" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">yield</code> generates a value to be kept in each iteration of a loop. <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">yield</code> is used in for comprehensions to provide a syntactic alternative to the combined usage of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">map</code>/<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">flatMap</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">filter</code> operations on monads.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="n" style="box-sizing: inherit;">scala</span><span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="k" style="box-sizing: inherit; color: #729fcf;">for</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">i</span> <span class="k" style="box-sizing: inherit; color: #729fcf;"><-</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span> <span class="n" style="box-sizing: inherit;">to</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">yield</span> <span class="n" style="box-sizing: inherit;">i</span> <span class="o" style="box-sizing: inherit; color: #989daa;">*</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span>
<span class="n" style="box-sizing: inherit;">res0</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">scala.collection.immutable.IndexedSeq</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">]</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="nc" style="box-sizing: inherit;">Vector</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">6</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">8</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">10</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
</code></pre>
</div>
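<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
To make the relation to <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">map</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">filter</code> concrete, here is a small sketch (the object and value names are illustrative only):</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object YieldDemo {
  // A for expression with yield...
  val doubled = for (i &lt;- 1 to 5) yield i * 2

  // ...produces the same result as calling map directly.
  val doubledWithMap = (1 to 5).map(i =&gt; i * 2)

  // A guard keeps only some elements; the result collection follows
  // the type of the generator (here a List).
  val evenSquares = for (n &lt;- List(1, 2, 3, 4) if n % 2 == 0) yield n * n
}
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Here <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">doubled</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">doubledWithMap</code> are equal, and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">evenSquares</code> is <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">List(4, 16)</code>.</div>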
<h4 id="13-explain-the-implicit-parameter-precedence" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
13. Explain the implicit parameter precedence.<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#13-explain-the-implicit-parameter-precedence" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Implicit parameters can lead to unexpected behavior if one is not aware of the precedence when looking up.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
So, in what order does the compiler look up implicits?</div>
<ol style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin: 0px 17.0117px 2rem auto; padding: 0px 0px 0px 40px; width: 800px;">
<li style="box-sizing: inherit;">implicits declared locally</li>
<li style="box-sizing: inherit;">imported implicits</li>
<li style="box-sizing: inherit;">outer scope (implicits declared in the class are considered outer scope in a class method for instance)</li>
<li style="box-sizing: inherit;">inheritance</li>
<li style="box-sizing: inherit;">package object</li>
<li style="box-sizing: inherit;">implicit scope like companion objects</li>
</ol>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A nice <a href="http://eed3si9n.com/implicit-parameter-precedence-again" style="box-sizing: inherit; color: seagreen;">set of examples can be found here</a>.</div>
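<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A minimal sketch of the two ends of that list, a locally declared implicit versus one in a companion object. <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Greeting</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">greet</code> are hypothetical names, and the sketch does not cover the intermediate cases:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object ImplicitDemo {
  final case class Greeting(text: String)

  object Greeting {
    // Lives in the implicit scope of Greeting (its companion object).
    implicit val fromCompanion: Greeting = Greeting("from companion")
  }

  def greet()(implicit g: Greeting): String = g.text

  // No Greeting is in lexical scope here, so the companion's implicit is used.
  def usingCompanion: String = greet()

  // A locally declared implicit is found first and wins.
  def usingLocal: String = {
    implicit val local: Greeting = Greeting("from local scope")
    greet()
  }
}
</code></pre>
</div>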
<h4 id="14-what-operations-is-a-for-comprehension-syntactic-sugar-for" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
14. What operations is a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">for comprehension</code> syntactic sugar for?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#14-what-operations-is-a-for-comprehension-syntactic-sugar-for" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">for comprehension</code> is an alternative syntax for the composition of several operations on monads.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">for comprehension</code> can be replaced by <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">foreach</code> operations (if no <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">yield</code> keyword is being used), or by <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">map</code>/<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">flatMap</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">filter</code> (actually, while confirming my words I <a href="http://docs.scala-lang.org/tutorials/FAQ/yield.html#translating-for-comprehensions" style="box-sizing: inherit; color: seagreen;">found out about the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">withFilter</code> method</a>).</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">for</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;"><-</span> <span class="n" style="box-sizing: inherit;">c1</span>
<span class="n" style="box-sizing: inherit;">y</span> <span class="k" style="box-sizing: inherit; color: #729fcf;"><-</span> <span class="n" style="box-sizing: inherit;">c2</span>
<span class="n" style="box-sizing: inherit;">z</span> <span class="k" style="box-sizing: inherit; color: #729fcf;"><-</span> <span class="n" style="box-sizing: inherit;">c3</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">if</span> <span class="n" style="box-sizing: inherit;">z</span> <span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">yield</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{...}</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
is translated into</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="n" style="box-sizing: inherit;">c1</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">flatMap</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">x</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="n" style="box-sizing: inherit;">c2</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">flatMap</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">y</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="n" style="box-sizing: inherit;">c3</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">withFilter</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">z</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="n" style="box-sizing: inherit;">z</span> <span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span><span class="o" style="box-sizing: inherit; color: #989daa;">).</span><span class="n" style="box-sizing: inherit;">map</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">z</span> <span class="k" style="box-sizing: inherit; color: 
#729fcf;">=></span> <span class="o" style="box-sizing: inherit; color: #989daa;">{...})))</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<a href="https://gist.github.com/loicdescotte/4044169" style="box-sizing: inherit; color: seagreen;">More examples by Loïc Descotte</a>.</div>
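<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
To check the translation above, here is a minimal runnable sketch. The three lists and the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">x + y + z</code> body are assumptions picked only for illustration; they stand in for the elided <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">{...}</code>.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object ForComprehensionDemo {
  // Sample collections (assumed values, not from the original post)
  val c1 = List(1, 2)
  val c2 = List(10, 20)
  val c3 = List(-1, 0, 1)

  // The for comprehension form
  val viaFor: List[Int] =
    for {
      x <- c1
      y <- c2
      z <- c3 if z > 0
    } yield x + y + z

  // The same computation written with flatMap / withFilter / map
  val viaMethods: List[Int] =
    c1.flatMap(x => c2.flatMap(y => c3.withFilter(z => z > 0).map(z => x + y + z)))

  def main(args: Array[String]): Unit = {
    println(viaFor)               // List(12, 22, 13, 23)
    println(viaFor == viaMethods) // true
  }
}
</code></pre>
</div>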
<h4 id="15-streams-what-consideration-you-need-to-have-when-you-use-scalas-streams-what-technique-does-the-scalas-streams-use-internally" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
15. Streams: What considerations do you need to keep in mind when you use Scala’s <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Streams</code>? What technique do Scala’s <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Streams</code> use internally?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#15-streams-what-consideration-you-need-to-have-when-you-use-scalas-streams-what-technique-does-the-scalas-streams-use-internally" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
While Scala <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Streams</code> can be really useful due to their lazy nature, they may also come with some unexpected problems.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The biggest problem is that Scala <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Streams</code> can be infinite, but your memory isn’t. If used wrongly, streams can lead to memory consumption problems. One must be cautious when saving references to a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">stream</code>: as long as the head is reachable, every element evaluated so far stays in memory. One common guideline is not to assign a stream (head) to a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">val</code>, but to make it a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">def</code> instead.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
This is a consequence of the technique behind <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">streams</code>: <a href="https://en.wikipedia.org/wiki/Memoization" style="box-sizing: inherit; color: seagreen;">memoization</a>. Each element is computed at most once and then cached.</div>
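<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A small sketch can make the memoization visible. The <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">naturals</code> helper and the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">evaluations</code> counter are hypothetical names introduced for this illustration. Note that <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Stream</code> is deprecated since Scala 2.13 in favour of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">LazyList</code>, so newer compilers will emit a deprecation warning.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object StreamMemoizationDemo {
  // Counts how many stream cells were actually computed (illustration only)
  var evaluations = 0

  // Infinite stream of integers; the tail is only computed on demand
  def naturals(from: Int): Stream[Int] = {
    evaluations += 1
    from #:: naturals(from + 1)
  }

  def main(args: Array[String]): Unit = {
    val s = naturals(0)              // holding the head in a val keeps cached cells alive
    println(s.take(5).toList)        // List(0, 1, 2, 3, 4)
    val afterFirstPass = evaluations
    println(s.take(5).toList)        // same result, served from the memoized cells
    println(evaluations == afterFirstPass) // true: nothing was recomputed
  }
}
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The same caching that makes the second traversal free is what keeps every evaluated element in memory while <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">s</code> is still referenced.</div>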
<h4 id="16-what-is-a-value-class" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
16. What is a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">value class</code>?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#16-what-is-a-value-class" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Have you ever had one of those nasty bugs where you were using an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">integer</code> thinking it represented one thing, when it actually represented something totally different? For instance, an <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">integer</code> representing an age and another representing a height getting mixed up (180 years old and 25 centimetres tall do look weird, no?).</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Because of that, it’s considered a good practice to wrap primitive types into more meaningful types.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<a href="http://docs.scala-lang.org/overviews/core/value-classes.html" style="box-sizing: inherit; color: seagreen;">Value classes</a> allow a developer to increase a program’s type safety without incurring the penalty of allocating runtime objects.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
There are some <a href="http://docs.scala-lang.org/overviews/core/value-classes.html#when-allocation-is-necessary" style="box-sizing: inherit; color: seagreen;">constraints</a> and <a href="http://docs.scala-lang.org/overviews/core/value-classes.html#limitations" style="box-sizing: inherit; color: seagreen;">limitations</a>, but the basic idea is that the object allocation is removed at compile time, by replacing value class instances with their underlying type. <a href="http://docs.scala-lang.org/sips/completed/value-classes.html#expansion-of-value-classes" style="box-sizing: inherit; color: seagreen;">More details can be found on its SIP</a>.</div>
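<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
As a minimal sketch of the age/height example, the following defines two value classes. The names <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Age</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">HeightCm</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">describe</code> are assumptions made up for this illustration.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object ValueClassDemo {
  // A value class wraps exactly one val and extends AnyVal
  final case class Age(value: Int) extends AnyVal
  final case class HeightCm(value: Int) extends AnyVal

  // The signature now documents (and enforces) which Int is which
  def describe(age: Age, height: HeightCm): String =
    s"${age.value} years old, ${height.value} cm tall"

  def main(args: Array[String]): Unit = {
    println(describe(Age(25), HeightCm(180))) // 25 years old, 180 cm tall
    // describe(HeightCm(180), Age(25))       // does not compile: type mismatch
  }
}
</code></pre>
</div>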
<h4 id="17-option-vs-try-vs-either" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
17. Option vs Try vs Either<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#17-option-vs-try-vs-either" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
All three of these monads allow us to represent a computation that did not execute as expected.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
An <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code>, as explained in answer #11, represents the possible absence of a value. It can be used when searching for something. For instance, database accesses often return <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option</code> in lookup queries.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
<code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Try</code> is the monadic approach to the Java <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">try/catch</code> block. It wraps a computation that may throw, capturing non-fatal exceptions as a failure value instead of propagating them.</div>
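<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
A short sketch of <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Try</code> in use, assuming a hypothetical <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">parseAge</code> helper that turns user input into a number:</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">import scala.util.{Failure, Success, Try}

object TryDemo {
  // Hypothetical helper: toInt may throw NumberFormatException,
  // which Try captures as a Failure instead of letting it propagate
  def parseAge(raw: String): Try[Int] = Try(raw.trim.toInt)

  def describe(raw: String): String = parseAge(raw) match {
    case Success(age) => s"age is $age"
    case Failure(e)   => s"invalid age: ${e.getClass.getSimpleName}"
  }

  def main(args: Array[String]): Unit = {
    println(describe("42"))               // age is 42
    println(describe("abc"))              // invalid age: NumberFormatException
    println(parseAge("abc").getOrElse(0)) // 0
  }
}
</code></pre>
</div>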
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
If you need to provide a little more info about the reason the computation has failed, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Either</code> may be very useful. With <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Either</code> you specify two possible return types: the expected/correct/successful one, and the error case, which can be as simple as a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">String</code> message to be displayed to the user, or a full <a href="https://en.wikipedia.org/wiki/Algebraic_data_type" style="box-sizing: inherit; color: seagreen;">ADT</a>.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">personAge</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">id</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Either</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">String</span>, <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">]</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">personOpt</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Option</span><span class="o" style="box-sizing: inherit; color: #989daa;">[</span><span class="kt" style="box-sizing: inherit; color: #e3e7df;">Person</span><span class="o" style="box-sizing: inherit; color: #989daa;">]</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="nc" style="box-sizing: inherit;">DB</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">getPersonById</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">id</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">personOpt</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">match</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">case</span> <span class="nc" style="box-sizing: inherit;">None</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="nc" style="box-sizing: inherit;">Left</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">s</span><span class="s" style="box-sizing: inherit;">"Could not get person with id: $id"</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">case</span> <span class="nc" style="box-sizing: inherit;">Some</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">person</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=></span> <span class="nc" style="box-sizing: inherit;">Right</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">person</span><span class="o" style="box-sizing: inherit; color: #989daa;">.</span><span class="n" style="box-sizing: inherit;">age</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
</code></pre>
</div>
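<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
The snippet above relies on a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Person</code> type and a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">DB</code> object that are not shown. A self-contained variant might look like the sketch below, where both are hypothetical stand-ins backed by an in-memory <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Map</code>, and the pattern match is replaced by <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Option.toRight</code>.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object EitherDemo {
  // Hypothetical stand-ins for the Person type and DB object used above
  final case class Person(id: Int, age: Int)

  object DB {
    private val people = Map(1 -> Person(1, 25))
    def getPersonById(id: Int): Option[Person] = people.get(id)
  }

  // Some(person) becomes Right(age); None becomes Left(error message)
  def personAge(id: Int): Either[String, Int] =
    DB.getPersonById(id)
      .map(_.age)
      .toRight(s"Could not get person with id: $id")

  def main(args: Array[String]): Unit = {
    println(personAge(1)) // Right(25)
    println(personAge(2)) // Left(Could not get person with id: 2)
  }
}
</code></pre>
</div>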
<h4 id="18-what-is-function-currying" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
18. What is function currying?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#18-what-is-function-currying" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Currying is the technique of transforming a function that takes multiple arguments into a series of functions that each take part of the arguments.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">add</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">a</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)(</span><span class="n" style="box-sizing: inherit;">b</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="n" style="box-sizing: inherit;">a</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">b</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">val</span> <span class="n" style="box-sizing: inherit;">add2</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">=</span> <span class="n" style="box-sizing: inherit;">add</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span><span class="o" style="box-sizing: inherit; color: #989daa;">)(</span><span class="k" style="box-sizing: inherit; color: #729fcf;">_</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">scala</span><span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="n" style="box-sizing: inherit;">add2</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">res0</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Currying is useful in many different contexts, but most often when you have to deal with <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Higher order functions</code>.</div>
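<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
As a rough illustration of that last point, the sketch below curries an ordinary two-argument function value with <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">curried</code> and passes the partially applied result to <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">map</code>. The value names are assumptions chosen for this example.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object CurryingDemo {
  // An uncurried function value taking both arguments at once
  val add: (Int, Int) => Int = (a, b) => a + b

  // Function2.curried turns it into a chain of one-argument functions
  val addCurried: Int => Int => Int = add.curried

  // Supplying only the first argument yields a new function
  val add2: Int => Int = addCurried(2)

  def main(args: Array[String]): Unit = {
    println(add2(3))                // 5
    println(List(1, 2, 3).map(add2)) // List(3, 4, 5)
  }
}
</code></pre>
</div>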
<h4 id="19-what-is-tail-recursion" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
19. What is Tail recursion?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#19-what-is-tail-recursion" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
In “normal” recursive methods, a method calls itself at some point. This technique is used in the naive computation of a <a href="https://en.wikipedia.org/wiki/Fibonacci_number" style="box-sizing: inherit; color: seagreen;">Fibonacci number</a>, for instance. The problem with this approach is that at each recursive step, another chunk of information needs to be saved on the stack. In some cases, a huge number of recursive steps can occur, leading to stack overflow errors.</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Tail recursion solves this problem. In tail-recursive methods, all the computations are done before the recursive call, and the recursive call is the last thing the method does. Compilers can then take advantage of this property to avoid stack overflow errors, since a tail-recursive call can be optimized to reuse the current stack frame instead of pushing a new one.</div>
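<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Conceptually, the optimization turns the recursion into a loop. The hand-written sketch below shows roughly what that amounts to for a sum of the first n natural numbers. It is an illustration of the idea, not actual compiler output; <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">sumLoop</code> is a hypothetical name, and a <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">Long</code> accumulator is assumed so that large inputs do not overflow.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">object TailCallAsLoop {
  // Loop equivalent of a tail-recursive sum: the "arguments" of the next
  // call simply overwrite the current ones, so the stack never grows
  def sumLoop(n: Int): Long = {
    var remaining = n
    var acc = 0L
    while (remaining > 0) {
      acc += remaining
      remaining -= 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    println(sumLoop(5))       // 15
    println(sumLoop(1000000)) // 500000500000
  }
}
</code></pre>
</div>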
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
You can ask the compiler to enforce tail recursion in a method with <a href="https://www.scala-lang.org/api/current/scala/annotation/tailrec.html" style="box-sizing: inherit; color: seagreen;">@tailrec</a>: compilation fails if the annotated method cannot be optimized.</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// computes the sum of the first n natural numbers
</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">if</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span> <span class="o" style="box-sizing: inherit; color: #989daa;">==</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">n</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">else</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">n</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span> <span class="o" style="box-sizing: inherit; color: #989daa;">-</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
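<span class="k" style="box-sizing: inherit; color: #729fcf;">import</span> <span class="n" style="box-sizing: inherit;">scala.annotation.tailrec</span>
<span class="nd" style="box-sizing: inherit;">@tailrec</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// checked at compile time: fails if the method is not tail recursive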
</span><span class="k" style="box-sizing: inherit; color: #729fcf;">def</span> <span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">acc</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span><span class="k" style="box-sizing: inherit; color: #729fcf;">:</span> <span class="kt" style="box-sizing: inherit; color: #e3e7df;">Int</span> <span class="o" style="box-sizing: inherit; color: #989daa;">=</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="k" style="box-sizing: inherit; color: #729fcf;">if</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span> <span class="o" style="box-sizing: inherit; color: #989daa;">==</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">acc</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span> <span class="k" style="box-sizing: inherit; color: #729fcf;">else</span> <span class="o" style="box-sizing: inherit; color: #989daa;">{</span>
<span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="n" style="box-sizing: inherit;">n</span> <span class="o" style="box-sizing: inherit; color: #989daa;">-</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="n" style="box-sizing: inherit;">acc</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">n</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
<span class="o" style="box-sizing: inherit; color: #989daa;">}</span>
</code></pre>
</div>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
If we now run both versions, what happens?</div>
<div class="language-scala highlighter-rouge" style="box-sizing: inherit; color: #111111; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px;">
<pre class="highlight" style="background: rgb(52, 54, 66); border-radius: 3px; box-sizing: inherit; clear: both; color: #c1c2c3; font-family: monospace, monospace; font-size: 1em; margin-bottom: 2rem; margin-top: 1.5em; overflow: auto; padding: 20px; word-wrap: normal;"><code style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;"><span class="o" style="box-sizing: inherit; color: #989daa;">></span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// computation on hold => needs to add info into the stack
</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">))</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span><span class="o" style="box-sizing: inherit; color: #989daa;">)))</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="n" style="box-sizing: inherit;">sum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">))))</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span> <span class="o" style="box-sizing: inherit; color: #989daa;">+</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">)))</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">15</span>
<span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// tailSum(5, 0) because the default value
</span><span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">4</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">5</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span> <span class="c1" style="box-sizing: inherit; color: #8d9684;">// no computations on hold
</span><span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">3</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">9</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">2</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">12</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">1</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">14</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="n" style="box-sizing: inherit;">tailSum</span><span class="o" style="box-sizing: inherit; color: #989daa;">(</span><span class="mi" style="box-sizing: inherit; color: #8ae234;">0</span><span class="o" style="box-sizing: inherit; color: #989daa;">,</span> <span class="mi" style="box-sizing: inherit; color: #8ae234;">15</span><span class="o" style="box-sizing: inherit; color: #989daa;">)</span>
<span class="mi" style="box-sizing: inherit; color: #8ae234;">15</span>
</code></pre>
</div>
<h4 id="20-what-are-high-order-functions" style="box-sizing: inherit; color: #111111; float: none; font-family: "Open Sans", Helvetica, Arial, sans-serif; font-size: 1.125em; line-height: 1.22222; margin: 1rem auto; width: 800px;">
20. What are Higher-Order Functions?<a class="header-link" href="http://pedrorijo.com/blog/scala-interview-questions/#20-what-are-high-order-functions" style="box-sizing: inherit; color: seagreen; font-size: 0.8em; left: 0.5em; opacity: 0; position: relative; transition: opacity 0.2s ease-in-out 0.1s;"><span class="fa fa-link" style="box-sizing: inherit; display: inline-block; font-family: "fontawesome"; font-size: inherit; font-stretch: normal; font-weight: normal; line-height: 1; transform: translate(0px , 0px);"></span></a></h4>
<div style="box-sizing: inherit; color: #111111; float: none; font-family: "PT Serif", Georgia, Times, serif; font-size: 18px; margin-bottom: 2rem; margin-left: auto; margin-right: 17.0117px; width: 800px;">
Higher-order functions are functions that take other functions as arguments or return functions as results. Common examples in Scala are the <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">.filter</code>, <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">.map</code> and <code class="highlighter-rouge" style="box-sizing: inherit; font-family: monospace, monospace; font-size: 1em;">.flatMap</code> methods, which receive other functions as arguments.<br />
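A minimal sketch of both directions, receiving a function and returning one. The names `applyTwice` and `adder` are made up for illustration and do not come from the original post.

```scala
object HigherOrderSketch {
  // Receives a function as an argument: applies f to x, then to the result.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Returns a function: builds an adder for a fixed amount n.
  def adder(n: Int): Int => Int = (x: Int) => x + n

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4)

    // filter, map and flatMap each take a function as their argument.
    println(numbers.filter(n => n % 2 == 0))
    println(numbers.map(n => n * 10))
    println(numbers.flatMap(n => List(n, n)))

    // adder(3) is itself a function, passed on to applyTwice.
    println(applyTwice(adder(3), 10))
  }
}
```

Passing `adder(3)` into `applyTwice` shows the two ideas composing: one function produces a function, the other consumes it.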
<br />
(reference: http://pedrorijo.com/blog/scala-interview-questions/)<br />
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com3tag:blogger.com,1999:blog-2182833570422175384.post-2041934225316904962017-02-19T16:04:00.001-08:002017-02-19T16:04:22.979-08:00Phoenix Kafka Integration {Kalyan Contribution to Apache}<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVEpqMzh0PvAL1RymT_8zgKm4bPuWx8CSGvPg_UlGyjYtZ2wOik1ajy35KBYmeu8Gwo5njVS_SsHBJWlmpZNVkDzOoaUOnnDmC8CbmumoFaTS-DVPKcDh7uI7KqNQ7culN2256JyVyHUDa/s1600/producer_consumer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="444" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVEpqMzh0PvAL1RymT_8zgKm4bPuWx8CSGvPg_UlGyjYtZ2wOik1ajy35KBYmeu8Gwo5njVS_SsHBJWlmpZNVkDzOoaUOnnDmC8CbmumoFaTS-DVPKcDh7uI7KqNQ7culN2256JyVyHUDa/s640/producer_consumer.png" width="640" /></a></div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: x-large;"><br /></span>
<br />
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: x-large;">Hi All,</span><br />
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<span style="font-size: x-large;"><br /></span></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<span style="font-size: x-large;">Phoenix Kafka is a new integration. You can follow the link below:</span></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<span style="font-size: x-large;"><br /></span></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<span style="font-size: x-large;"><a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://phoenix.apache.org/kafka.html&source=gmail&ust=1487634865662000&usg=AFQjCNFHWDcXxUWC9ck7OraOEt_QSdQy_w" href="http://phoenix.apache.org/kafka.html" style="color: #1155cc;" target="_blank">http://phoenix.apache.org/<wbr></wbr>kafka.html</a></span><div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;">This is a new feature that is much needed nowadays:</span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;">Pushing <b>real-time streaming</b> data into an <b>OLTP</b> system like <b>Phoenix</b>.</span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;">You can see similar use cases in the links below from my blog.</span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<h3 class="m_6975531601483213267gmail-post-title m_6975531601483213267entry-title" style="color: yellowgreen; font-family: arial, tahoma, helvetica, freesans, sans-serif; font-stretch: normal; line-height: normal; margin: 0px;">
<span style="font-size: x-large;">Flume Real Time Projects</span></h3>
</div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://kalyanbigdatatraining.blogspot.com/p/learn-flume-from-basics-step-by-step.html&source=gmail&ust=1487634865662000&usg=AFQjCNHQjWyiM2OsqPRNKJW_RjzUGfDDbA" href="http://kalyanbigdatatraining.blogspot.com/p/learn-flume-from-basics-step-by-step.html" style="color: #1155cc;" target="_blank">http://kalyanbigdatatraining.<wbr></wbr>blogspot.com/p/learn-flume-<wbr></wbr>from-basics-step-by-step.html</a></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<h3 class="m_6975531601483213267gmail-post-title m_6975531601483213267entry-title" style="color: yellowgreen; font-family: arial, tahoma, helvetica, freesans, sans-serif; font-stretch: normal; line-height: normal; margin: 0px;">
<span style="font-size: x-large;">Kafka Real Time Projects</span></h3>
</div>
<div>
<span style="font-size: x-large;"><a data-saferedirecturl="https://www.google.com/url?hl=en&q=http://kalyanbigdatatraining.blogspot.com/p/learn-kafka-from-basics-step-by-step.html&source=gmail&ust=1487634865662000&usg=AFQjCNEtLviiUmdOB73gbhmwgJAfQ1mWJQ" href="http://kalyanbigdatatraining.blogspot.com/p/learn-kafka-from-basics-step-by-step.html" style="color: #1155cc;" target="_blank">http://kalyanbigdatatraining.<wbr></wbr>blogspot.com/p/learn-kafka-<wbr></wbr>from-basics-step-by-step.html</a></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<span style="font-size: x-large;">To see how <b>issues</b> are tracked at <b>production level</b>, see the link below:</span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://issues.apache.org/jira/browse/PHOENIX-3214&source=gmail&ust=1487634865662000&usg=AFQjCNHvbtB1dkExWW7MMK6-nzy-xeZDNg" href="https://issues.apache.org/jira/browse/PHOENIX-3214" style="color: #1155cc;" target="_blank"><span style="font-size: x-large;">https://issues.apache.org/<wbr></wbr>jira/browse/PHOENIX-3214</span></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<span style="font-size: x-large;">Please share this with your <b>friends & colleagues</b> so they can learn more about these kinds of use cases.</span></div>
</div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
<div>
<span style="font-size: x-large;"><br /></span></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com2tag:blogger.com,1999:blog-2182833570422175384.post-11941379806197096882017-02-19T16:00:00.003-08:002017-02-19T16:00:52.887-08:00SPARK BASICS Practice on 05 Feb 2017<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">Spark:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;">`Spark Context` => Entry point for Spark Operations</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`RDD` => Resilient Distributed Dataset</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`RDD` features:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;">1. Immutability</span><br />
<span style="font-size: large;">2. Lazy Evaluation</span><br />
<span style="font-size: large;">3. Cacheable</span><br />
<span style="font-size: large;">4. Type Inference</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`RDD` operations:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;">1. Transformations</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">(input is RDD) => (output is RDD)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Actions</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">(input is RDD) => (output is Result)</span><br />
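The split between transformations and actions ties in with the "Lazy Evaluation" feature listed above: a transformation only describes the work, and nothing runs until an action asks for a result. The sketch below imitates that behaviour with a plain Scala collection view, so it runs without Spark. It is an analogy under that assumption, not how Spark is implemented, and the name `LazySketch` is made up for illustration.

```scala
object LazySketch {
  // Returns (calls before the "action", the sum, calls after the "action").
  def run(list: List[Int]): (Int, Int, Int) = {
    var calls = 0

    // "Transformation": the view only records the mapping, nothing is computed yet.
    val squares = list.view.map { x => calls += 1; x * x }
    val callsBeforeAction = calls

    // "Action": asking for the sum forces the mapping to run on every element.
    val total = squares.sum

    (callsBeforeAction, total, calls)
  }

  def main(args: Array[String]): Unit = {
    val (before, total, after) = run(List(1, 2, 3, 4))
    println(s"calls before action = $before")
    println(s"sum = $total, calls after action = $after")
  }
}
```

The counter stays at zero until the sum is requested, which mirrors why a chain of RDD transformations returns immediately in the shell.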
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Examples on RDD:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;">list <- {1,2,3,4}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Transformations:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;">f(x) <- {x + 1}</span><br />
<span style="font-size: large;">f(list) <- {2,3,4,5}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(x) <- {x * x}</span><br />
<span style="font-size: large;">f(list) <- {1,4,9,16}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Actions:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;">sum(list) <- 10</span><br />
<span style="font-size: large;">min(list) <- 1</span><br />
<span style="font-size: large;">max(list) <- 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark Libraries:</span><br />
<span style="font-size: large;">------------------</span><br />
<span style="font-size: large;">1. Spark SQL</span><br />
<span style="font-size: large;">2. Spark Streaming</span><br />
<span style="font-size: large;">3. Spark MLlib</span><br />
<span style="font-size: large;">4. Spark GraphX</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Command Line approach:</span><br />
<span style="font-size: large;">---------------------------</span><br />
<span style="font-size: large;">Scala<span class="Apple-tab-span" style="white-space: pre;"> </span>=> <span class="Apple-tab-span" style="white-space: pre;"> </span>spark-shell</span><br />
<span style="font-size: large;">Python<span class="Apple-tab-span" style="white-space: pre;"> </span>=> <span class="Apple-tab-span" style="white-space: pre;"> </span>pyspark</span><br />
<span style="font-size: large;">R <span class="Apple-tab-span" style="white-space: pre;"> </span>=> <span class="Apple-tab-span" style="white-space: pre;"> </span>sparkR</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark context available as sc</span><br />
<span style="font-size: large;">SQL context available as sqlContext</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">How to Create RDD from Spark Context:</span><br />
<span style="font-size: large;">----------------------------------------</span><br />
<span style="font-size: large;">We can create an RDD from the Spark Context in two ways:</span><br />
<span style="font-size: large;">1. From Collections (List, Seq, Set, ..)</span><br />
<span style="font-size: large;">2. From Data Sets (text, csv, tsv, ...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Creating RDD from Collections:</span><br />
<span style="font-size: large;">-------------------------------</span><br />
<span style="font-size: large;">val list = List(1,2,3,4)</span><br />
<span style="font-size: large;">val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val list = List(1,2,3,4)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res1: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res2: Int = 2</span><br />
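In the shell, `sc` is created for you. Outside the shell you have to build it yourself. The following is a minimal sketch of a standalone program, assuming the `spark-core` library is on the classpath; the object name, the application name and the `local[2]` master are choices made here for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeSketch {
  def main(args: Array[String]): Unit = {
    // local[2] runs Spark inside this JVM with two worker threads (assumption for the sketch).
    val conf = new SparkConf().setAppName("parallelize-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val list = List(1, 2, 3, 4)

      // Ask for 2 partitions explicitly, as in the transcript above.
      val rdd = sc.parallelize(list, 2)
      println(s"partitions = ${rdd.getNumPartitions}")

      // glom() turns each partition into an array so its contents can be inspected.
      rdd.glom().collect().foreach(part => println(part.mkString("[", ", ", "]")))
    } finally {
      sc.stop() // always release the context
    }
  }
}
```

When the second argument to `parallelize` is left out, Spark falls back to its default parallelism, which is why the first call in the transcript reported 4 partitions on that machine.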
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect()</span><br />
<span style="font-size: large;">res3: Array[Int] = Array(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect().foreach(println)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.foreach(println)</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.foreach(println)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Note: rdd.foreach(println) runs on each partition in parallel, so the print order can differ between runs. rdd.collect() returns the elements to the driver in partition order, so its output order is stable.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res10: Array[Int] = Array(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x + 1)</span><br />
<span style="font-size: large;">res11: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[2] at map at <console>:32</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x + 1).collect</span><br />
<span style="font-size: large;">res12: Array[Int] = Array(2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x * x).collect</span><br />
<span style="font-size: large;">res13: Array[Int] = Array(1, 4, 9, 16)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res14: Array[Int] = Array(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.sum</span><br />
<span style="font-size: large;">res15: Double = 10.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.min</span><br />
<span style="font-size: large;">res16: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.max</span><br />
<span style="font-size: large;">res17: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.count</span><br />
<span style="font-size: large;">res18: Long = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Creating RDD from DataSets:</span><br />
<span style="font-size: large;">-------------------------------</span><br />
<span style="font-size: large;">val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">file: String = file:///home/orienit/work/input/demoinput</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res19: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(file, 1)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[9] at textFile at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res20: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;">Word Count Job in Spark</span><br />
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;">val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/demooutput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// create an RDD from the text file</span><br />
<span style="font-size: large;">val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// get all the words from each line</span><br />
<span style="font-size: large;">val words = rdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// create tuple for each word</span><br />
<span style="font-size: large;">val tuple = words.map(word => (word, 1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// add up the counts for each word</span><br />
<span style="font-size: large;">val wordcount = tuple.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// sort the (word, count) pairs; tuples compare by word first</span><br />
<span style="font-size: large;">val sorted = wordcount.sortBy(word => word)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// save the data into output folder</span><br />
<span style="font-size: large;">sorted.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res21: Array[String] = Array(I am going, to hyd, I am learning, hadoop course)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).collect</span><br />
<span style="font-size: large;">res22: Array[String] = Array(I, am, going, to, hyd, I, am, learning, hadoop, course)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).collect</span><br />
<span style="font-size: large;">res23: Array[(String, Int)] = Array((I,1), (am,1), (going,1), (to,1), (hyd,1), (I,1), (am,1), (learning,1), (hadoop,1), (course,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b).collect</span><br />
<span style="font-size: large;">res24: Array[(String, Int)] = Array((learning,1), (hadoop,1), (am,2), (hyd,1), (I,2), (to,1), (going,1), (course,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b).sortBy(word => word)</span><br />
<span style="font-size: large;">res25: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[21] at sortBy at <console>:32</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b).sortBy(word => word).collect</span><br />
<span style="font-size: large;">res26: Array[(String, Int)] = Array((I,2), (am,2), (course,1), (going,1), (hadoop,1), (hyd,1), (learning,1), (to,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b).sortBy(word => word).collect.foreach(println)</span><br />
<span style="font-size: large;">(I,2)</span><br />
<span style="font-size: large;">(am,2)</span><br />
<span style="font-size: large;">(course,1)</span><br />
<span style="font-size: large;">(going,1)</span><br />
<span style="font-size: large;">(hadoop,1)</span><br />
<span style="font-size: large;">(hyd,1)</span><br />
<span style="font-size: large;">(learning,1)</span><br />
<span style="font-size: large;">(to,1)</span><br />
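To check the logic of each step without a cluster, the same pipeline can be run on an in-memory collection. This is a sketch in plain Scala, not Spark: `groupBy` followed by a sum stands in for `reduceByKey`, and the object name `LocalWordCount` is made up for illustration. The input lines are the ones shown by `rdd.collect` above.

```scala
object LocalWordCount {
  // Same steps as the Spark job, on an in-memory Seq instead of an RDD.
  def wordCount(lines: Seq[String]): Seq[(String, Int)] = {
    // get all the words from each line
    val words = lines.flatMap(line => line.split(" "))

    // create a (word, 1) tuple for each word
    val tuples = words.map(word => (word, 1))

    // groupBy + sum plays the role of reduceByKey((a, b) => a + b)
    val counts = tuples.groupBy(_._1).map { case (word, pairs) =>
      (word, pairs.map(_._2).sum)
    }

    // sort by the word
    counts.toSeq.sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("I am going", "to hyd", "I am learning", "hadoop course")
    wordCount(lines).foreach(println)
  }
}
```

The uppercase "I" sorts before the lowercase words because strings are compared by character code, which matches the ordering in the Spark output.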
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;">Grep Job in Spark</span><br />
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;">val file = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/demooutput-1"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// create an RDD from the text file</span><br />
<span style="font-size: large;">val rdd = sc.textFile(file)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// filter the data based on pattern</span><br />
<span style="font-size: large;">val filterData = rdd.filter(line => line.contains("am"))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// save the data into output folder</span><br />
<span style="font-size: large;">filterData.saveAsTextFile(output)</span><br />
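One thing to keep in mind is that `contains("am")` is a substring test, so it would also keep a line such as "sample line". The sketch below shows the same filter on a plain Scala collection next to a whole-word variant. The whole-word version is an addition for illustration and is not part of the original job; `LocalGrep`, `grep` and `grepWord` are made-up names.

```scala
import java.util.regex.Pattern

object LocalGrep {
  // Substring match, the same test as rdd.filter(line => line.contains(pattern)).
  def grep(lines: Seq[String], pattern: String): Seq[String] =
    lines.filter(line => line.contains(pattern))

  // Whole-word match (hypothetical variant): \b marks a word boundary.
  def grepWord(lines: Seq[String], word: String): Seq[String] = {
    val regex = ("\\b" + Pattern.quote(word) + "\\b").r
    lines.filter(line => regex.findFirstIn(line).isDefined)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("I am going", "to hyd", "I am learning", "hadoop course", "sample line")
    println(grep(lines, "am"))     // also keeps "sample line"
    println(grepWord(lines, "am")) // keeps only lines with the word "am"
  }
}
```

In the Spark job, the same idea would mean swapping the predicate passed to `filter`; everything else stays the same.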
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val df1 = sqlContext.sql("SELECT * from kalyan.src")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val df2 = sqlContext.sql("SELECT * from student")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">df1.show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">df2.show</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">df1.createOrReplaceTempView("tbl1")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">df2.createOrReplaceTempView("tbl2")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sqlContext.sql("SELECT * from tbl1").show()</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sqlContext.sql("SELECT * from tbl2").show()</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sqlContext.sql("SELECT tbl1.*, tbl2.* from tbl1 join tbl2 on tbl1.key = tbl2.id").show()</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-------------------------------------------</span><br />
<br /></div>
SCALA BASICS Practice on 04 Feb 2017<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">Scala (Scalable Language)</span><br />
<span style="font-size: large;">---------------------------------</span><br />
<span style="font-size: large;">1. Object-Oriented + Functional Programming Language</span><br />
<span style="font-size: large;">2. In Scala, everything is an object</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Java:</span><br />
<span style="font-size: large;">----------------</span><br />
<span style="font-size: large;">"String" is "immutable"</span><br />
<span style="font-size: large;">"StringBuffer & StringBuilder" are "mutable"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">"immutable" => we can't change the data</span><br />
<span style="font-size: large;">"mutable" => we can change the data</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala:</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">val => "value" is "immutable" (cannot be reassigned)</span><br />
<span style="font-size: large;">var => "variable" is "mutable" (can be reassigned)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Java Syntax:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;"><data_type> <variable_name> = <data> ;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala Syntax:</span><br />
<span style="font-size: large;">----------------------</span><br />
<span style="font-size: large;"><variable_name> : <data_type> = <data></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val <variable_name> [: <data_type>] = <data></span><br />
<span style="font-size: large;">var <variable_name> [: <data_type>] = <data></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">orienit@kalyan:~$ scala</span><br />
<span style="font-size: large;">Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_66-internal).</span><br />
<span style="font-size: large;">Type in expressions for evaluation. Or try :help.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">"Scala" provides a 'REPL' prompt</span><br />
<span style="font-size: large;">R => Read</span><br />
<span style="font-size: large;">E => Evaluate</span><br />
<span style="font-size: large;">P => Print</span><br />
<span style="font-size: large;">L => Loop</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Like "Python & R", "Scala" provides a "REPL" prompt</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">"Scala" provides a "Type Inference" feature.</span><br />
<span style="font-size: large;">Based on the "data", it automatically finds the "data type"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1.5</span><br />
<span style="font-size: large;">a: Double = 1.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1.5f</span><br />
<span style="font-size: large;">a: Float = 1.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1l</span><br />
<span style="font-size: large;">a: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1d</span><br />
<span style="font-size: large;">a: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = '1'</span><br />
<span style="font-size: large;">a: Char = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = "1"</span><br />
<span style="font-size: large;">a: String = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a : Int = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a : Double = 1</span><br />
<span style="font-size: large;">a: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a : Long = 1</span><br />
<span style="font-size: large;">a: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a : Char = 1</span><br />
<span style="font-size: large;">a: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a : String = 1</span><br />
<span style="font-size: large;"><console>:11: error: type mismatch;</span><br />
<span style="font-size: large;"> found : Int(1)</span><br />
<span style="font-size: large;"> required: String</span><br />
<span style="font-size: large;"> val a : String = 1</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toDouble</span><br />
<span style="font-size: large;">res0: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toLong</span><br />
<span style="font-size: large;">res1: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toChar</span><br />
<span style="font-size: large;">res2: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toFloat</span><br />
<span style="font-size: large;">res3: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a.toString</span><br />
<span style="font-size: large;">res4: String = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">"Scala" provides "Operator Overloading" similar to "C++": every operator is a method call</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val b = 2</span><br />
<span style="font-size: large;">b: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a + b</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val c = a.+(b)</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a + b <====> a.+(b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a - b <====> a.-(b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">a * b <====> a.*(b)</span><br />
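<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Because operators are ordinary methods, a user-defined class can define its own. Below is a minimal sketch; the "Point" class and the name "OperatorSketch" are hypothetical and only serve as an illustration.</span><br />

```scala
object OperatorSketch {
  // Hypothetical 2D point; "+" and "*" are plain method names
  case class Point(x: Int, y: Int) {
    def +(other: Point): Point = Point(x + other.x, y + other.y)
    def *(factor: Int): Point = Point(x * factor, y * factor)
  }

  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)
    val q = Point(3, 4)
    println(p + q)   // infix form
    println(p.+(q))  // same call written in method form
    println(p * 2)
  }
}
```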
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a = 2</span><br />
<span style="font-size: large;"><console>:12: error: reassignment to val</span><br />
<span style="font-size: large;"> a = 2</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> a = 2</span><br />
<span style="font-size: large;">a: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">IF-ELSE</span><br />
<span style="font-size: large;">-------------</span><br />
<span style="font-size: large;">if(exp) {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp) {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body1</span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body2</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(exp1) {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body1</span><br />
<span style="font-size: large;">} else if(exp2) {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body2</span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>body3</span><br />
<span style="font-size: large;">}</span><br />
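<span style="font-size: large;"><br /></span>
<span style="font-size: large;">In Scala the chained form is written "else if" (two words), and the whole if/else is an expression that returns a value. Below is a minimal sketch; the "grade" function and its thresholds are made up for illustration.</span><br />

```scala
object IfElseSketch {
  // The if / else if / else chain yields a value, so it can be the function body
  def grade(percentage: Double): String =
    if (percentage >= 75) {
      "distinction"
    } else if (percentage >= 50) {
      "pass"
    } else {
      "fail"
    }

  def main(args: Array[String]): Unit = {
    println(grade(90.5))
    println(grade(60))
    println(grade(30))
  }
}
```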
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">Java:</span><br />
<span style="font-size: large;">----------</span><br />
<span style="font-size: large;">int[] nums = {1, 2, 3, 4, 5};</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">int[] nums = new int[5];</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala:</span><br />
<span style="font-size: large;">----------</span><br />
<span style="font-size: large;">val nums = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val nums = Array[Int](1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val nums : Array[Int] = Array[Int](1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val nums = Array[Int](10)   // one element with value 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val nums = new Array[Int](10)   // ten elements, all 0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val nums = Array[Int](10)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val nums = new Array[Int](10)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val nums = Array[Int](1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val names = Array[String]("anil", "kalyan", "raj", "venkat")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// names[0] is Java syntax; Scala uses parentheses</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">names(0)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">scala> val names = Array[String]("anil", "kalyan", "raj", "venkat")</span><br />
<span style="font-size: large;">names: Array[String] = Array(anil, kalyan, raj, venkat)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names[0]</span><br />
<span style="font-size: large;"><console>:1: error: identifier expected but integer literal found.</span><br />
<span style="font-size: large;">names[0]</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(0)</span><br />
<span style="font-size: large;">res5: String = anil</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names(0) = "anil kumar"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names</span><br />
<span style="font-size: large;">res7: Array[String] = Array(anil kumar, kalyan, raj, venkat)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> names = 1</span><br />
<span style="font-size: large;"><console>:12: error: reassignment to val</span><br />
<span style="font-size: large;"> names = 1</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">"Scala" provides 2 types of "collections"</span><br />
<span style="font-size: large;">1. immutable collections => scala.collection.immutable</span><br />
<span style="font-size: large;">2. mutable collections => scala.collection.mutable</span><br />
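<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The difference between the two packages can be sketched as follows. This is a minimal example, assuming "List" for the immutable side and "ListBuffer" for the mutable side; the name "CollectionsSketch" is hypothetical.</span><br />

```scala
import scala.collection.mutable.ListBuffer

object CollectionsSketch {
  // Immutable: "adding" returns a new List; the original is untouched
  val fixed = List(1, 2, 3)
  val extended = fixed :+ 4

  // Mutable: += changes the same ListBuffer in place
  val buffer = ListBuffer(1, 2, 3)
  buffer += 4

  def main(args: Array[String]): Unit = {
    println(fixed)
    println(extended)
    println(buffer)
  }
}
```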
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10</span><br />
<span style="font-size: large;">res8: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10 by 2</span><br />
<span style="font-size: large;">res9: scala.collection.immutable.Range = Range(1, 3, 5, 7, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10 by 3</span><br />
<span style="font-size: large;">res10: scala.collection.immutable.Range = Range(1, 4, 7, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 10</span><br />
<span style="font-size: large;">res11: scala.collection.immutable.Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 10 by 2</span><br />
<span style="font-size: large;">res12: scala.collection.immutable.Range = Range(1, 3, 5, 7, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 10 by 3</span><br />
<span style="font-size: large;">res13: scala.collection.immutable.Range = Range(1, 4, 7)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><start number> (to | until) <end number> [by <step number>]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">// wrong syntax: the loop variable is missing</span><br />
<span style="font-size: large;">for( 1 to 10 ) println(num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// correct syntax</span><br />
<span style="font-size: large;">for( num <- 1 to 10 ) println(num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( 1 to 10 ) println(num)</span><br />
<span style="font-size: large;"><console>:1: error: '<-' expected but ')' found.</span><br />
<span style="font-size: large;">for( 1 to 10 ) println(num)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( num <- 1 to 10 ) println(num)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">8</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">for( num <- 1 to 10 by 2) println(num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">for( num <- 1 to 10 ) if(num % 2 == 1) println(num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">for( num <- 1 to 10 if(num % 2 == 1) ) println(num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( num <- 1 to 10 by 2) println(num)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( num <- 1 to 10 ) if(num % 2 == 1) println(num)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( num <- 1 to 10 if(num % 2 == 1) ) println(num)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">for( num <- 1 to 10 ) if(num % 2 == 1) println("num is odd : " + num) else println("num is even : " + num)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for( num <- 1 to 10 ) if(num % 2 == 1) println("num is odd : " + num) else println("num is even : " + num)</span><br />
<span style="font-size: large;">num is odd : 1</span><br />
<span style="font-size: large;">num is even : 2</span><br />
<span style="font-size: large;">num is odd : 3</span><br />
<span style="font-size: large;">num is even : 4</span><br />
<span style="font-size: large;">num is odd : 5</span><br />
<span style="font-size: large;">num is even : 6</span><br />
<span style="font-size: large;">num is odd : 7</span><br />
<span style="font-size: large;">num is even : 8</span><br />
<span style="font-size: large;">num is odd : 9</span><br />
<span style="font-size: large;">num is even : 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">String Interpolation:</span><br />
<span style="font-size: large;">---------------------------</span><br />
<span style="font-size: large;">val id = 1</span><br />
<span style="font-size: large;">val name = "kalyan"</span><br />
<span style="font-size: large;">val course = "spark"</span><br />
<span style="font-size: large;">val percentage = 90.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp1 = "name is kalyan, course is spark"</span><br />
<span style="font-size: large;">println(exp1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp2 = "name is " + name + ", course is " + course</span><br />
<span style="font-size: large;">println(exp2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp3 = "name is $name, course is $course"</span><br />
<span style="font-size: large;">println(exp3)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp4 = s"name is $name, course is $course"</span><br />
<span style="font-size: large;">println(exp4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp5 = s"name is $name, percentage is $percentage"</span><br />
<span style="font-size: large;">println(exp5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp6 = s"name is $name, percentage is $percentage%.2f"</span><br />
<span style="font-size: large;">println(exp6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp7 = f"name is $name, percentage is $percentage%.2f"</span><br />
<span style="font-size: large;">println(exp7)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp8 = s"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">println(exp8)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val exp9 = raw"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">println(exp9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val name = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val course = "spark"</span><br />
<span style="font-size: large;">course: String = spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val percentage = 90.5</span><br />
<span style="font-size: large;">percentage: Double = 90.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp1 = "name is kalyan, course is spark"</span><br />
<span style="font-size: large;">exp1: String = name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp1)</span><br />
<span style="font-size: large;">name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp2 = "name is " + name + ", course is " + course</span><br />
<span style="font-size: large;">exp2: String = name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp2)</span><br />
<span style="font-size: large;">name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp3 = "name is $name, course is $course"</span><br />
<span style="font-size: large;">exp3: String = name is $name, course is $course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp3)</span><br />
<span style="font-size: large;">name is $name, course is $course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp4 = s"name is $name, course is $course"</span><br />
<span style="font-size: large;">exp4: String = name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp4)</span><br />
<span style="font-size: large;">name is kalyan, course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp5 = s"name is $name, percentage is $percentage"</span><br />
<span style="font-size: large;">exp5: String = name is kalyan, percentage is 90.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp5)</span><br />
<span style="font-size: large;">name is kalyan, percentage is 90.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp6 = s"name is $name, percentage is $percentage%.2f"</span><br />
<span style="font-size: large;">exp6: String = name is kalyan, percentage is 90.5%.2f</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp6)</span><br />
<span style="font-size: large;">name is kalyan, percentage is 90.5%.2f</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp7 = f"name is $name, percentage is $percentage%.2f"</span><br />
<span style="font-size: large;">exp7: String = name is kalyan, percentage is 90.50</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp7)</span><br />
<span style="font-size: large;">name is kalyan, percentage is 90.50</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp8 = s"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">exp8: String =</span><br />
<span style="font-size: large;">name is kalyan</span><br />
<span style="font-size: large;">course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp8)</span><br />
<span style="font-size: large;">name is kalyan</span><br />
<span style="font-size: large;">course is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val exp9 = raw"name is $name\ncourse is $course"</span><br />
<span style="font-size: large;">exp9: String = name is kalyan\ncourse is spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp9)</span><br />
<span style="font-size: large;">name is kalyan\ncourse is spark</span><br />
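<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The "s" interpolator also accepts whole expressions inside "${ }", not only variable names. Below is a minimal sketch that reuses "id" and "name" from above; the name "InterpolationSketch" and the two sample strings are hypothetical.</span><br />

```scala
object InterpolationSketch {
  val id = 1
  val name = "kalyan"

  // ${...} evaluates the expression inside the braces
  val nextId = s"next id is ${id + 1}"
  val nameLength = s"$name has ${name.length} characters"

  def main(args: Array[String]): Unit = {
    println(nextId)
    println(nameLength)
  }
}
```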
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">Functional Programming in Scala:</span><br />
<span style="font-size: large;">----------------------------------------</span><br />
<span style="font-size: large;">1. Named Functions</span><br />
<span style="font-size: large;">2. Anonymous Functions</span><br />
<span style="font-size: large;">3. Curried Functions</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Anonymous Functions:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;">(x : Int) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val add = (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">scala> (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;">res28: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val add = (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;">add: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)</span><br />
<span style="font-size: large;">res29: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res30: Int = 11</span><br />
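<span style="font-size: large;">Anonymous functions are most often passed to collection methods. A small sketch, assuming a sample list (nums) that is not part of the session above:</span><br />

```scala
// The same anonymous function as in the session above
val add = (x: Int) => x + 1

// Sample data, assumed for illustration
val nums = List(1, 2, 3, 4)

// Pass a function value, or write the function inline
val incremented = nums.map(add)
val doubled = nums.map(x => x * 2)

// Underscore shorthand for a one-argument anonymous function
val evens = nums.filter(_ % 2 == 0)

println(incremented)
println(doubled)
println(evens)
```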
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">Named Functions:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val add = (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int) = { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)</span><br />
<span style="font-size: large;">res31: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res32: Int = 11</span><br />
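<span style="font-size: large;">A named function (def) can be used wherever a function value is expected. The sketch below shows one way to do this; the name addFn and the sample list are assumptions for illustration.</span><br />

```scala
// A named function (method) defined with def
def add(x: Int): Int = x + 1

// Stating the expected function type converts the method into a function value
val addFn: Int => Int = add

// Both forms can be passed to map
val viaMethod = List(1, 10).map(add)
val viaValue = List(1, 10).map(addFn)

println(viaMethod)
println(viaValue)
```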
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">Curried Functions:</span><br />
<span style="font-size: large;">-----------------------</span><br />
<span style="font-size: large;">def add(x : Int, y : Int) = { x + y }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int)(y : Int) = { x + y }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int)(y : Int) = { x + y }</span><br />
<span style="font-size: large;">add: (x: Int)(y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)(2)</span><br />
<span style="font-size: large;">res33: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)(20)</span><br />
<span style="font-size: large;">res34: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def addOne(z : Int) = add(z)(1)</span><br />
<span style="font-size: large;">def addOne(z : Int) = add(1)(z)</span><br />
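<span style="font-size: large;">One way to fix the first argument of a curried function is to supply only the first argument list and state the expected function type. A minimal sketch; addTen is a hypothetical extra name:</span><br />

```scala
// Curried definition, as in the session above
def add(x: Int)(y: Int): Int = x + y

// Supplying only the first argument list yields a function awaiting the second
val addOne: Int => Int = add(1)
val addTen: Int => Int = add(10) // hypothetical second example

println(addOne(5))
println(addTen(5))
```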
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> if(n == 0) 1</span><br />
<span style="font-size: large;"> else n * factorial(n - 1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | if(n == 0) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n - 1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res38: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(4)</span><br />
<span style="font-size: large;">res39: Int = 24</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> println(n)</span><br />
<span style="font-size: large;"> if(n == 1) 1</span><br />
<span style="font-size: large;"> else n * factorial(n - 1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | println(n)</span><br />
<span style="font-size: large;"> | if(n == 1) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n - 1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">res41: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> def fact(x : Int, y: Int) : Int = {</span><br />
<span style="font-size: large;"> println(x)</span><br />
<span style="font-size: large;"> if(x == y) x</span><br />
<span style="font-size: large;"> else x * fact(x + 1, y)</span><br />
<span style="font-size: large;"> }</span><br />
<span style="font-size: large;"> fact(1,n)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | def fact(x : Int, y: Int) : Int = {</span><br />
<span style="font-size: large;"> | println(x)</span><br />
<span style="font-size: large;"> | if(x == y) x</span><br />
<span style="font-size: large;"> | else x * fact(x + 1, y)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;"> | fact(1,n)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">res44: Int = 120</span><br />
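<span style="font-size: large;">The nested version above still multiplies after the recursive call returns, so it is not tail recursive. A possible tail-recursive variant is sketched below; the accumulator parameter, the BigInt result type and the n >= 0 check are assumptions added here, not part of the original session.</span><br />

```scala
import scala.annotation.tailrec

def factorial(n: Int): BigInt = {
  require(n >= 0, "n must be non-negative")

  // acc carries the product so far, so the recursive call is the last operation
  @tailrec
  def loop(x: Int, acc: BigInt): BigInt =
    if (x <= 1) acc
    else loop(x - 1, acc * x)

  loop(n, BigInt(1))
}

println(factorial(5))
println(factorial(0))
```

<span style="font-size: large;">The @tailrec annotation makes the compiler reject the function if the call is not in tail position, and BigInt avoids Int overflow for larger inputs.</span><br />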
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">package com.orienit.scala.learnings</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">case class Student(name: String = "kalyan", id: Int, year: Int)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">class Employee(name: String, id: Int)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">object Sample3 extends App {</span><br />
<span style="font-size: large;"> val s1 = Student("kalyan", 1, 2016)</span><br />
<span style="font-size: large;"> val s2 = new Student("venkat", 1, 2016)</span><br />
<span style="font-size: large;"> val s3 = new Student(id = 1, year = 2016)</span><br />
<span style="font-size: large;"> val s4 = new Student(year = 2016, id = 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"> val e1 = new Employee("venkat", 1)</span><br />
<span style="font-size: large;"> // val e2 = Employee("venkat", 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">}</span><br />
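<span style="font-size: large;">A short sketch of what a case class adds over a plain class: a companion apply, value-based equality, copy and pattern matching. The class definitions repeat the ones above; the names s5 and description are assumptions for illustration.</span><br />

```scala
case class Student(name: String = "kalyan", id: Int, year: Int)
class Employee(name: String, id: Int)

val s1 = Student("kalyan", 1, 2016)
val s3 = Student(id = 1, year = 2016) // name falls back to its default

// Case classes compare by field values
val sameStudents = s1 == s3

// copy creates a modified instance without changing the original
val s5 = s1.copy(name = "venkat")

// Pattern matching extracts the fields
val description = s5 match {
  case Student(n, i, y) => s"$n has id $i and joined in $y"
}

// Plain classes compare by reference unless equals is overridden
val e1 = new Employee("venkat", 1)
val e2 = new Employee("venkat", 1)
val sameEmployees = e1 == e2

println(sameStudents)
println(description)
println(sameEmployees)
```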
<span style="font-size: large;"><br /></span>
<br /></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com1tag:blogger.com,1999:blog-2182833570422175384.post-19218708044231585622016-12-02T06:12:00.000-08:002017-10-18T06:16:36.738-07:00How to generate large amount of sample data with simple techniques for Big Data Projects<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: center;">
<span style="color: red; font-size: large;"><b>Kalyan Big Data Projects</b></span></div>
<div style="text-align: center;">
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">How to generate a large amount of sample data with simple techniques for Big Data projects</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Use the commands below to generate a large amount of sample data.</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Create a '<b>kalyan_bigdata_projects</b>' folder in the user's home directory (i.e. <b>/home/orienit</b>)</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><b>Command:</b> <i>mkdir /home/orienit/kalyan_bigdata_projects</i></span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWLVxhFVKs-V0XXf7fOdOazoqW5A0Nc46dRX3u8j27IOBmDo1mgdidCQOr1hag4xvvUbomF1xAkwaepzNst5SYDR4gcs1qbV_Cp-JF0QbeimYmSR7lvVsbP1SHfgLXWPpG2wbYojfbgHdL/s1600/mkdir+command.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="120" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWLVxhFVKs-V0XXf7fOdOazoqW5A0Nc46dRX3u8j27IOBmDo1mgdidCQOr1hag4xvvUbomF1xAkwaepzNst5SYDR4gcs1qbV_Cp-JF0QbeimYmSR7lvVsbP1SHfgLXWPpG2wbYojfbgHdL/s640/mkdir+command.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Download the '<b>kalyan-bigdata-examples.jar</b>' file from this <a href="https://github.com/kalyanhadooptraining/kalyan-bigdata-realtime-projects/blob/master/kalyan/kalyan-bigdata-examples.jar" target="_blank">link</a>.</span><br />
<span style="font-size: large;">(</span><span style="font-size: large;"><span style="color: #0000ee;"><u><a href="https://github.com/kalyanhadooptraining/kalyan-bigdata-realtime-projects/blob/master/kalyan/kalyan-bigdata-examples.jar">https://github.com/kalyanhadooptraining/kalyan-bigdata-realtime-projects/blob/master/kalyan/kalyan-bigdata-examples.jar</a></u></span>)</span><br />
<br />
<br />
<span style="font-size: large;">Copy the '<b>kalyan-bigdata-examples.jar</b>' file into the '<b>/home/orienit/kalyan_bigdata_projects</b>' folder</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvFMPh3sZPlAe-13pS9pZNbQKPII7aKfbEFw4z73jMdUbwNbUzMVVPGoao-OnxNXBOeH5ErpauL4cjwIoWr4i6sU3JwH_QwoTXuDvsgciTM8RbRwHqllDpDH5wu14peeOaap5tcBcIjvqJ/s1600/copy+jar+file.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="122" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvFMPh3sZPlAe-13pS9pZNbQKPII7aKfbEFw4z73jMdUbwNbUzMVVPGoao-OnxNXBOeH5ErpauL4cjwIoWr4i6sU3JwH_QwoTXuDvsgciTM8RbRwHqllDpDH5wu14peeOaap5tcBcIjvqJ/s640/copy+jar+file.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>We are going to cover the following use cases</b></span></span><br />
<span style="font-size: large;"><span style="font-size: large;"><b><br /></b></span>
<span style="font-size: medium;"><b>Use Case1: </b>Generating Sample Server Logs with a simple command</span></span><br />
<span style="font-size: large;"><b>Use Case2: </b>Generating Sample Users in JSON format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case3: </b>Generating Sample Users in CSV format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case4: </b>Generating Sample Users in TSV format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case5: </b>Generating Sample Users in DELIMITED format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case6: </b>Generating Sample Product Log in JSON format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case7: </b>Generating Sample Product Log in CSV format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case8: </b>Generating Sample Product Log in TSV format with a simple command</span><br />
<span style="font-size: large;"><b>Use Case9: </b>Generating Sample Product Log in DELIMITED format with a simple command</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case1:</b> Generating Sample Server Logs with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateServerLog \</span><br />
<span style="font-size: large;">-f /tmp/serverlog.txt \</span><br />
<span style="font-size: large;">-n 100 \</span><br />
<span style="font-size: large;">-s 10 \</span><br />
<span style="font-size: large;">-d 2016/01/01 \</span><br />
<span style="font-size: large;">-w 5</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGoT1oPA1GK7Y6hcq4w44E0f_juRIjMYqsaJQkdmIz9JypJQj9oez05VfHkAk3AzGQ-a6l8LULNXEdxrGCWEfm3qztoLlKg8r5EU_TsLvN6GY7zEGa9MwDo5sEkOZ5ckrnYPyrrcrr2Whh/s1600/generate+server+log.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGoT1oPA1GK7Y6hcq4w44E0f_juRIjMYqsaJQkdmIz9JypJQj9oez05VfHkAk3AzGQ-a6l8LULNXEdxrGCWEfm3qztoLlKg8r5EU_TsLvN6GY7zEGa9MwDo5sEkOZ5ckrnYPyrrcrr2Whh/s640/generate+server+log.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<span style="font-size: large;">Read SERVER LOG data</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT1xeZ8lv8BDln3yq0g__F-dGFZxM_AY3OYTd4e6KJe9CJE1XYv1EGNc7lkACSh_YfWBFfmykiBxV3W8w0PIhyphenhyphenlhqrfEgy6yYvVYwRPc2JETNkfdfopvuKFmEjZDk5QWQ9IsbH4bxJrnWt/s1600/read+server+log+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgT1xeZ8lv8BDln3yq0g__F-dGFZxM_AY3OYTd4e6KJe9CJE1XYv1EGNc7lkACSh_YfWBFfmykiBxV3W8w0PIhyphenhyphenlhqrfEgy6yYvVYwRPc2JETNkfdfopvuKFmEjZDk5QWQ9IsbH4bxJrnWt/s640/read+server+log+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case:</b> Generating Sample Users with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateUsers</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSOhrAyPdbbkE2DTzDVDeYhA7QcTHIsHaifUvdl-sR3qK0Wi32U4DR7lCDTYhd70XdSV3BypXwQG9C9PysIof-bzXelrRDdbOjem2_4eZE6V1AAQy5lpPLu8knbxWSxkLUirnDh0lhP5NI/s1600/basic+users+command.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSOhrAyPdbbkE2DTzDVDeYhA7QcTHIsHaifUvdl-sR3qK0Wi32U4DR7lCDTYhd70XdSV3BypXwQG9C9PysIof-bzXelrRDdbOjem2_4eZE6V1AAQy5lpPLu8knbxWSxkLUirnDh0lhP5NI/s640/basic+users+command.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">We can pass the following arguments to the above command:</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">-d => field delimiter (tab, comma, semicolon, etc.)</span></span><br />
<span style="font-size: large;">-f => output file path</span><br />
<span style="font-size: large;">-n => number of users, maximum is 10000</span><br />
<span style="font-size: large;">-s => starting user id, default is 1</span><br />
<span style="font-size: large;">-w => waiting time in milliseconds, default is 100</span><br />
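<span style="font-size: large;">If the generator needs to be driven from Scala rather than the shell, the flags above can be assembled programmatically. This is only a sketch: generateUsersCommand is a hypothetical helper, it covers just the -f, -n, -s and -d flags, and the line that actually launches the process is left commented out because it needs java and the jar in place.</span><br />

```scala
import scala.sys.process._

val jar = "/home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar"

// Hypothetical helper: assembles the GenerateUsers command from the flags listed above
def generateUsersCommand(
    file: String,
    users: Int,
    startId: Int = 1,
    delimiter: Option[String] = None
): Seq[String] =
  Seq("java", "-cp", jar, "com.orienit.kalyan.examples.GenerateUsers",
    "-f", file, "-n", users.toString, "-s", startId.toString) ++
    delimiter.toSeq.flatMap(d => Seq("-d", d))

val csvCommand = generateUsersCommand("/tmp/users.csv", 10, delimiter = Some(","))
println(csvCommand.mkString(" "))

// Uncomment to run it; this needs java on the PATH and the jar at the path above
// val exitCode = csvCommand.!
```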
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case2:</b> Generating Sample Users in JSON format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateUsers \</span><br />
<span style="font-size: large;">-f /tmp/users.json \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-s 1</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_0v_Z-hI0ZW5Nx41N7_BLVnBfWNXQY5Nxf7h2Lvk1BNCnOTtzeuKJDtxx9j3NbKETnWzGrlSmcI2B6QyM1GBN9XXhmVYqyifxrzc8cSUA5A0jAnAoqaFf1LF_i1VnwQiELP4RcWYWYvfj/s1600/generate+users+json.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_0v_Z-hI0ZW5Nx41N7_BLVnBfWNXQY5Nxf7h2Lvk1BNCnOTtzeuKJDtxx9j3NbKETnWzGrlSmcI2B6QyM1GBN9XXhmVYqyifxrzc8cSUA5A0jAnAoqaFf1LF_i1VnwQiELP4RcWYWYvfj/s640/generate+users+json.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read JSON Data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQ8TyGX2k0jOx6v4A0o37K3TmfHhOqoDyp9bAv_V5MiX8G7hr8zFRqDRV7Eix3BIHBvXZcFM64Y5CcxcptGqpiFIOPsdrHgFoxgsPdxo7SGTfhJnhil4ff9_dsGMfQhuCrgcLjLzjhyb_4/s1600/read+json+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQ8TyGX2k0jOx6v4A0o37K3TmfHhOqoDyp9bAv_V5MiX8G7hr8zFRqDRV7Eix3BIHBvXZcFM64Y5CcxcptGqpiFIOPsdrHgFoxgsPdxo7SGTfhJnhil4ff9_dsGMfQhuCrgcLjLzjhyb_4/s640/read+json+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case3: </b>Generating Sample Users in CSV format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateUsers \</span><br />
<span style="font-size: large;">-f /tmp/users.csv \</span><br />
<span style="font-size: large;">-d ',' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-s 1</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9W6jGx8XSWduRVxUxKXUxXlS-T38IXjkJoEHMtWFEjs7UYI8q2edgPuaxzvqE1-B4iT0sCLbMmQHI04IJPUHCNxAYDGGE9b56ovTzmXvKwZtgtAmMzp-Cvlj_ognlPiDs6mGBHvm_z5qE/s1600/generate+users+csv.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9W6jGx8XSWduRVxUxKXUxXlS-T38IXjkJoEHMtWFEjs7UYI8q2edgPuaxzvqE1-B4iT0sCLbMmQHI04IJPUHCNxAYDGGE9b56ovTzmXvKwZtgtAmMzp-Cvlj_ognlPiDs6mGBHvm_z5qE/s640/generate+users+csv.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read CSV data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Ykz8gY3Zfu9Q7Z5fVhOch1VCC__qKdbf5bFkQ5hjGzOeyZtJsPO47cjBQ1DmebYDwd7P8V43OJDUySNwQgp8zi9MWw2gfEyi-WA6uqPwNPFOBTHPl3lBkcdGqV9-_krDiUj6gqY9-Wn8/s1600/read+csv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Ykz8gY3Zfu9Q7Z5fVhOch1VCC__qKdbf5bFkQ5hjGzOeyZtJsPO47cjBQ1DmebYDwd7P8V43OJDUySNwQgp8zi9MWw2gfEyi-WA6uqPwNPFOBTHPl3lBkcdGqV9-_krDiUj6gqY9-Wn8/s640/read+csv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case4:</b> Generating Sample Users in TSV format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateUsers \</span><br />
<span style="font-size: large;">-f /tmp/users.tsv \</span><br />
<span style="font-size: large;">-d '\t' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-s 1</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikPvL1pfiJpUBGyS2GoCRTLDu5GVCfhfUdgbYFZr0vOn_QQuoauYoeU8bCxlxg7hyphenhyphen79eWvWqyMXzxQcRNQKilaFYnC_5RXQQRFrvzt4QF6zhagGjn4WTKDJACavlZoSlQwUIj-vHcJh4UF/s1600/generate+product+log+tsv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikPvL1pfiJpUBGyS2GoCRTLDu5GVCfhfUdgbYFZr0vOn_QQuoauYoeU8bCxlxg7hyphenhyphen79eWvWqyMXzxQcRNQKilaFYnC_5RXQQRFrvzt4QF6zhagGjn4WTKDJACavlZoSlQwUIj-vHcJh4UF/s640/generate+product+log+tsv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read TSV data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXKTG9LQV1ScF3xiQi2DntrsqRrbR04lzQVm6KzD4etGewhyphenhyphenbdWEkTXroAxyWfBFH9AGIY2Yny7SJ0Zy4TehuYfh2iJzdcK0eecxGdiThqVqxB3dhcEItEV0sT4npVrzEbNDqxvwWdzgKY/s1600/read+tsv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXKTG9LQV1ScF3xiQi2DntrsqRrbR04lzQVm6KzD4etGewhyphenhyphenbdWEkTXroAxyWfBFH9AGIY2Yny7SJ0Zy4TehuYfh2iJzdcK0eecxGdiThqVqxB3dhcEItEV0sT4npVrzEbNDqxvwWdzgKY/s640/read+tsv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case5:</b> Generating Sample Users in DELIMITED format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateUsers \</span><br />
<span style="font-size: large;">-f /tmp/users.txt \</span><br />
<span style="font-size: large;">-d '#' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-s 1</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSBWE_kwE_EnzGIyTaabEazFf6OFIt5pNOErYyW4LgPjkmM7O7tO0f_FL2nSNDHB-7cZUEqhh79SXEuxNCTpxmsQaW8nCTQQKT4FE5SPpR7HaSg68dV9araaTdmjWvgU2eWtMUvDlrfycL/s1600/generate+users+delimited.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSBWE_kwE_EnzGIyTaabEazFf6OFIt5pNOErYyW4LgPjkmM7O7tO0f_FL2nSNDHB-7cZUEqhh79SXEuxNCTpxmsQaW8nCTQQKT4FE5SPpR7HaSg68dV9araaTdmjWvgU2eWtMUvDlrfycL/s640/generate+users+delimited.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Read Any DELIMITED Data</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh51tv4WinAdblshBgB6QxZMd6oIdPH8YptCDsbz3Si0pyiQoc6kQDSnLt9PqW3DmBr0BkHMvBZqmrSTW3BCOLGe73Onmk_aEZv_CRu0-DIwNc_YoxdyKl8s-L7o-3f-maLEojLEboYlG9R/s1600/read+delimited+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh51tv4WinAdblshBgB6QxZMd6oIdPH8YptCDsbz3Si0pyiQoc6kQDSnLt9PqW3DmBr0BkHMvBZqmrSTW3BCOLGe73Onmk_aEZv_CRu0-DIwNc_YoxdyKl8s-L7o-3f-maLEojLEboYlG9R/s640/read+delimited+data.png" width="640" /></span></a></div>
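<span style="font-size: large;">The generated delimited files can also be read from Scala. The sketch below is an assumption-laden illustration: splitLine and readDelimited are hypothetical helpers, and the sample rows are made up so the example runs without the generator; the real column layout may differ.</span><br />

```scala
import java.nio.file.Files
import java.util.regex.Pattern
import scala.io.Source

// Split one line on a literal delimiter; -1 keeps trailing empty fields
def splitLine(line: String, delimiter: String): Vector[String] =
  line.split(Pattern.quote(delimiter), -1).toVector

// Read every line of a file into a vector of fields
def readDelimited(path: String, delimiter: String): Vector[Vector[String]] = {
  val source = Source.fromFile(path)
  try source.getLines().map(splitLine(_, delimiter)).toVector
  finally source.close()
}

// Made-up rows standing in for generator output; the real columns may differ
val sample = Files.createTempFile("users", ".txt")
Files.write(sample, "1#user1#hyderabad\n2#user2#\n".getBytes("UTF-8"))

val rows = readDelimited(sample.toString, "#")
rows.foreach(r => println(r.mkString(" | ")))
```

<span style="font-size: large;">Quoting the delimiter with Pattern.quote keeps characters such as | or . from being treated as regular expressions by split.</span><br />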
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case:</b> Generating Sample Product Log with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateProductLog</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijtqIPbK2FzGhuor7SZxait1LoGhW7sWChYqCSMoQBpcDbQqRd3pTDzq3zKwx_KO9Y1sZpZaaMxpWBozDOzTFLzotCZt___QfUqai6xRrIRDUkgw0aMtZmsX-8OBXZSYRlkFUeRwwO042J/s1600/product+log+usage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijtqIPbK2FzGhuor7SZxait1LoGhW7sWChYqCSMoQBpcDbQqRd3pTDzq3zKwx_KO9Y1sZpZaaMxpWBozDOzTFLzotCZt___QfUqai6xRrIRDUkgw0aMtZmsX-8OBXZSYRlkFUeRwwO042J/s640/product+log+usage.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">We can pass the following arguments to the above command:</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><i>-d => field delimiter (tab, comma, semicolon, etc.)</i></span></span><br />
<span style="font-size: large;"><i>-f => output file path</i></span><br />
<span style="font-size: large;"><i>-l => number of logs, maximum is 100000</i></span><br />
<span style="font-size: large;"><i>-n => number of users, maximum is 10000</i></span><br />
<span style="font-size: large;"><i>-w => waiting time in milliseconds, default is 100</i></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case6:</b> Generating Sample Product Log in JSON format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateProductLog \</span><br />
<span style="font-size: large;">-f /tmp/productlog.json \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-l 20</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAU2YzeHkIqEQatD4VWfxKtauQkATuiZHzKoQ9vQVqvCF8IPKxkCmJ9rUa8fo2ePZNrHuxH40rDycxsOm45oMpoll3iVvS7IWvXqqeSj2zX-Ib6jaXWWn3IUEip1awp5h3D43-Laqsdv95/s1600/generate+product+log+json+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAU2YzeHkIqEQatD4VWfxKtauQkATuiZHzKoQ9vQVqvCF8IPKxkCmJ9rUa8fo2ePZNrHuxH40rDycxsOm45oMpoll3iVvS7IWvXqqeSj2zX-Ib6jaXWWn3IUEip1awp5h3D43-Laqsdv95/s640/generate+product+log+json+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read JSON data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq7MglXMTRdQA8AAG4tPMMcLzSpe_LR7QOwhcutLMKpCxxnBqNr7jx4VwDG5ugaUczIjVoHeqo40GaV49SG8nu_RR44sjj9WXKwDRPYHZ8kYdf1C96XngKqXpoTqiZE3TkAoNW9haJ7eIN/s1600/read+product+log+json+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq7MglXMTRdQA8AAG4tPMMcLzSpe_LR7QOwhcutLMKpCxxnBqNr7jx4VwDG5ugaUczIjVoHeqo40GaV49SG8nu_RR44sjj9WXKwDRPYHZ8kYdf1C96XngKqXpoTqiZE3TkAoNW9haJ7eIN/s640/read+product+log+json+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case7:</b> Generating Sample Product Log in CSV format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateProductLog \</span><br />
<span style="font-size: large;">-f /tmp/productlog.csv \</span><br />
<span style="font-size: large;">-d ',' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-l 20</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyf5GQnWU0OaOF4tySrcGv-RBuSnRcQfOkKmRtq__YLPbRn49iIhMGfpwKJPADeEARoCvLT81eDs0_K5I2LKMC2t0TXWo_LxCt4Ad4ugp_xy4mA9tCbRUT934CCcIzcQEhPo0a4qouvQPE/s1600/generate+product+log+csv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyf5GQnWU0OaOF4tySrcGv-RBuSnRcQfOkKmRtq__YLPbRn49iIhMGfpwKJPADeEARoCvLT81eDs0_K5I2LKMC2t0TXWo_LxCt4Ad4ugp_xy4mA9tCbRUT934CCcIzcQEhPo0a4qouvQPE/s640/generate+product+log+csv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read CSV data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7POVCJijuXaGH4O4hGkbrl46UcccPq8YzQcQSCg8Gf8ra2h48VXJTc0MFN7vUlQuSd_0oVgPGNNMs_hbmOKP6qmD9cJ70WIjiuNnusWgywyntiDGttFGZwQMYHBbYmMDMVw6NFpi9H7km/s1600/read+product+log+csv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7POVCJijuXaGH4O4hGkbrl46UcccPq8YzQcQSCg8Gf8ra2h48VXJTc0MFN7vUlQuSd_0oVgPGNNMs_hbmOKP6qmD9cJ70WIjiuNnusWgywyntiDGttFGZwQMYHBbYmMDMVw6NFpi9H7km/s640/read+product+log+csv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><b>Use Case8:</b> Generating Sample Product Log in TSV format with a simple command</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateProductLog \</span><br />
<span style="font-size: large;">-f /tmp/productlog.tsv \</span><br />
<span style="font-size: large;">-d '\t' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-l 20</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBlPRYAdS9ZbREvvQVOL-PLBP2MXu3pMdE7PNzHz-ZkSHzQSRmVaWW7i8xARDgyY9NxsfUNwJ9tuekHbyjzudnXVWZrLsS-4ZiKCpnIzstTAfwL0KKZr4qHrJB6mMkvFFuteMnCYJd9r3H/s1600/generate+product+log+tsv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBlPRYAdS9ZbREvvQVOL-PLBP2MXu3pMdE7PNzHz-ZkSHzQSRmVaWW7i8xARDgyY9NxsfUNwJ9tuekHbyjzudnXVWZrLsS-4ZiKCpnIzstTAfwL0KKZr4qHrJB6mMkvFFuteMnCYJd9r3H/s640/generate+product+log+tsv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read TSV data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2SnM2EkZNorxb25MH0rTJaZ0OATGUQ5dRkpAxlTjblguiTNbJZ1dPTOoyxhh1nKxYkQuDk4eRgEPu7xNgUvsioC4izVI_SlzHE75pbtoZvbaf1n5bfJZh85Mp9QBKyliYHJtmzl0Ner3r/s1600/read+product+log+tsv+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2SnM2EkZNorxb25MH0rTJaZ0OATGUQ5dRkpAxlTjblguiTNbJZ1dPTOoyxhh1nKxYkQuDk4eRgEPu7xNgUvsioC4izVI_SlzHE75pbtoZvbaf1n5bfJZh85Mp9QBKyliYHJtmzl0Ner3r/s640/read+product+log+tsv+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: large;"><span style="font-size: small;"><br /></span>
<span style="font-size: large;"><b>Use Case9:</b> Generating Sample Product Log in DELIMITED format with a simple command</span></span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">java -cp /home/orienit/kalyan_bigdata_projects/kalyan-bigdata-examples.jar \</span></span><br />
<span style="font-size: large;">com.orienit.kalyan.examples.GenerateProductLog \</span><br />
<span style="font-size: large;">-f /tmp/productlog.txt \</span><br />
<span style="font-size: large;">-d '#' \</span><br />
<span style="font-size: large;">-n 10 \</span><br />
<span style="font-size: large;">-l 20</span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-qJTKf-_FNQG2sCnByIpqvrGlhrtV8mDx0W_pVOnDG_Dh1fwpLfRH141pLYW_gVnO35SwQ3G-qS_x61osk9my88I8To5s9dewAHo0na9cgiNBJ1HW7pO4s4rr73jTeRY0H_7MpKs-VE3E/s1600/generate+product+log+delimited+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-qJTKf-_FNQG2sCnByIpqvrGlhrtV8mDx0W_pVOnDG_Dh1fwpLfRH141pLYW_gVnO35SwQ3G-qS_x61osk9my88I8To5s9dewAHo0na9cgiNBJ1HW7pO4s4rr73jTeRY0H_7MpKs-VE3E/s640/generate+product+log+delimited+data.png" width="640" /></span></a></div>
<span style="font-size: large;"><br /></span>
<br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span>
<span style="font-size: medium;">Read Any DELIMITED data</span></span><br />
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlDuNFV4_ZwUVoZLg3Ag9wR3YqQ1-DXBUxsyOb8x-VVipsxu5hWtukBEmmAZqiIafnK6iBcRG805zE3ZRg_dJ_59oKUbz7FgN12lY8DRu1_YakAPlcMrcdwl1HzBFxvGtuDWsAHOGuMBVg/s1600/read+product+log+delimited+data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlDuNFV4_ZwUVoZLg3Ag9wR3YqQ1-DXBUxsyOb8x-VVipsxu5hWtukBEmmAZqiIafnK6iBcRG805zE3ZRg_dJ_59oKUbz7FgN12lY8DRu1_YakAPlcMrcdwl1HzBFxvGtuDWsAHOGuMBVg/s640/read+product+log+delimited+data.png" width="640" /></span></a></div>
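<span style="font-size: medium;">The generated files can also be checked outside Spark. The following is a minimal sketch, assuming only Python's standard <i>csv</i> module; the function name <i>read_delimited</i> is hypothetical, and because the column layout of the product log is not described here, each row is returned as a plain list of strings.</span><br />

```python
import csv


def read_delimited(path, delimiter=","):
    """Return every row of a delimited text file as a list of string fields."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter=delimiter)]


if __name__ == "__main__":
    # Paths and delimiters mirror the generator commands above;
    # adjust them to match your own run.
    samples = [
        ("/tmp/productlog.csv", ","),
        ("/tmp/productlog.tsv", "\t"),
        ("/tmp/productlog.txt", "#"),
    ]
    for path, delimiter in samples:
        try:
            rows = read_delimited(path, delimiter)
        except FileNotFoundError:
            print(path, "not found; run the generator command first")
            continue
        print(path, "contains", len(rows), "rows")
```

<span style="font-size: medium;">The same function covers the CSV, TSV and '#' delimited files, since only the delimiter argument changes.</span><br />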
<span style="font-size: large;"><span style="font-size: medium;"><br /></span>
<span style="font-size: medium;"><br /></span></span>
<span style="font-size: large;"><br /></span></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com1tag:blogger.com,1999:blog-2182833570422175384.post-14188360344321717652016-11-27T07:33:00.001-08:002016-11-27T07:33:22.992-08:00Spark Interview Questions & Answers - Part 3<div dir="ltr" style="text-align: left;" trbidi="on">
<b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">1.What is Apache Spark?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark is a fast, easy-to-use and flexible data processing framework. It has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark can run on Hadoop, standalone or in the cloud and is capable of accessing diverse data sources including HDFS, HBase, Cassandra and others.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">2.Explain key features of Spark.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Integrates with Hadoop and can read files stored in HDFS.</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Spark provides an interactive shell built on the interpreter for Scala (the language in which Spark is written).</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Spark is built around RDDs (Resilient Distributed Datasets), which can be cached across the computing nodes in a cluster.</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Spark supports multiple analytic tools that are used for interactive query analysis, real-time analysis and graph processing.</span></li>
</ul>
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">3.Define RDD.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">RDD is the acronym for Resilient Distributed Dataset – a fault-tolerant collection of elements that can be operated on in parallel. The partitioned data in an RDD is immutable and distributed. There are primarily two types of RDD:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Parallelized collections: created by calling parallelize() on an existing collection in the driver program, so that its elements can be processed in parallel</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Hadoop datasets: created from files in HDFS or another supported storage system, with functions applied to each record</span></li>
</ul>
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">4.What does the Spark Engine do?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">The Spark Engine is responsible for scheduling, distributing and monitoring the data application across the cluster.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">5.Define Partitions.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">As the name suggests, a partition is a smaller, logical division of data, similar to a ‘split’ in MapReduce. Partitioning is the process of deriving logical units of data so that they can be processed in parallel, which speeds up processing. 
Every RDD in Spark is partitioned.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">6.What operations does an RDD support?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Transformations</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Actions</span></li>
</ul>
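<span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">The difference between the two kinds of operations listed above can be illustrated without a cluster. The sketch below is only an analogy in plain Python, not Spark code: generator expressions stand in for lazy transformations, and functools.reduce stands in for an action that forces evaluation. The helper name <i>trace</i> is hypothetical. In a real Spark program the corresponding RDD methods would be map(), filter() and reduce().</span><br />

```python
from functools import reduce

calls = []  # records each element the "transformation" actually touches


def trace(x):
    """Square x and remember that it was processed."""
    calls.append(x)
    return x * x


data = [1, 2, 3, 4, 5]

# "Transformations": building the pipeline does no work yet.
squared = (trace(x) for x in data)          # plays the role of map()
evens = (x for x in squared if x % 2 == 0)  # plays the role of filter()
work_before_action = len(calls)             # nothing has been processed so far

# "Action": reducing the pipeline pulls every element through it.
total = reduce(lambda a, b: a + b, evens)   # plays the role of reduce()

if __name__ == "__main__":
    print("elements processed before the action:", work_before_action)
    print("elements processed after the action:", len(calls))
    print("sum of even squares:", total)
```

<span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">No element is processed while the pipeline is being defined; the work happens only when the final reduction asks for a result.</span><br />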
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">7.What do you understand by Transformations in Spark?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Transformations are functions applied to an RDD that produce another RDD. They are not executed until an action occurs. map() and filter() are examples of transformations: map() applies the function passed to it to each element of the RDD and produces another RDD, while filter() creates a new RDD by selecting the elements of the current RDD for which the function argument returns true.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">8. 
Define Actions.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">An action brings data back from the RDD to the local machine. Executing an action triggers all previously defined transformations. reduce() is an action that applies the function passed to it again and again until only one value is left. The take(n) action brings the first n values of the RDD to the local node, while collect() brings all of them.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">9.Define functions of SparkCore.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Serving as the base engine, SparkCore performs various important functions like memory management, monitoring jobs, fault-tolerance, job scheduling and interaction with storage systems.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; 
font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">10.What is RDD Lineage?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">By default, Spark does not replicate data in memory; if any data is lost, it is rebuilt using RDD lineage. RDD lineage is the record of transformations used to build an RDD, which lets Spark reconstruct lost data partitions. An RDD always remembers how it was built from other datasets.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">11.What is Spark Driver?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">The Spark Driver is the program that runs on the master node and declares transformations and actions on RDDs. 
In simple terms, the driver creates the SparkContext, connected to a given Spark Master.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">The driver also delivers the RDD graphs to the Master, where the standalone cluster manager runs.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">12.What is Hive on Spark?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Hive contains significant support for Apache Spark, wherein the Hive execution engine is set to Spark:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">hive> set spark.home=/location/to/sparkHome;</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 
13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">hive> set hive.execution.engine=spark;</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Hive on Spark supports Spark on YARN mode by default.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><span style="background-color: white; color: purple; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><b>13.Name commonly used Spark ecosystem components.</b></span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;"><b>Spark SQL</b> (successor to Shark) for querying structured data with SQL</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;"><b>Spark Streaming</b> for processing live data streams</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;"><b>GraphX</b> for generating and computing graphs</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;"><b>MLlib</b> (Machine Learning Algorithms)</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;"><b>SparkR</b> for using R programming with the Spark engine.</span></li>
</ul>
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">14.Define Spark Streaming.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark Streaming is an extension to the Spark API that allows processing of live data streams. Data from sources such as Flume and HDFS is ingested, processed, and finally pushed out to file systems, live dashboards and databases. 
It is similar to batch processing, as the input stream is divided into small batches.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">15.What is GraphX?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark uses GraphX for graph processing to build and transform interactive graphs. The GraphX component enables programmers to reason about structured data at scale.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><span style="background-color: white; color: purple; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><b>16.What does MLlib do?</b></span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">MLlib is a scalable machine learning library provided by Spark. 
It aims at making machine learning easy and scalable with common learning algorithms and use cases like clustering, regression, collaborative filtering, dimensionality reduction, and the like.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">17.What is Spark SQL?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark SQL, the successor to the earlier Shark project, is a module introduced in Spark to work with structured data. Through this module, Spark executes relational SQL queries on the data. The core of the component supports a special kind of RDD called SchemaRDD (renamed DataFrame in Spark 1.3), composed of Row objects and a schema defining the data type of each column in the row. 
It is similar to a table in a relational database.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">18.What is a Parquet file?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Parquet is a columnar file format supported by many other data processing systems. Spark SQL can both read and write Parquet files, and it is widely considered one of the best formats for big data analytics so far.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">19.What file systems does Spark support?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Hadoop Distributed File System (HDFS)</span><br style="background-color: white; 
font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Local file system</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Amazon S3</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">20.What is Yarn?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">As in Hadoop, Yarn is one of the cluster managers Spark can run on. It provides a central resource management platform to deliver scalable operations across the cluster. 
Running Spark on Yarn requires a binary distribution of Spark that is built with Yarn support.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">21.List the functions of Spark SQL.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark SQL is capable of:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Loading data from a variety of structured sources</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Querying data using SQL statements, both inside a Spark program and from external tools that connect to Spark SQL through standard database connectors (JDBC/ODBC). 
For instance, using business intelligence tools like Tableau</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• Providing rich integration between SQL and regular Python/Java/Scala code, including the ability to join RDDs and SQL tables, expose custom functions in SQL, and more</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">22.What are the benefits of Spark over MapReduce?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Thanks to in-memory processing, Spark can run some workloads around 10-100x faster than Hadoop MapReduce, which relies on persistent storage for every data processing task.</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Unlike Hadoop MapReduce, Spark provides built-in libraries to perform multiple kinds of tasks from the same core: batch processing, streaming, machine learning, and interactive SQL queries. Hadoop MapReduce supports only batch processing.</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Hadoop is highly disk-dependent, whereas Spark promotes caching and in-memory data storage.</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Spark can efficiently perform computations multiple times on the same dataset, which is called iterative computation. Hadoop MapReduce has no built-in support for iterative computing.</span></li>
</ul>
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">23.Is there any benefit of learning MapReduce, then?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Yes. MapReduce is a paradigm used by many big data tools, including Spark. It becomes ever more relevant as the data grows bigger and bigger. Most tools like Pig and Hive convert their queries into MapReduce phases, so understanding MapReduce helps you optimize them better.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">24.What is a Spark Executor?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">When SparkContext connects to a cluster manager, it acquires executors on nodes in the cluster. Executors are Spark processes that run computations and store data on the worker nodes. 
SparkContext then sends the final tasks to the executors for execution.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">25.Name the types of Cluster Managers in Spark.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">The Spark framework supports three major types of Cluster Managers:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><br />
<ul style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.4; list-style-image: initial; list-style-position: initial; margin: 0.5em 0px; padding: 0px 2.5em;">
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Standalone: a basic manager to set up a cluster</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Apache Mesos: a general-purpose, commonly used cluster manager that can also run Hadoop MapReduce and other applications</span></li>
<li style="border: none; margin: 0px 0px 0.25em; padding: 0.25em 0px;"><span style="font-size: medium;">Yarn: responsible for resource management in Hadoop</span></li>
</ul>
<br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">26.What do you understand by worker node?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">A worker node is any node that can run the application code in a cluster.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">27.What is PageRank?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">PageRank is a graph algorithm, available in GraphX, that measures the importance of each vertex in the graph. For instance, an edge from u to v represents an endorsement of v’s importance by u. 
In simple terms, if a user on Instagram has a massive following, that user will rank high on the platform.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">28.Do you need to install Spark on all nodes of a Yarn cluster while running Spark on Yarn?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">No, because Spark runs on top of Yarn.</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">29.Illustrate some demerits of using Spark.</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Since Spark uses more memory than Hadoop MapReduce, certain problems may arise. 
Developers need to be careful while running their applications in Spark. Instead of running everything on a single node, the work must be distributed across multiple nodes in the cluster. </span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><b style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"><span style="color: purple; font-size: medium;">30.How to create an RDD?</span></b><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">Spark provides two methods to create an RDD:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• By parallelizing a collection in your driver program. 
This makes use of SparkContext’s ‘parallelize’ method:</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">val </span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">data</span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"> = Array(2,4,6,8,10)</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">val distData = sc.parallelize(data)</span><br style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;" /><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px;"></span><span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;">• By loading an external dataset from external storage such as HDFS, HBase, or a shared file system</span><br />
<span style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: medium;"><br /></span></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com0tag:blogger.com,1999:blog-2182833570422175384.post-21545385867665004552016-11-27T07:32:00.001-08:002016-11-27T07:32:19.780-08:00Spark Interview Questions & Answers - Part 2<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; line-height: 1.38; white-space: pre-wrap;"><span style="color: red; font-size: medium;">Q1: Say I have a huge list of numbers in an RDD (say myrdd), and I wrote the following code to compute the average: </span></span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;">def myAvg(x, y):</span>
<span class="keyword" style="box-sizing: border-box;"> return (x+y)/2.0;</span>
<span class="keyword" style="box-sizing: border-box;">avg = myrdd.reduce(myAvg);</span></span></code></pre>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<strong style="box-sizing: border-box;"><span style="font-size: medium;"><br /></span></strong></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<strong style="box-sizing: border-box;"><span style="font-size: medium;">What is wrong with it, and how would you correct it?</span></strong></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box; font-size: medium;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;">Ans: </span>The pairwise average function is commutative but not associative, so the result depends on how the elements are grouped and it is not a valid reducer.</span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">I would simply sum the numbers and then divide by the count.</span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;"># Addition is commutative and associative, so it is a valid reducer</span>
<span class="identifier" style="box-sizing: border-box;">def sum(x, y):</span>
<span class="identifier" style="box-sizing: border-box;">    return x + y</span>
<span class="keyword" style="box-sizing: border-box;">total = myrdd.reduce(sum)</span>
<span class="keyword" style="box-sizing: border-box;"># float() avoids integer division under Python 2</span>
<span class="keyword" style="box-sizing: border-box;">avg = float(total) / myrdd.count()</span></span></code></pre>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">The only problem with the above code is that the total might become very large and overflow. So I would rather divide each number by the count first and then sum the results, in the following way.</span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;">cnt = myrdd.count()</span>
<span class="identifier" style="box-sizing: border-box;">def divideByCnt(x):</span>
<span class="identifier" style="box-sizing: border-box;">    # float() avoids integer division under Python 2</span>
<span class="identifier" style="box-sizing: border-box;">    return x / float(cnt)</span>
<span class="keyword" style="box-sizing: border-box;">myrdd1 = myrdd.map(divideByCnt)</span>
<span class="keyword" style="box-sizing: border-box;"># Reduce the scaled RDD (myrdd1), not the original one</span>
<span class="keyword" style="box-sizing: border-box;">avg = myrdd1.reduce(sum)</span></span></code></pre>
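<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">To see why the pairwise average is not a valid reducer, here is a minimal sketch in plain Python that needs no Spark cluster. It uses functools.reduce as a stand-in for the RDD reduce, and the names my_avg, merge and nums are only illustrative. It also shows one possible alternative, carrying (sum, count) pairs, which is an assumption of this sketch rather than part of the answer above.</span></div>

```python
from functools import reduce

def my_avg(x, y):
    # Pairwise average: commutative, but not associative.
    return (x + y) / 2.0

def merge(a, b):
    # Merge two (sum, count) pairs; this is associative and commutative.
    return (a[0] + b[0], a[1] + b[1])

nums = [1, 2, 3, 4]

# The same reducer applied with two different groupings.
left_to_right = reduce(my_avg, nums)                  # ((1, 2), 3), 4
right_to_left = my_avg(1, my_avg(2, my_avg(3, 4)))    # 1, (2, (3, 4))

# Carrying (sum, count) pairs gives the true mean in a single reduce.
total, count = reduce(merge, [(n, 1) for n in nums])
mean = total / float(count)

print(left_to_right, right_to_left, mean)
```

<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">The two groupings give 3.125 and 1.875, while the true mean is 2.5. Since Spark is free to group elements differently across partitions, a reducer like my_avg cannot be relied on.</span></div>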
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; line-height: 1.38;"><br /></span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q2" name="q2" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;">Q2: Say I have a huge list of numbers in a file in HDFS. Each line has one number, and I want to compute the square root of the sum of the squares of these numbers. How would you do it?</span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="font-size: medium;"># We would first load the file from HDFS as an RDD in Spark</span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="font-size: medium;">numsAsText = sc.textFile("hdfs://namenode:9000/user/kalyan/mynumbersfile.txt")</span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;"><span style="box-sizing: border-box; line-height: 16.56px;"># Define the function to parse a line and compute its square</span>
def toSqInt(line):</span>
<span class="keyword" style="box-sizing: border-box;">    v = int(line)</span>
<span class="keyword" style="box-sizing: border-box;">    return v * v</span></span></code></pre>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"># Run the function on the Spark RDD as a transformation
<span class="identifier" style="box-sizing: border-box;">nums = numsAsText.map(toSqInt)
</span># Run the summation as a reduce action. A two-argument adder is needed
# here because the built-in sum() does not accept two numbers.
<span class="identifier" style="box-sizing: border-box;">total = nums.reduce(lambda x, y: x + y)
</span># Finally compute the square root, for which we need to import math
<span class="keyword" style="box-sizing: border-box;">import math</span>
<span class="keyword" style="box-sizing: border-box;">print(math.sqrt(total))</span>
</span></code></pre>
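<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">The same map-then-reduce pipeline can be dry-run locally before submitting it to a cluster. The sketch below is plain Python, not Spark: the built-in map and functools.reduce play the roles of the RDD operations, and the three sample lines are a hypothetical stand-in for the contents of the HDFS file.</span></div>

```python
import math
from functools import reduce
from operator import add

def to_sq_int(line):
    # Parse one line of text as an integer and square it.
    v = int(line)
    return v * v

# Hypothetical sample standing in for the lines of the HDFS file.
lines = ["3", "4", "12"]

squares = map(to_sq_int, lines)   # plays the role of the map transformation
total = reduce(add, squares)      # plays the role of the reduce action
result = math.sqrt(total)

print(result)
```

<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;">For this sample the squares are 9, 16 and 144, so the total is 169 and the printed result is 13.0.</span></div>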
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; line-height: 1.38;"><br /></span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q3" name="q3" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;">Q3: Is the following approach correct? Is the </span><em style="background-color: transparent; box-sizing: border-box; line-height: 1.38;"><strong style="box-sizing: border-box;">sqrtOfSumOfSq</strong></em><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;"> a valid reducer?</span></span></span></span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;">
numsAsText = sc.textFile("hdfs://namenode:9000/user/kalyan/mynumbersfile.txt")</span>
<span class="keyword" style="box-sizing: border-box;">import math</span>
<span class="identifier" style="box-sizing: border-box;">def toInt(str):</span>
<span class="keyword" style="box-sizing: border-box;">    return int(str)</span>
<span class="keyword" style="box-sizing: border-box;">nums = numsAsText.map(toInt)</span>
<span class="identifier" style="box-sizing: border-box;">def sqrtOfSumOfSq(x, y):</span>
<span class="identifier" style="box-sizing: border-box;">    return math.sqrt(x*x + y*y)</span>
<span class="keyword" style="box-sizing: border-box;"># The reducer itself takes the square root at every step</span>
<span class="keyword" style="box-sizing: border-box;">total = nums.reduce(sqrtOfSumOfSq)</span>
<span class="keyword" style="box-sizing: border-box;">print(total)</span>
</span></code></pre>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box; font-size: medium;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;">Ans: </span>Yes. The approach is correct and <em style="box-sizing: border-box;"><strong style="box-sizing: border-box;">sqrtOfSumOfSq</strong></em> is a valid reducer. It is commutative and associative, because sqrt(sqrt(a*a + b*b)^2 + c*c) = sqrt(a*a + b*b + c*c), so partial results can be combined in any order.</span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red;"><span style="box-sizing: border-box; font-family: Arial, Verdana, sans-serif; line-height: 1.38;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-size: medium; vertical-align: baseline;"><br /></span></span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q4" name="q4" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span><span style="box-sizing: border-box; font-family: Arial, Verdana, sans-serif; line-height: 1.38;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; vertical-align: baseline;">Q4: Could you compare the pros and cons of your approach (</span></span><span style="box-sizing: border-box; line-height: 20.7px;">in Question 2 above</span><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;">) and my approach (in Question 3 above)?</span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">You compute both the square and the square root inside the reduce function, whereas my approach squares in map() and only sums in reduce().</span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">My approach will be faster because your reducer is heavier: it calls math.sqrt() on every invocation, and a reducer generally runs approximately <strong style="box-sizing: border-box;">n-1 </strong>times for an RDD of n elements.<br style="box-sizing: border-box;" />The only downside of my approach is a greater risk of integer overflow, because the squares are produced in map() and their full sum is accumulated in reduce(). In Python this is mostly theoretical, since integers have arbitrary precision; it matters for fixed-width types such as Scala's Int or Long.</span></div>
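<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">The two approaches can be compared without a cluster. The sketch below is only an illustration: it uses Python's functools.reduce on a small made-up list in place of an RDD, and the names q2_result and q3_result are hypothetical.</span></div>

```python
import math
from functools import reduce

# Made-up sample data standing in for the contents of mynumbersfile.txt.
nums = [3, 4, 12]

def sqrt_of_sum_of_sq(x, y):
    # Question 3 style reducer: square, add and take the root on every call.
    return math.sqrt(x * x + y * y)

# Question 3 style: all the work happens inside the reducer.
q3_result = reduce(sqrt_of_sum_of_sq, nums)

# Question 2 style: square in map, add in reduce, one square root at the end.
squares = map(lambda x: x * x, nums)
q2_result = math.sqrt(reduce(lambda x, y: x + y, squares))

print(q2_result, q3_result)
```

<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">Both results are 13.0 for this list. The Question 3 version calls math.sqrt() once per reduction step, while the Question 2 version calls it exactly once.</span></div>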
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; line-height: 1.38;"><br /></span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q5" name="q5" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;">Q5: If you have to compute the total counts of each of the unique words on spark, how would you go about it?</span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans:</span></span></div>
<pre style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"># Load bigtextfile.txt as an RDD in Spark.
lines = sc.textFile("hdfs://namenode:9000/user/kalyan/bigtextfile.txt")
# Define a function that breaks each line into words.
def toWords(line):
    return line.split()
# Run toWords on each element of the RDD as a flatMap transformation.
# We use flatMap instead of map because our function returns multiple values per line.
words = lines.flatMap(toWords)
# Convert each word into a (key, value) pair. Here the key is the word itself and the value is 1.
def toTuple(word):
    return (word, 1)
wordsTuple = words.map(toTuple)
# Now we can apply the reduceByKey() transformation to add up the 1s for each word.
def add(x, y):
    return x + y
counts = wordsTuple.reduceByKey(add)
# Now, print. collect() is the action that triggers the computation.
print(counts.collect())</span></code></pre>
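<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">To see what each stage produces, here is a Spark-free sketch that emulates flatMap, map and reduceByKey on a small in-memory list. The two sample sentences are made up for illustration, and the plain Python loop only mimics what Spark does per key across partitions.</span></div>

```python
from collections import defaultdict
from itertools import chain

# Made-up sample input standing in for the lines of bigtextfile.txt.
lines = ["to be or not to be", "to do is to be"]

# flatMap: every line yields several words, flattened into one sequence.
words = list(chain.from_iterable(line.split() for line in lines))

# map: turn each word into a (word, 1) pair.
pairs = [(word, 1) for word in words]

# reduceByKey with addition: add up the values that share the same key.
totals = defaultdict(int)
for word, one in pairs:
    totals[word] += one
counts = dict(totals)

print(counts)
```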
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px;">
</div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
</div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><br /></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q6" name="q6" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span>Q6: In a very huge text file, you want to just check if a particular keyword exists. How would you do this using Spark? </span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans:</span></span></div>
<pre class="r" style="background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; line-height: 1.42857; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;">lines = sc.textFile("hdfs://namenode:9000/user/kalyan/bigtextfile.txt");</span>
<span class="identifier" style="box-sizing: border-box;">def isFound(line):</span>
<span class="identifier" style="box-sizing: border-box;">    # 1 if the keyword occurs in this line, otherwise 0</span>
<span class="identifier" style="box-sizing: border-box;">    if line.find("mykeyword") > -1:</span>
<span class="identifier" style="box-sizing: border-box;">        return 1</span>
<span class="identifier" style="box-sizing: border-box;">    return 0</span>
<span class="identifier" style="box-sizing: border-box;">foundBits = lines.map(isFound)</span>
<span class="identifier" style="box-sizing: border-box;"># Add up the per-line flags with a two-argument function (the built-in sum() is not one).</span>
<span class="identifier" style="box-sizing: border-box;">total = foundBits.reduce(lambda x, y: x + y)</span>
<span class="identifier" style="box-sizing: border-box;">if total > 0:</span>
<span class="identifier" style="box-sizing: border-box;">    print("FOUND")</span>
<span class="identifier" style="box-sizing: border-box;">else:</span>
<span class="identifier" style="box-sizing: border-box;">    print("NOT FOUND")</span>
</span></code></pre>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; line-height: 1.38;"><br /></span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: red; font-size: medium;"><span style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial;"><a href="https://draft.blogger.com/null" id="q7" name="q7" style="background: transparent; box-sizing: border-box; color: #572f69; text-decoration: none;"></a></span><span style="background-color: transparent; box-sizing: border-box; line-height: 1.38;">Q7: Can you improve the performance of the code in the previous answer?</span></span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: Yes. </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">The search does not stop even after the word we are looking for has been found. Our map code keeps executing on all the nodes, which is very inefficient.<br style="box-sizing: border-box;" />We could use an accumulator to report whether the word has been found and then stop the job. Something along these lines:</span></span></div>
<br />
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
</div>
<br />
<pre class="r" style="-webkit-text-stroke-width: 0px; background-color: whitesmoke; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 13px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; line-height: 1.42857; margin-bottom: 10px; orphans: 2; padding: 9.5px; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-break: break-all; word-spacing: 0px; word-wrap: break-word;"><code class="r" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; padding: 0px;"><span style="font-size: medium;"><span class="identifier" style="box-sizing: border-box;">import threading</span>
<span class="identifier" style="box-sizing: border-box;">from time import sleep</span>
<span class="identifier" style="box-sizing: border-box;">result = "Not Set"</span>
<span class="identifier" style="box-sizing: border-box;">lock = threading.Lock()</span>
<span class="identifier" style="box-sizing: border-box;">accum = sc.accumulator(0)</span>
<span class="identifier" style="box-sizing: border-box;">def map_func(line):</span>
<span class="identifier" style="box-sizing: border-box;">    # introduce a delay to emulate slowness</span>
<span class="identifier" style="box-sizing: border-box;">    sleep(1)</span>
<span class="identifier" style="box-sizing: border-box;">    if line.find("Adventures") > -1:</span>
<span class="identifier" style="box-sizing: border-box;">        accum.add(1)</span>
<span class="identifier" style="box-sizing: border-box;">        return 1</span>
<span class="identifier" style="box-sizing: border-box;">    return 0</span>
<span class="identifier" style="box-sizing: border-box;">def start_job():</span>
<span class="identifier" style="box-sizing: border-box;">    global result</span>
<span class="identifier" style="box-sizing: border-box;">    try:</span>
<span class="identifier" style="box-sizing: border-box;">        sc.setJobGroup("job_to_cancel", "some description")</span>
<span class="identifier" style="box-sizing: border-box;">        lines = sc.textFile("hdfs://namenode:9000/user/kalyan/wordcount/input/big.txt")</span>
<span class="identifier" style="box-sizing: border-box;">        result = lines.map(map_func)</span>
<span class="identifier" style="box-sizing: border-box;">        result.take(1)</span>
<span class="identifier" style="box-sizing: border-box;">    except Exception as e:</span>
<span class="identifier" style="box-sizing: border-box;">        result = "Cancelled"</span>
<span class="identifier" style="box-sizing: border-box;">    lock.release()</span>
<span class="identifier" style="box-sizing: border-box;">def stop_job():</span>
<span class="identifier" style="box-sizing: border-box;">    # poll the accumulator on the driver and cancel once the keyword has been seen 3 times</span>
<span class="identifier" style="box-sizing: border-box;">    while accum.value < 3:</span>
<span class="identifier" style="box-sizing: border-box;">        sleep(1)</span>
<span class="identifier" style="box-sizing: border-box;">    sc.cancelJobGroup("job_to_cancel")</span>
<span class="identifier" style="box-sizing: border-box;">lock.acquire()</span>
<span class="identifier" style="box-sizing: border-box;"># daemon threads, so a stop_job that never sees the keyword does not keep the program alive</span>
<span class="identifier" style="box-sizing: border-box;">threading.Thread(target=start_job, daemon=True).start()</span>
<span class="identifier" style="box-sizing: border-box;">threading.Thread(target=stop_job, daemon=True).start()</span>
<span class="identifier" style="box-sizing: border-box;"># block here until start_job releases the lock</span>
<span class="identifier" style="box-sizing: border-box;">lock.acquire()</span>
</span></code></pre>
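<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">A simpler alternative, which is not part of the original answer, is to filter for the keyword and ask whether the filtered RDD is empty. The sketch below assumes that lines is an RDD created with sc.textFile(); keyword_exists is a hypothetical helper name. In PySpark, isEmpty() is built on take(1), which scans partitions incrementally instead of processing the whole file up front.</span></div>

```python
def keyword_exists(lines, keyword):
    """Return True if at least one line of the RDD contains the keyword."""
    # Keep only the lines that contain the keyword (a lazy transformation).
    matches = lines.filter(lambda line: keyword in line)
    # isEmpty() is the action; it needs a single matching element to answer.
    return not matches.isEmpty()

# Hypothetical usage inside a pyspark shell:
# lines = sc.textFile("hdfs://namenode:9000/user/kalyan/bigtextfile.txt")
# print("FOUND" if keyword_exists(lines, "mykeyword") else "NOT FOUND")
```

<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 8pt; margin-top: 8pt;">
<span style="box-sizing: border-box; font-size: medium;">This avoids threads, accumulators and job cancellation, at the cost of giving no control over when the scan stops beyond what take(1) already does.</span></div>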
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com1tag:blogger.com,1999:blog-2182833570422175384.post-18365816113162094162016-11-27T07:30:00.000-08:002016-11-27T07:30:31.125-08:00Spark Interview Questions & Answers - Part 1<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q1: When do you use Apache Spark? OR What are the benefits of Spark over MapReduce?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans:</span></span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: lower-alpha; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Spark is really fast. According to the project's own claims, it runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It makes heavy use of RAM to produce results faster.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: lower-alpha; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">In the MapReduce paradigm, you write many MapReduce tasks and then tie these tasks together using Oozie or a shell script. This mechanism is very time consuming, and MapReduce tasks have heavy latency.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: lower-alpha; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Quite often, translating the output of one MR job into the input of another MR job requires writing additional code, because Oozie may not suffice.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: lower-alpha; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">In Spark, you can do basically everything from a single application or console (pyspark or the Scala console) and get the results immediately. Switching between 'running something on the cluster' and 'doing something locally' is fairly easy and straightforward. This also means less context switching for the developer and higher productivity.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: lower-alpha; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Spark is roughly equivalent to MapReduce and Oozie put together.</span></span></div>
</li>
</ol>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q2: Is there any point in learning MapReduce, then?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: medium;"><span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; vertical-align: baseline; white-space: pre-wrap;">Ans: Yes. For the following reasons:</span></span><span style="line-height: 20px;"> </span></span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">MapReduce is a paradigm used by many big data tools, including Spark. So, understanding the MapReduce paradigm and how to convert a problem into a series of MR tasks is very important.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">When the data grows beyond what can fit into the memory on your cluster, the Hadoop Map-Reduce paradigm is still very relevant.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Almost every other tool, such as Hive or Pig, converts its queries into MapReduce phases. If you understand MapReduce, you will be able to optimize your queries better.</span></span></div>
</li>
</ol>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q3: When running Spark on Yarn, do I need to install Spark on all nodes of Yarn Cluster?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: </span></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Since Spark runs on top of YARN, it uses YARN to execute its commands on the cluster's nodes.</span></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">So, you just have to install Spark on one node.</span></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q4: What are the downsides of Spark?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: medium;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; line-height: 1.38; white-space: pre-wrap;">Spark relies heavily on memory. </span><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; line-height: 1.38; white-space: pre-wrap;">The developer therefore has to be careful. A casual developer might make the following mistakes:</span></span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">She may end up running everything on the local node instead of distributing work over to the cluster. </span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">She might hit some web service too many times by calling it from many nodes of the cluster at once.</span></span></div>
</li>
</ol>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">The first problem is well handled by the Hadoop MapReduce paradigm, because it ensures that the data your code processes at any point in time is fairly small, so you cannot make the mistake of trying to handle the whole data set on a single node. </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">The second mistake is possible in MapReduce too. While writing MapReduce code, a user may hit a service from inside map() or reduce() too many times. This overloading of a service is also possible while using Spark.</span></span></div>
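<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">One common way to reduce the pressure on an external service is sketched below, assuming an existing SparkContext. It limits the number of partitions with coalesce() and does the per-connection setup once per partition with mapPartitions(). The names ServiceCallSketch, lookupNames and maxConcurrentCallers, and the lookup function standing in for the remote call, are hypothetical and only serve the illustration.</span></div>

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ServiceCallSketch {
  // Looks up a name for every id while limiting how many partitions
  // (and therefore how many parallel tasks) call the service.
  def lookupNames(sc: SparkContext, maxConcurrentCallers: Int): RDD[String] = {
    val ids = sc.parallelize(1 to 8, 4)
    ids.coalesce(maxConcurrentCallers).mapPartitions { iter =>
      // Per-partition setup: a real job would open one service connection here.
      // Hypothetical stand-in for the remote call, so the sketch runs offline.
      val lookup: Int => String = id => s"user-$id"
      iter.map(lookup)
    }
  }
}
```

<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">In the Spark shell this could be tried with ServiceCallSketch.lookupNames(sc, 2).collect(). The setup still runs once per partition, so the partition count is the knob that bounds the number of parallel callers.</span></div>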
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q5: What is an RDD?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans:</span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">RDD stands for resilient distributed dataset. It is a representation of data located on a network that is: </span></span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Immutable - You can operate on the rdd to produce another rdd but you can’t alter it.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Partitioned / Parallel - The data in an RDD is split into partitions that are operated on in parallel. An operation on an RDD can be carried out by multiple nodes.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Resilient - If one of the nodes hosting a partition fails, another node takes over that partition's data (Spark can recompute it from the RDD's lineage).</span></span></div>
</li>
</ol>
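<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">The three properties can be observed with a small experiment. This is only a sketch, assuming an existing SparkContext; the object name RddPropertiesSketch and the sample data are made up for the illustration.</span></div>

```scala
import org.apache.spark.SparkContext

object RddPropertiesSketch {
  // Returns (original elements, derived elements, number of partitions).
  def demo(sc: SparkContext): (Seq[Int], Seq[Int], Int) = {
    // Partitioned: the data is split into 2 partitions.
    val original = sc.parallelize(Seq(1, 2, 3, 4), 2)
    // Immutable: map() builds a new RDD and leaves `original` untouched.
    val doubled = original.map(_ * 2)
    // Resilient: the lineage printed here is what Spark would replay
    // to rebuild a partition that was lost.
    println(doubled.toDebugString)
    (original.collect().toSeq, doubled.collect().toSeq, doubled.getNumPartitions)
  }
}
```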
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">RDD provides two kinds of operations: Transformations and Actions.</span></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q6: What are Transformations?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: Transformations are functions that are applied to an RDD (resilient distributed dataset). A transformation results in another RDD. A transformation is not executed until an action follows.</span></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: medium; line-height: 1.38; white-space: pre-wrap;">Examples of transformations are:</span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">map() - applies the function passed to it to each element of the RDD, resulting in a new RDD.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">filter() - creates a new RDD by picking the elements of the current RDD for which the function argument returns true.</span></span></div>
</li>
</ol>
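<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">The lazy behaviour can be made visible by counting how often the function passed to map() actually runs. The sketch below assumes an existing SparkContext and the Spark 2.x longAccumulator API; the object name LazyTransformationSketch is made up for the illustration.</span></div>

```scala
import org.apache.spark.SparkContext

object LazyTransformationSketch {
  // Returns how many times the map function ran before and after the action.
  def demo(sc: SparkContext): (Long, Long) = {
    val calls = sc.longAccumulator("map calls") // assumes Spark 2.x
    val rdd = sc.parallelize(Seq(1, 2, 3, 4))
    // Two chained transformations; each one only describes a new RDD.
    val evens = rdd
      .map { x => calls.add(1); x + 1 }
      .filter(_ % 2 == 0)
    val before = calls.sum // no action has run yet
    evens.collect()        // the action triggers map() and filter()
    val after = calls.sum
    (before, after)
  }
}
```

<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">In a local run without task retries, the count is expected to be 0 before collect() and 4 afterwards.</span></div>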
<div style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 10px;">
<span style="font-size: medium;"><br /></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-family: Arial; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"><span style="color: blue; font-size: medium;">Q7: What are Actions?</span></span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">Ans: </span></span></div>
<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">An action brings data back from the RDD to the local machine. Executing an action triggers all the previously created transformations. Examples of actions are:</span></span></div>
<ol style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><div dir="ltr" style="box-sizing: border-box; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">reduce() - applies the function passed to it again and again until only one value is left. The function should take two arguments and return one value, and it should be commutative and associative so that it can be applied in parallel.</span></span></div>
</li>
<li dir="ltr" style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; list-style-type: decimal; margin: 0px 0px 0.25em; padding: 0px; vertical-align: baseline;"><span id="docs-internal-guid-9e43148d-dc1c-e718-3ac5-5c74a9820309" style="box-sizing: border-box;"><span style="background-color: transparent; box-sizing: border-box; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">take(n) - brings the first n values from the RDD back to the local node; collect() brings back all of them.</span></span></li>
</ol>
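<div dir="ltr" style="background-color: white; box-sizing: border-box; color: #333333; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 13px; line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; box-sizing: border-box; color: black; font-family: Arial; font-size: medium; vertical-align: baseline; white-space: pre-wrap;">A minimal sketch of these actions, assuming an existing SparkContext. The object name ActionSketch and the sample numbers are made up for the illustration; collect() is included to contrast it with take().</span></div>

```scala
import org.apache.spark.SparkContext

object ActionSketch {
  // Returns (sum via reduce, first two elements, all elements).
  def demo(sc: SparkContext): (Int, Seq[Int], Seq[Int]) = {
    val rdd = sc.parallelize(Seq(5, 3, 8, 1), 2)
    // reduce() combines elements pairwise with a two-argument function.
    val total = rdd.reduce((a, b) => a + b) // 5 + 3 + 8 + 1 = 17
    // take(n) returns only the first n elements to the driver.
    val firstTwo = rdd.take(2).toSeq
    // collect() returns every element, so it suits small results only.
    val all = rdd.collect().toSeq
    (total, firstTwo, all)
  }
}
```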
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com3tag:blogger.com,1999:blog-2182833570422175384.post-60808772208026832572016-11-27T06:53:00.001-08:002016-11-27T06:53:45.389-08:00SPARK BASICS Practice on 27 Nov 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkTInX2NSaM7gXXE1efcnfL6n5TdBLbq17_-2IYmSXwkBVlSuE1KSeH_cREnJVF068MG1bjbEFlSdVhHb2x_0sqHJ5fEd-sOCkn3YFgjl66Lv7v0DwTGOup6t2i7goYeSu0bjno_OIOje2/s1600/Spark+Basics.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkTInX2NSaM7gXXE1efcnfL6n5TdBLbq17_-2IYmSXwkBVlSuE1KSeH_cREnJVF068MG1bjbEFlSdVhHb2x_0sqHJ5fEd-sOCkn3YFgjl66Lv7v0DwTGOup6t2i7goYeSu0bjno_OIOje2/s640/Spark+Basics.png" width="640" /></a></div>
<span style="font-size: large;"><br /></span>
<br />
<span style="font-size: large;">---------------------------------------------------------------</span><br />
<span class="Apple-tab-span" style="font-size: large; white-space: pre;"> </span><span style="font-size: large;">Spark Basics</span><br />
<span style="font-size: large;">---------------------------------------------------------------</span><br />
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">1. SparkContext (sc) is the main entry point for Spark operations</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. From the Spark context, we create an RDD</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. An RDD can be created in 2 ways</span><br />
<span style="font-size: large;"> -> from collections (List, Seq, Set ,....)</span><br />
<span style="font-size: large;"> -> from datasets (csv, tsv, text, json, hive, hdfs, cassandra....)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------------------------</span><br />
<span style="font-size: large;">Start the Spark shell in one of 3 ways:</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. scala command <span class="Apple-tab-span" style="white-space: pre;"> </span>=> spark-shell</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. python command <span class="Apple-tab-span" style="white-space: pre;"> </span>=> pyspark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. R command <span class="Apple-tab-span" style="white-space: pre;"> </span>=> sparkR</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark-1.6.x is compatible with Scala-2.10 and Scala-2.11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark-2.0.x is compatible with Scala-2.11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark context available as sc.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">SQL context available as sqlContext.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------------------------------</span><br />
<span style="font-size: large;">RDD => Resilient Distributed DataSets</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD features:</span><br />
<span style="font-size: large;">--------------</span><br />
<span style="font-size: large;">1. Immutable</span><br />
<span style="font-size: large;">2. Lazily evaluated (bottom to top approach)</span><br />
<span style="font-size: large;">3. Cacheable</span><br />
<span style="font-size: large;">4. Type inferred</span><br />
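<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The "Cacheable" and "Type inferred" features can be sketched as follows, assuming an existing SparkContext. The object name CacheSketch is made up for the illustration.</span><br />

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Returns (storage level after cache(), sum, count).
  def demo(sc: SparkContext): (StorageLevel, Double, Long) = {
    // Type inferred: `squares` is an RDD[Int] without any annotation.
    val squares = sc.parallelize(1 to 4).map(x => x * x)
    // cache() only marks the RDD for in-memory storage; it is still lazy.
    squares.cache()
    val total = squares.sum()   // first action computes and stores the partitions
    val count = squares.count() // later actions can reuse the stored partitions
    (squares.getStorageLevel, total, count)
  }
}
```

<span style="font-size: large;">cache() uses the MEMORY_ONLY storage level; persist() accepts other levels when the data does not fit in memory.</span><br />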
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD Operations:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;">1. Transformations</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">old rdd ====> new rdd</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Actions</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">rdd ====> result</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Examples in RDD:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;">list <-- {1, 2, 3, 4}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Ex1: Transformations</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">f(x) => x + 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(list) <-- {2, 3, 4, 5}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(x) => x * x</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(list) <-- {1, 4, 9, 16}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Ex2: Actions</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">min(list) <-- 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">max(list) <-- 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sum(list) <-- 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">avg(list) <-- 2.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val list = List(1,2,3,4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">scala> val list = List(1,2,3,4)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res0: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(list, 2)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res1: Int = 2</span><br />
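<span style="font-size: large;"><br /></span>
<span style="font-size: large;">To see which elements end up in which partition, glom() can be used. This is a sketch assuming an existing SparkContext; the object name PartitionLayoutSketch is made up for the illustration.</span><br />

```scala
import org.apache.spark.SparkContext

object PartitionLayoutSketch {
  // Returns the elements of List(1, 2, 3, 4) grouped by partition.
  def layout(sc: SparkContext, numPartitions: Int): Seq[Seq[Int]] =
    sc.parallelize(List(1, 2, 3, 4), numPartitions)
      .glom()     // turns each partition into one array
      .collect()  // brings the per-partition arrays to the driver
      .map(_.toSeq)
      .toSeq
}
```

<span style="font-size: large;">For example, PartitionLayoutSketch.layout(sc, 2) groups the list into two partitions of two elements each. glom() is meant for inspecting small data sets only.</span><br />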
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res3: Array[Int] = Array(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x + 1).collect</span><br />
<span style="font-size: large;">res4: Array[Int] = Array(2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x + 1)</span><br />
<span style="font-size: large;">res5: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[4] at map at <console>:32</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x + 1).collect</span><br />
<span style="font-size: large;">res6: Array[Int] = Array(2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.map(x => x * x).collect</span><br />
<span style="font-size: large;">res7: Array[Int] = Array(1, 4, 9, 16)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.min</span><br />
<span style="font-size: large;">res8: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.max</span><br />
<span style="font-size: large;">res9: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.sum</span><br />
<span style="font-size: large;">res10: Double = 10.0</span><br />
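<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The session above covers min, max and sum. The avg(list) from Ex2 can be obtained with mean(), or from sum and count. A sketch assuming an existing SparkContext; the object name AverageSketch is made up for the illustration.</span><br />

```scala
import org.apache.spark.SparkContext

object AverageSketch {
  // Returns the average computed in two ways.
  def demo(sc: SparkContext): (Double, Double) = {
    val rdd = sc.parallelize(List(1, 2, 3, 4), 2)
    // mean() is available on numeric RDDs through an implicit conversion.
    val viaMean = rdd.mean()
    // Equivalent, but runs two actions (two passes over the data).
    val viaSumAndCount = rdd.sum() / rdd.count()
    (viaMean, viaSumAndCount)
  }
}
```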
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">1. create an RDD with the numbers 1 to 10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val rdd = sc.parallelize(1 to 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. add 1 to each element</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val addRdd = rdd.map(x => x + 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. filter out the even numbers in the above data (keep the odd numbers)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val filterRdd = addRdd.filter(x => x % 2 == 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">4. sum all the numbers in the above data</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val sum = filterRdd.sum</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">5. print all the numbers in the above data</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">filterRdd.foreach(x => println(x))</span><br />
<span style="font-size: large;">filterRdd.foreach(println)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">6. save this data into local file system / hdfs</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/data" // option 1: local file system</span><br />
<span style="font-size: large;">val output = "hdfs://localhost:8020/output/data" // option 2: HDFS (use one of the two)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">filterRdd.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">7. repartition the data</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val repartitionRdd = filterRdd.repartition(1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">8. save this data into local file system / hdfs</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val output = "file:///home/orienit/work/output/data1" // option 1: local file system</span><br />
<span style="font-size: large;">val output = "hdfs://localhost:8020/output/data1" // option 2: HDFS (use one of the two)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">repartitionRdd.saveAsTextFile(output)</span><br />
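<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Steps 5 to 8 can also be tried without the fixed paths above. The sketch below assumes a SparkContext running in local mode and writes to a temporary directory; the object name SaveSketch and the directory prefix are made up for the illustration.</span><br />

```scala
import java.nio.file.Files
import org.apache.spark.SparkContext

object SaveSketch {
  // Returns (sum of the filtered numbers, number of part files written).
  def run(sc: SparkContext): (Double, Int) = {
    val filterRdd = sc.parallelize(1 to 10, 4).map(_ + 1).filter(_ % 2 == 1)
    // foreach(println) runs inside the tasks, so the order is not guaranteed.
    // Collecting first prints in RDD order on the driver (small data only).
    filterRdd.collect().foreach(println)
    // saveAsTextFile() fails if the target exists, so use a fresh sub-directory.
    val outDir = Files.createTempDirectory("spark-output").resolve("data1")
    // One partition means one part file in the output directory.
    filterRdd.repartition(1).saveAsTextFile(outDir.toUri.toString)
    val partFiles = outDir.toFile.listFiles().count(_.getName.startsWith("part-"))
    (filterRdd.sum(), partFiles)
  }
}
```

<span style="font-size: large;">Without repartition(1), saveAsTextFile writes one part file per partition, which would be four files in this sketch.</span><br />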
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.parallelize(1 to 10)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[8] at parallelize at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res11: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val addRdd = rdd.map(x => x + 1)</span><br />
<span style="font-size: large;">addRdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[9] at map at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> addRdd.collect</span><br />
<span style="font-size: large;">res12: Array[Int] = Array(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val filterRdd = addRdd.filter(x => x % 2 == 1)</span><br />
<span style="font-size: large;">filterRdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[10] at filter at <console>:31</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> filterRdd.collect</span><br />
<span style="font-size: large;">res13: Array[Int] = Array(3, 5, 7, 9, 11)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val sum = filterRdd.sum</span><br />
<span style="font-size: large;">sum: Double = 35.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> filterRdd.foreach(x => println(x))</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">11</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> filterRdd.foreach(println)</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "file:///home/orienit/work/output/data"</span><br />
<span style="font-size: large;">output: String = file:///home/orienit/work/output/data</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> filterRdd.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val repartitionRdd = filterRdd.repartition(1)</span><br />
<span style="font-size: large;">repartitionRdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[16] at repartition at <console>:33</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "file:///home/orienit/work/output/data1"</span><br />
<span style="font-size: large;">output: String = file:///home/orienit/work/output/data1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> repartitionRdd.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "hdfs://localhost:8020/output/data1"</span><br />
<span style="font-size: large;">output: String = hdfs://localhost:8020/output/data1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> repartitionRdd.saveAsTextFile(output)</span><br />
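<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The results in the transcript above can be checked without Spark, because map, filter and sum mean the same thing on an ordinary Scala collection. A small sketch (LocalPipelineCheck is a name made up for this illustration). Two things differ on an RDD: sum returns a Double (35.0) because it goes through Spark's numeric RDD functions, and foreach runs on the partitions in parallel, so the printed order can change between runs, as it does above.</span><br />

```scala
object LocalPipelineCheck {
  // 1 to 10, like sc.parallelize(1 to 10)
  val numbers: List[Int] = (1 to 10).toList

  // add 1 to each element
  val added: List[Int] = numbers.map(x => x + 1)

  // keep only the odd numbers
  val odds: List[Int] = added.filter(x => x % 2 == 1)

  // on a plain collection of Int, sum stays an Int
  val total: Int = odds.sum

  def main(args: Array[String]): Unit = {
    println(added)
    println(odds)
    println(total)
  }
}
```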
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Creating an RDD from a data set (text file):</span><br />
<span style="font-size: large;">--------------------------------</span><br />
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">val rdd = sc.textFile(input)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">input: String = file:///home/orienit/work/input/demoinput</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(input)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[21] at textFile at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res20: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">rdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[23] at textFile at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.getNumPartitions</span><br />
<span style="font-size: large;">res21: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect</span><br />
<span style="font-size: large;">res22: Array[String] = Array(I am going, to hyd, I am learning, hadoop course)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd.collect.foreach(println)</span><br />
<span style="font-size: large;">I am going</span><br />
<span style="font-size: large;">to hyd</span><br />
<span style="font-size: large;">I am learning</span><br />
<span style="font-size: large;">hadoop course</span><br />
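<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The second argument of sc.textFile is minPartitions: a lower-bound hint for the number of input splits, not an exact count. For a file this small, passing 1 gives a single partition, as in the transcript above. A self-contained sketch, assuming spark-core is on the classpath; it writes the four demo lines to a temporary file (a stand-in for the demoinput file) so that it can run on any machine. The object name TextFilePartitions is illustrative.</span><br />

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFilePartitions {
  // Returns (number of partitions, lines read) for textFile(path, 1).
  def run(): (Int, List[String]) = {
    // Temporary stand-in for /home/orienit/work/input/demoinput.
    val path = Files.createTempFile("demoinput", ".txt")
    Files.write(path, "I am going\nto hyd\nI am learning\nhadoop course\n".getBytes(StandardCharsets.UTF_8))

    // Assumption: local master with 2 threads, only for this sketch.
    val conf = new SparkConf().setAppName("textfile-partitions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // minPartitions = 1: ask for at least one split.
      val rdd = sc.textFile(path.toUri.toString, 1)
      (rdd.getNumPartitions, rdd.collect().toList)
    } finally {
      sc.stop()
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit = {
    val (partitions, lines) = run()
    println(s"partitions=$partitions")
    lines.foreach(println)
  }
}
```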
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">WordCount in Spark using Scala:</span><br />
<span style="font-size: large;">-------------------------------------</span><br />
<span style="font-size: large;">// input path</span><br />
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// creating a rdd from input</span><br />
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// read the file line by line</span><br />
<span style="font-size: large;">// split the line into words</span><br />
<span style="font-size: large;">val words = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// assign count(1) to each word</span><br />
<span style="font-size: large;">val counts = words.map(word => (word, 1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// sum the counts for each word</span><br />
<span style="font-size: large;">val wordcount = counts.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// save the output into the local file system or HDFS (use one of the two paths)</span><br />
<span style="font-size: large;">val output = "file:///home/orienit/work/output/wordcount"</span><br />
<span style="font-size: large;">val output = "hdfs://localhost:8020/output/wordcount"</span><br />
<span style="font-size: large;">wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">input: String = file:///home/orienit/work/input/demoinput</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">fileRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[25] at textFile at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val words = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;">words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[26] at flatMap at <console>:31</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val counts = words.map(word => (word, 1))</span><br />
<span style="font-size: large;">counts: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[27] at map at <console>:33</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordcount = counts.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;">wordcount: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[28] at reduceByKey at <console>:35</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "file:///home/orienit/work/output/wordcount"</span><br />
<span style="font-size: large;">output: String = file:///home/orienit/work/output/wordcount</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val output = "hdfs://localhost:8020/output/wordcount"</span><br />
<span style="font-size: large;">output: String = hdfs://localhost:8020/output/wordcount</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">val output = "file:///home/orienit/work/output/wordcount"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">val words = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;">val counts = words.map(word => (word, 1))</span><br />
<span style="font-size: large;">val wordcount = counts.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;">wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">val output = "file:///home/orienit/work/output/wordcount1"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">val wordcount = fileRdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;">wordcount.saveAsTextFile(output)</span><br />
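<span style="font-size: large;"><br /></span>
<span style="font-size: large;">To see what the job should produce before looking at the output files, the same three steps can be run on a plain Scala collection. This is only a local sketch: groupBy plus a per-group sum stands in for reduceByKey, and LocalWordCount is a name made up for this example. The Spark job writes each result as a (word,count) line in its part files.</span><br />

```scala
object LocalWordCount {
  // Same steps as the Spark job: split into words, pair each word with 1, sum per word.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  // The four lines of the demo input shown earlier in these notes.
  val demoLines: Seq[String] = Seq("I am going", "to hyd", "I am learning", "hadoop course")

  def main(args: Array[String]): Unit =
    countWords(demoLines).toSeq.sortBy(_._1).foreach { case (word, n) => println(s"($word,$n)") }
}
```

<span style="font-size: large;">Note that split(" ") produces empty strings when a line contains consecutive spaces; splitting on "\\s+" is one way to avoid counting them.</span><br />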
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Grep in Spark using Scala:</span><br />
<span style="font-size: large;">-------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">val output = "file:///home/orienit/work/output/grepjob"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">val filterRdd = fileRdd.filter(line => line.contains("am"))</span><br />
<span style="font-size: large;">filterRdd.saveAsTextFile(output)</span><br />
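<span style="font-size: large;"><br /></span>
<span style="font-size: large;">line.contains("am") is a substring test, so it also keeps lines where "am" appears inside a longer word. A local sketch of the difference between substring and whole-word matching; LocalGrep and the extra line "sample text" are hypothetical and not part of the demo input.</span><br />

```scala
object LocalGrep {
  // Substring match, as in the Spark job: also matches "am" inside longer words.
  def grepSubstring(lines: Seq[String], pattern: String): Seq[String] =
    lines.filter(line => line.contains(pattern))

  // Whole-word match: split on whitespace and compare each word exactly.
  def grepWord(lines: Seq[String], word: String): Seq[String] =
    lines.filter(line => line.split("\\s+").contains(word))

  // Demo input plus one hypothetical extra line to show the difference.
  val lines: Seq[String] =
    Seq("I am going", "to hyd", "I am learning", "hadoop course", "sample text")

  def main(args: Array[String]): Unit = {
    println(grepSubstring(lines, "am"))
    println(grepWord(lines, "am"))
  }
}
```

<span style="font-size: large;">The same predicates can be passed to fileRdd.filter in the Spark job.</span><br />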
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">WordCount in Spark using Python:</span><br />
<span style="font-size: large;">-------------------------------------</span><br />
<span style="font-size: large;">input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">output = "file:///home/orienit/work/output/wordcount-py"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">words = fileRdd.flatMap(lambda line : line.split(" "))</span><br />
<span style="font-size: large;">counts = words.map(lambda word : (word, 1))</span><br />
<span style="font-size: large;">wordcount = counts.reduceByKey(lambda a,b : a + b)</span><br />
<span style="font-size: large;">wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">>>> input = "file:///home/orienit/work/input/demoinput"</span><br />
<span style="font-size: large;">>>> output = "file:///home/orienit/work/output/wordcount-py"</span><br />
<span style="font-size: large;">>>> fileRdd = sc.textFile(input, 1)</span><br />
<span style="font-size: large;">16/11/27 15:08:16 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 127.4 KB)</span><br />
<span style="font-size: large;">16/11/27 15:08:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 141.3 KB)</span><br />
<span style="font-size: large;">16/11/27 15:08:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:42322 (size: 13.9 KB, free: 3.4 GB)</span><br />
<span style="font-size: large;">16/11/27 15:08:17 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2</span><br />
<span style="font-size: large;">>>> words = fileRdd.flatMap(lambda line : line.split(" "))</span><br />
<span style="font-size: large;">>>> counts = words.map(lambda word : (word, 1))</span><br />
<span style="font-size: large;">>>> wordcount = counts.reduceByKey(lambda a,b : a + b)</span><br />
<span style="font-size: large;">>>> wordcount.saveAsTextFile(output)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Spark SQL:</span><br />
<span style="font-size: large;">---------------------</span><br />
<span style="font-size: large;">DataFrame <=> Table</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">orienit@kalyan:~$ cat /home/orienit/spark/input/student.json</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">{"name":"anil","id":1,"course":"spark","year":2016}</span><br />
<span style="font-size: large;">{"name":"anvith","id":5,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"dev","id":6,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"raj","id":3,"course":"spark","year":2016}</span><br />
<span style="font-size: large;">{"name":"sunil","id":4,"course":"hadoop","year":2015}</span><br />
<span style="font-size: large;">{"name":"venkat","id":2,"course":"spark","year":2016}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val df = sqlContext.read.json("file:///home/orienit/spark/input/student.json")</span><br />
<span style="font-size: large;">val df = sqlContext.read.parquet("file:///home/orienit/spark/input/student.parquet")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val df = sqlContext.read.json("file:///home/orienit/spark/input/student.json")</span><br />
<span style="font-size: large;">df: org.apache.spark.sql.DataFrame = [course: string, id: bigint, name: string, year: bigint]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">|course| id| name|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| spark| 1| anil|2016|</span><br />
<span style="font-size: large;">|hadoop| 5|anvith|2015|</span><br />
<span style="font-size: large;">|hadoop| 6| dev|2015|</span><br />
<span style="font-size: large;">| spark| 3| raj|2016|</span><br />
<span style="font-size: large;">|hadoop| 4| sunil|2015|</span><br />
<span style="font-size: large;">| spark| 2|venkat|2016|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val df = sqlContext.read.parquet("file:///home/orienit/spark/input/student.parquet")</span><br />
<span style="font-size: large;">df: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">|venkat| 2| spark|2016|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">A DataFrame can be queried in 2 ways:</span><br />
<span style="font-size: large;">1. DSL - Domain Specific Language</span><br />
<span style="font-size: large;">2. SQL - Structured Query Language</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">DSL Examples:</span><br />
<span style="font-size: large;">------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">|venkat| 2| spark|2016|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.show(3)</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">only showing top 3 rows</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.limit(3)</span><br />
<span style="font-size: large;">res32: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.limit(3).show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.select("name", "id")</span><br />
<span style="font-size: large;">res34: org.apache.spark.sql.DataFrame = [name: string, id: int]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.select("name", "id").show()</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| name| id|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| anil| 1|</span><br />
<span style="font-size: large;">|anvith| 5|</span><br />
<span style="font-size: large;">| dev| 6|</span><br />
<span style="font-size: large;">| raj| 3|</span><br />
<span style="font-size: large;">| sunil| 4|</span><br />
<span style="font-size: large;">|venkat| 2|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.select("name", "id").limit(2).show()</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| name| id|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| anil| 1|</span><br />
<span style="font-size: large;">|anvith| 5|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.filter("id > 4").show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.where("id > 4").show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
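<span style="font-size: large;"><br /></span>
<span style="font-size: large;">filter and where also accept Column expressions instead of a SQL string, which lets the compiler catch syntax mistakes in the condition. A sketch under two assumptions: Spark 2.0 or later (it uses SparkSession, while these notes use the 1.6 sqlContext), and the student rows are built in memory from the table above instead of being read from student.parquet. The object name StudentDslDemo is illustrative. The df("id") > 4 form itself also works on a DataFrame loaded through sqlContext.</span><br />

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StudentDslDemo {
  // Builds the student table from the rows shown above instead of reading a file.
  def students(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("anil", 1, "spark", 2016),
      ("anvith", 5, "hadoop", 2015),
      ("dev", 6, "hadoop", 2015),
      ("raj", 3, "spark", 2016),
      ("sunil", 4, "hadoop", 2015),
      ("venkat", 2, "spark", 2016)
    ).toDF("name", "id", "course", "year")
  }

  // Names of students with id > 4, using a Column expression instead of "id > 4".
  def namesWithIdAbove4(): List[String] = {
    // Assumption: local master with 2 threads, only for this sketch.
    val spark = SparkSession.builder().appName("student-dsl-demo").master("local[2]").getOrCreate()
    try {
      val df = students(spark)
      df.filter(df("id") > 4)
        .select("name")
        .orderBy("name")
        .collect()
        .map(row => row.getString(0))
        .toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    namesWithIdAbove4().foreach(println)
}
```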
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">SQL Examples:</span><br />
<span style="font-size: large;">---------------------</span><br />
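<span style="font-size: large;">Register the DataFrame as a temp table (from Spark 2.0 onwards registerTempTable is deprecated in favour of createOrReplaceTempView)</span><br />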
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> df.registerTempTable("student")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select * from student")</span><br />
<span style="font-size: large;">res42: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select * from student").show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">| raj| 3| spark|2016|</span><br />
<span style="font-size: large;">| sunil| 4|hadoop|2015|</span><br />
<span style="font-size: large;">|venkat| 2| spark|2016|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select name, id from student").show()</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| name| id|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;">| anil| 1|</span><br />
<span style="font-size: large;">|anvith| 5|</span><br />
<span style="font-size: large;">| dev| 6|</span><br />
<span style="font-size: large;">| raj| 3|</span><br />
<span style="font-size: large;">| sunil| 4|</span><br />
<span style="font-size: large;">|venkat| 2|</span><br />
<span style="font-size: large;">+------+---+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select * from student where id > 4").show()</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">| name| id|course|year|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="font-size: large;">+------+---+------+----+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select year, count(*) from student group by year").show()</span><br />
<span style="font-size: large;">+----+---+</span><br />
<span style="font-size: large;">|year|_c1|</span><br />
<span style="font-size: large;">+----+---+</span><br />
<span style="font-size: large;">|2015| 3|</span><br />
<span style="font-size: large;">|2016| 3|</span><br />
<span style="font-size: large;">+----+---+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select year, count(*) as cnt from student group by year").show()</span><br />
<span style="font-size: large;">+----+---+</span><br />
<span style="font-size: large;">|year|cnt|</span><br />
<span style="font-size: large;">+----+---+</span><br />
<span style="font-size: large;">|2015| 3|</span><br />
<span style="font-size: large;">|2016| 3|</span><br />
<span style="font-size: large;">+----+---+</span><br />
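<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The group-by query above has a direct DSL counterpart, groupBy("year").count(). The sketch below computes the counts both ways and returns them so they can be compared. It assumes Spark 2.0 or later (SparkSession and createOrReplaceTempView rather than sqlContext and registerTempTable) and builds the student rows in memory; the object name StudentCountByYear is made up for this example.</span><br />

```scala
import org.apache.spark.sql.{Row, SparkSession}

object StudentCountByYear {
  // Returns the (year, count) pairs computed with the DSL and with SQL.
  def run(): (List[(Int, Long)], List[(Int, Long)]) = {
    // Assumption: local master with 2 threads, only for this sketch.
    val spark = SparkSession.builder().appName("student-count-by-year").master("local[2]").getOrCreate()
    import spark.implicits._
    try {
      // Same rows as the student table shown in these notes.
      val df = Seq(
        ("anil", 1, "spark", 2016),
        ("anvith", 5, "hadoop", 2015),
        ("dev", 6, "hadoop", 2015),
        ("raj", 3, "spark", 2016),
        ("sunil", 4, "hadoop", 2015),
        ("venkat", 2, "spark", 2016)
      ).toDF("name", "id", "course", "year")

      // DSL form of the group-by query.
      val viaDsl = df.groupBy("year").count().orderBy("year")

      // SQL form, against a temporary view named "student".
      df.createOrReplaceTempView("student")
      val viaSql = spark.sql("select year, count(*) as cnt from student group by year order by year")

      // year is an int column, the count is a long column.
      def toPairs(rows: Array[Row]): List[(Int, Long)] =
        rows.map(r => (r.getInt(0), r.getLong(1))).toList

      (toPairs(viaDsl.collect()), toPairs(viaSql.collect()))
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (dsl, sql) = run()
    println(s"DSL: $dsl")
    println(s"SQL: $sql")
  }
}
```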
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">case class Contact(cid: Int, name: String, loc: String, pincode:Int)</span><br />
<span style="font-size: large;">case class Orders(oid: Int, cid: Int, status: String)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val contact = sc.textFile("file:///home/orienit/spark/input/contact.csv").map(_.split(","))</span><br />
<span style="font-size: large;">val cdf = contact.map(c => Contact(c(0).trim.toInt, c(1), c(2), c(3).trim.toInt)).toDF()</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val orders = sc.textFile("file:///home/orienit/spark/input/orders.tsv").map(_.split("\t"))</span><br />
<span style="font-size: large;">val odf = orders.map(x => Orders(x(0).trim.toInt, x(1).trim.toInt, x(2))).toDF()</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> case class Contact(cid: Int, name: String, loc: String, pincode:Int)</span><br />
<span style="font-size: large;">defined class Contact</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> case class Orders(oid: Int, cid: Int, status: String)</span><br />
<span style="font-size: large;">defined class Orders</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val contact = sc.textFile("file:///home/orienit/spark/input/contact.csv").map(_.split(","))</span><br />
<span style="font-size: large;">contact: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[113] at map at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val cdf = contact.map(c => Contact(c(0).trim.toInt, c(1), c(2), c(3).trim.toInt)).toDF()</span><br />
<span style="font-size: large;">cdf: org.apache.spark.sql.DataFrame = [cid: int, name: string, loc: string, pincode: int]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val orders = sc.textFile("file:///home/orienit/spark/input/orders.tsv").map(_.split("\t"))</span><br />
<span style="font-size: large;">orders: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[118] at map at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val odf = orders.map(x => Orders(x(0).trim.toInt, x(1).trim.toInt, x(2))).toDF()</span><br />
<span style="font-size: large;">odf: org.apache.spark.sql.DataFrame = [oid: int, cid: int, status: string]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> cdf.show()</span><br />
<span style="font-size: large;">+---+------+----+-------+</span><br />
<span style="font-size: large;">|cid| name| loc|pincode|</span><br />
<span style="font-size: large;">+---+------+----+-------+</span><br />
<span style="font-size: large;">| 1|kalyan| hyd| 500072|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|</span><br />
<span style="font-size: large;">| 3| anil| hyd| 500071|</span><br />
<span style="font-size: large;">| 4| raj| hyd| 500075|</span><br />
<span style="font-size: large;">| 5| arun| hyd| 500074|</span><br />
<span style="font-size: large;">| 6| vani|bang| 600072|</span><br />
<span style="font-size: large;">| 7| vamsi|bang| 600073|</span><br />
<span style="font-size: large;">| 8|prasad|bang| 600076|</span><br />
<span style="font-size: large;">| 9|anvith|bang| 600075|</span><br />
<span style="font-size: large;">| 10| swamy|bang| 600071|</span><br />
<span style="font-size: large;">+---+------+----+-------+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> odf.show()</span><br />
<span style="font-size: large;">+---+---+-------+</span><br />
<span style="font-size: large;">|oid|cid| status|</span><br />
<span style="font-size: large;">+---+---+-------+</span><br />
<span style="font-size: large;">|111| 1|success|</span><br />
<span style="font-size: large;">|112| 1|failure|</span><br />
<span style="font-size: large;">|113| 2|success|</span><br />
<span style="font-size: large;">|114| 3|success|</span><br />
<span style="font-size: large;">|115| 2|failure|</span><br />
<span style="font-size: large;">|116| 3|failure|</span><br />
<span style="font-size: large;">|117| 2|success|</span><br />
<span style="font-size: large;">|118| 5|success|</span><br />
<span style="font-size: large;">|119| 6|failure|</span><br />
<span style="font-size: large;">|120| 2|success|</span><br />
<span style="font-size: large;">|121| 3|failure|</span><br />
<span style="font-size: large;">|122| 7|success|</span><br />
<span style="font-size: large;">|123| 3|failure|</span><br />
<span style="font-size: large;">|124| 2|success|</span><br />
<span style="font-size: large;">|125| 1|failure|</span><br />
<span style="font-size: large;">|126| 5|success|</span><br />
<span style="font-size: large;">+---+---+-------+</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> cdf.registerTempTable("customers")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> odf.registerTempTable("orders")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> sqlContext.sql("select customers.*, orders.* from customers join orders on customers.cid == orders.cid").show()</span><br />
<span style="font-size: large;">+---+------+----+-------+---+---+-------+</span><br />
<span style="font-size: large;">|cid| name| loc|pincode|oid|cid| status|</span><br />
<span style="font-size: large;">+---+------+----+-------+---+---+-------+</span><br />
<span style="font-size: large;">| 1|kalyan| hyd| 500072|111| 1|success|</span><br />
<span style="font-size: large;">| 1|kalyan| hyd| 500072|112| 1|failure|</span><br />
<span style="font-size: large;">| 1|kalyan| hyd| 500072|125| 1|failure|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|113| 2|success|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|115| 2|failure|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|117| 2|success|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|120| 2|success|</span><br />
<span style="font-size: large;">| 2|venkat| hyd| 500073|124| 2|success|</span><br />
<span style="font-size: large;">| 3| anil| hyd| 500071|114| 3|success|</span><br />
<span style="font-size: large;">| 3| anil| hyd| 500071|116| 3|failure|</span><br />
<span style="font-size: large;">| 3| anil| hyd| 500071|121| 3|failure|</span><br />
<span style="font-size: large;">| 3| anil| hyd| 500071|123| 3|failure|</span><br />
<span style="font-size: large;">| 5| arun| hyd| 500074|118| 5|success|</span><br />
<span style="font-size: large;">| 5| arun| hyd| 500074|126| 5|success|</span><br />
<span style="font-size: large;">| 6| vani|bang| 600072|119| 6|failure|</span><br />
<span style="font-size: large;">| 7| vamsi|bang| 600073|122| 7|success|</span><br />
<span style="font-size: large;">+---+------+----+-------+---+---+-------+</span><br />
<span style="font-size: large;"><br /></span>
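<span style="font-size: large;">The join output above has 16 rows and no rows for cid 4, 8, 9 or 10, because those customers have no orders and an inner join keeps only matching pairs. The following is a minimal plain-Scala sketch of that behaviour, not how Spark executes the query; the object name JoinSketch and its helper methods are hypothetical and exist only for illustration.</span><br />

```scala
object JoinSketch {
  // Same shapes as the case classes defined in the Spark shell session.
  case class Contact(cid: Int, name: String, loc: String, pincode: Int)
  case class Orders(oid: Int, cid: Int, status: String)

  // Inner join on cid: keep only (contact, order) pairs whose cid values match.
  def innerJoin(contacts: Seq[Contact], orders: Seq[Orders]): Seq[(Contact, Orders)] =
    for {
      c <- contacts
      o <- orders
      if c.cid == o.cid
    } yield (c, o)

  // cids of contacts that have no order at all; an inner join drops these.
  def unmatched(contacts: Seq[Contact], orders: Seq[Orders]): Seq[Int] = {
    val cidsWithOrders = orders.map(_.cid).toSet
    contacts.map(_.cid).filterNot(cidsWithOrders)
  }
}
```

<span style="font-size: large;">Calling JoinSketch.unmatched on the full contact and order lists shown earlier would list the customers that are missing from the join result.</span><br />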
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val jdf = sqlContext.sql("select customers.cid, customers.name, customers.loc, customers.pincode, orders.oid, orders.status from customers join orders on customers.cid == orders.cid")</span><br />
<span style="font-size: large;">jdf: org.apache.spark.sql.DataFrame = [cid: int, name: string, loc: string, pincode: int, oid: int, status: string]</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// JDBC connection properties for the target MySQL database</span><br />
<span style="font-size: large;">val prop = new java.util.Properties</span><br />
<span style="font-size: large;">prop.setProperty("driver","com.mysql.jdbc.Driver")</span><br />
<span style="font-size: large;">prop.setProperty("user","root")</span><br />
<span style="font-size: large;">prop.setProperty("password","hadoop")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val url = "jdbc:mysql://localhost:3306/kalyan"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">// Write the joined DataFrame to the "jointable" table; the default save mode fails if the table already exists</span><br />
<span style="font-size: large;">jdf.write.jdbc(url, "jointable", prop)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<br /></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com2tag:blogger.com,1999:blog-2182833570422175384.post-4177284411399632432016-11-26T16:20:00.000-08:002016-11-26T16:32:47.716-08:00Sizing and Configuring your Hadoop Cluster<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="article-content" itemprop="articleBody" style="background-color: white; border-bottom: 4px solid rgb(255, 137, 1); color: #536576; font-family: ubuntu, sans-serif; margin-bottom: 3px;">
<h1 style="font-weight: normal; margin: 20px 0px; padding: 0px; position: relative;">
<span style="font-size: x-large;">
Sizing your Hadoop cluster</span></h1>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Hadoop's performance depends on several factors: well-configured software layers and well-dimensioned hardware resources that make efficient use of CPU, memory, hard drives (storage I/O), and network bandwidth.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Planning a Hadoop cluster remains a complex task that requires at least a basic knowledge of the Hadoop architecture, and a full treatment may be out of the scope of this book. This section tries to make the task clearer by providing explanations and formulas to help you estimate your needs. We will introduce a basic guideline to help you make decisions while sizing your cluster and answer some <em>How to plan</em> questions about the cluster's needs, such as the following:</span></div>
<ul style="line-height: 1.4; margin: 0.5em 0px 1em; padding: 0px;">
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;">How to plan my storage?</span></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;">How to plan my CPU?</span></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;">How to plan my memory?</span></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;">How to plan the network bandwidth?</span></li>
</ul>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">While sizing your Hadoop cluster, you should also consider the data volume that the end users will process on the cluster. This volume determines how many machines (nodes) you need in your cluster to process the input data efficiently, and the disk and memory capacity of each one.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Hadoop has a Master/Slave architecture and is demanding in both memory and CPU. Its MapReduce layer has two main components:</span></div>
<ul style="line-height: 1.4; margin: 0.5em 0px 1em; padding: 0px;">
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;"><span style="font-weight: 700;">JobTracker:</span> This is the critical component in this architecture and monitors jobs that are running on the cluster</span></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><span style="font-size: large;"><span style="font-weight: 700;">TaskTracker:</span> This runs tasks on each node of the cluster</span></li>
</ul>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">To work efficiently, HDFS must have high-throughput hard drives with an underlying filesystem that supports the HDFS read and write pattern (large blocks). This pattern consists of one big read (or write) at a time, with a block size of 64 MB, 128 MB, or up to 256 MB. Also, the network layer should be fast enough to cope with intermediate data and block transfers.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">HDFS is itself based on a Master/Slave architecture with two main components: the NameNode / Secondary NameNode and the DataNode. The NameNode and Secondary NameNode are critical components and need a lot of memory to store file metadata such as attributes, file locations, directory structure, and names, and to process data. The NameNode component ensures that data blocks are properly replicated in the cluster. The second component, the DataNode, manages the state of an HDFS node and interacts with its data blocks. It requires a lot of I/O for processing and data transfer.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Typically, the MapReduce layer has two main prerequisites. The first is that input datasets must be large enough to fill a data block and must be splittable into smaller, independent data chunks (for example, a 10 GB text file can be split into 40 blocks of 256 MB each, and each line of text in any data block can be processed independently). The second prerequisite is <span style="font-weight: 700;">data locality</span>, which means that the MapReduce code is moved to where the data lies, not the opposite (it is more efficient to move a few megabytes of code close to the data to be processed than to move many data blocks over the network or the disk). This involves having a distributed storage system that exposes data locality and allows the execution of code on any storage node.</span></div>
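<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The block arithmetic can be checked with a small sketch. The object name BlockCount and its blocks method are hypothetical helpers introduced only for this illustration, and sizes are assumed to be given in binary megabytes (1 GB = 1024 MB).</span></div>

```scala
object BlockCount {
  // Number of HDFS blocks needed for a file: ceiling of fileSize / blockSize.
  // Both sizes are in MB; the last block may be only partially filled.
  def blocks(fileSizeMb: Long, blockSizeMb: Long): Long = {
    require(fileSizeMb >= 0 && blockSizeMb > 0, "file size must be >= 0 and block size > 0")
    (fileSizeMb + blockSizeMb - 1) / blockSizeMb
  }
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">For a 10 GB file (10,240 MB), BlockCount.blocks(10L * 1024, 256) returns 40, and the same file with 64 MB blocks needs 160 blocks.</span></div>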
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The network bandwidth is used mainly in two situations: during the replication process that follows a file write, and during the rebalancing of the replication factor when a node fails.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The most common practice is to size a Hadoop cluster based on the amount of storage required. The more data the system holds, the more machines it requires. Each time you add a new node to the cluster, you get more computing resources in addition to the new storage capacity.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Let's consider an example cluster growth plan based on storage and learn how to determine the storage needed, the amount of memory, and the number of DataNodes in the cluster.</span></div>
<table border="1" cellpadding="0" cellspacing="0" style="font-size: 1em; height: 216px; margin: 1em 0px; width:100%;"><tbody>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Daily data input</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
100 GB</div>
</td><td rowspan="2" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Storage space used by daily data input = daily data input * replication factor = 300 GB</em></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">HDFS replication factor</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
3</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Monthly growth</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
5%</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Monthly volume = (300 * 30) + 5% = 9450 GB</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>After one year = 9450 * (1 + 0.05)^12 = 16971 GB</em></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Intermediate MapReduce data</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
25%</div>
</td><td rowspan="3" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Dedicated space = HDD size * (1 - (Non HDFS reserved space per disk / 100 + Intermediate MapReduce data / 100))</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>= 4 * (1 - (0.25 + 0.30)) = 1.8 TB (which is the node capacity)</em></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Non HDFS reserved space per disk</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
30%</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Size of a hard drive disk</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 TB</div>
</td></tr>
<tr><td colspan="3" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 542px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Number of DataNodes needed to process:</span></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Whole first month data = 9450 GB / 1800 GB ~= 6 nodes</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>The 12th month data = 16971 GB / 1800 GB ~= 10 nodes</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Whole year data = 157938 GB / 1800 GB ~= 88 nodes</em></div>
</td></tr>
</tbody></table>
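<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The formulas in the table can be reproduced with a short sketch. The object StorageSizing and its method names are hypothetical and introduced only for illustration. The sketch assumes a 30-day month and 4 TB = 4000 GB, as the table does, and takes percentages as fractions (0.05 for 5%).</span></div>

```scala
object StorageSizing {
  // Replicated storage consumed per day, in GB.
  def dailyStorageGb(dailyInputGb: Double, replication: Int): Double =
    dailyInputGb * replication

  // Volume of the first month (30 days), including the monthly growth rate.
  def monthlyVolumeGb(dailyStorageGb: Double, growth: Double): Double =
    dailyStorageGb * 30 * (1 + growth)

  // Monthly volume after `months` further months of compound growth.
  def volumeAfterGb(monthlyGb: Double, growth: Double, months: Int): Double =
    monthlyGb * math.pow(1 + growth, months)

  // Usable HDFS capacity per node once the non-HDFS reserve and the
  // intermediate MapReduce space have been set aside.
  def nodeCapacityGb(diskGb: Double, nonHdfsFraction: Double, intermediateFraction: Double): Double =
    diskGb * (1 - (nonHdfsFraction + intermediateFraction))

  // Nodes needed to hold a given volume, rounded up to a whole node.
  def nodesNeeded(volumeGb: Double, capacityGb: Double): Int =
    math.ceil(volumeGb / capacityGb).toInt
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">With the table's inputs, StorageSizing.nodesNeeded gives 6 nodes for 9450 GB, 10 nodes for 16971 GB, and 88 nodes for 157938 GB, each against a node capacity of 1800 GB.</span></div>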
<div style="color: #333333; font-family: merriweather, georgia, serif; font-size: 1rem; line-height: 1.9rem; margin: 0.6em 40px 20px; padding: 0px;">
<em>Do not use RAID array disks on a DataNode. HDFS provides its own replication mechanism. It is also important to note that for every disk, 30 percent of its capacity should be reserved for non-HDFS use.</em></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">It is easy to determine the memory needed for both NameNode and Secondary NameNode. The memory needed by NameNode to manage the HDFS cluster metadata in memory and the memory needed for the OS must be added together. Typically, the memory needed by Secondary NameNode should be identical to NameNode. Then you can apply the following formulas to determine the memory amount:</span></div>
<table border="1" cellpadding="0" cellspacing="0" style="font-size: 1em; height: 83px; margin: 1em 0px; width:100%;"><tbody>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">NameNode memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
2 GB - 4 GB</div>
</td><td rowspan="4" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Memory amount = HDFS cluster management memory + NameNode memory + OS memory</em></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Secondary NameNode memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
2 GB - 4 GB</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">OS memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 GB - 8 GB</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">HDFS memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
2 GB - 8 GB</div>
</td></tr>
<tr><td colspan="3" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>At least NameNode (Secondary NameNode) memory = 2 + 2 + 4 = 8 GB</em></div>
</td></tr>
</tbody></table>
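<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">As a quick check of the NameNode formula, the following sketch adds up the three components. The object NameNodeMemory and its totalGb method are hypothetical names used only for illustration; all values are in GB.</span></div>

```scala
object NameNodeMemory {
  // Memory amount = HDFS cluster management memory + NameNode memory + OS memory.
  def totalGb(hdfsManagementGb: Int, nameNodeGb: Int, osGb: Int): Int =
    hdfsManagementGb + nameNodeGb + osGb
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Taking the lower bound of each range, NameNodeMemory.totalGb(2, 2, 4) returns 8, which matches the minimum in the table. Taking the upper bounds (8, 4 and 8) gives 20 GB.</span></div>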
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">It is also easy to determine the DataNode memory amount. But this time, the memory amount depends on the number of physical CPU cores installed on each DataNode.</span></div>
<table border="1" cellpadding="0" cellspacing="0" style="font-size: 1em; height: 96px; margin: 1em 0px; width:100%;"><tbody>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">DataNode process memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 GB - 8 GB</div>
</td><td rowspan="4" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Memory amount = Memory per CPU core * number of CPU cores + DataNode process memory + DataNode TaskTracker memory + OS memory</em></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">DataNode TaskTracker memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 GB - 8 GB</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">OS memory</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 GB - 8 GB</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Number of CPU cores</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4+</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Memory per CPU core</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
4 GB - 8 GB</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<br /></div>
</td></tr>
<tr><td colspan="3" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 181px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>At least DataNode memory = 4*4 + 4 + 4 + 4 = 28 GB</em></div>
</td></tr>
</tbody></table>
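<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The DataNode formula can be sketched the same way. The object DataNodeMemory and its totalGb method are hypothetical names used only for illustration; all memory values are in GB.</span></div>

```scala
object DataNodeMemory {
  // Memory amount = memory per CPU core * number of CPU cores
  //               + DataNode process memory + TaskTracker memory + OS memory.
  def totalGb(memPerCoreGb: Int, cores: Int, dataNodeProcessGb: Int,
              taskTrackerGb: Int, osGb: Int): Int =
    memPerCoreGb * cores + dataNodeProcessGb + taskTrackerGb + osGb
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">With four cores and the lower bound of each range, DataNodeMemory.totalGb(4, 4, 4, 4, 4) returns 28, which matches the minimum in the table. Doubling the core count to eight raises the result to 44 GB, which shows how strongly the core count drives the total.</span></div>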
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Regarding the CPU and the network bandwidth, we suggest using modern multicore CPUs with at least four physical cores per CPU. The more physical CPU cores you have, the more you will be able to enhance your job's performance (subject to all the rules discussed to avoid underutilization or overutilization). For the network switches, we recommend using high-throughput equipment, such as 10 Gb Ethernet intra-rack with N x 10 Gb Ethernet inter-rack.</span></div>
<h1 style="font-weight: normal; margin: 20px 0px; padding: 0px; position: relative;">
<span style="font-size: x-large;">
Configuring your cluster correctly</span></h1>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">To get maximum performance from <a href="https://www.packtpub.com/tech/hadoop" style="color: #548ed7; cursor: pointer; text-decoration: none;">Hadoop</a>, you need to configure it correctly. The question is how. In our experience, there is no single answer: the Hadoop configuration should be adapted to the cluster it runs on and, sometimes, to the job as well.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">To configure your cluster correctly, we recommend first running your Hadoop job(s) with the default configuration to get a baseline. Then, analyze the job history log files to find any resource bottlenecks, and record the results (the measured time each job took to run). After that, iteratively tune your Hadoop configuration and rerun the jobs until you reach a configuration that fits your business needs.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The number of mapper and reducer tasks that a job uses is important. Picking the right number of tasks for a job can have a huge impact on Hadoop's performance.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The number of reducer tasks should be less than the number of mapper tasks. Google reports one reducer for every 20 mappers; other sources give different guidelines. This is because mapper tasks often process a lot of data, and the results of those tasks are passed to the reducer tasks. Often, a reducer task is just an aggregate function that processes a minor portion of the data compared to the mapper tasks. Even so, the number of reducers still needs to be chosen carefully.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The number of mappers and reducers is related to the number of physical cores on the DataNode, which determines the maximum number of tasks that can run in parallel on that DataNode.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">In a Hadoop cluster, the <span style="font-weight: 700;">master</span> nodes typically include one machine designated as the NameNode and another as the JobTracker, while all other machines in the cluster are slave nodes that act as DataNodes and TaskTrackers. When starting the cluster, you first start the HDFS daemons: the NameNode daemon on the master node and the DataNode daemons on all DataNode machines. Then, you start the MapReduce daemons: the JobTracker on the master node and the TaskTracker daemons on all slave nodes. The following diagram shows the Hadoop daemon's pseudo formula:</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px; text-align: center;">
<img alt="" border="0" src="https://www.packtpub.com/sites/default/files/Article-Images/5655OS_04_09.png" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; border: 1px solid black; box-shadow: rgba(0, 0, 0, 0.0980392) 1px 1px 5px; display: block; height: auto; max-width: 100%; min-height: 40px; padding: 5px;" title="Optimizing Hadoop MapReduce" /></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">When configuring your cluster, you need to consider the CPU cores and memory resources that must be allocated to these daemons. In a huge data context, it is recommended to reserve two CPU cores on each DataNode for the HDFS and MapReduce daemons, while in a small or medium data context, you can reserve only one CPU core on each DataNode.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Once you have determined the maximum number of mapper slots, you need to determine the maximum number of reducer slots. In our experience, a distribution of Map and Reduce tasks on DataNodes that gives good performance is to set the number of reducer slots equal to the number of mapper slots, or at least to two-thirds of the mapper slots.</span></div>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Let's work through configuring the number of mappers and reducers correctly, assuming the following example clusters:</span></div>
<table border="1" cellpadding="0" cellspacing="0" style="font-size: 1em; height: 112px; margin: 1em 0px; width:100%;"><tbody>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Cluster machine</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Number</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Medium data size</span></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;">Large data size</span></div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
DataNode CPU cores</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
8</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Reserve 1 CPU core</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Reserve 2 CPU cores</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
DataNode TaskTracker daemon</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
DataNode HDFS daemon</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
1</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Data block size</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<br /></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
128 MB</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
256 MB</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
DataNode CPU % utilization</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<br /></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
95% to 120%</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
95% to 150%</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Cluster nodes</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<br /></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
20</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
40</div>
</td></tr>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Replication factor</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<br /></div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
2</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 219px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
3</div>
</td></tr>
</tbody></table>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">We want CPU utilization of at least 95 percent. Because of Hyper-Threading, one CPU core might process more than one task at a time, so we can set the Hyper-Threading factor in the range of 120 percent to 170 percent.</span></div>
<table border="1" cellpadding="0" cellspacing="0" style="font-size: 1em; height: 166px; margin: 1em 0px; width:100%;"><tbody>
<tr><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 230px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Maximum number of mapper slots on<br />
one node in a large data context</div>
</td><td style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 230px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>= (number of physical cores - reserved cores) * (0.95 to 1.5)</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Reserved core = 1 for TaskTracker + 1 for HDFS</em></div>
</td></tr>
<tr><td colspan="2" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 230px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Let's say the CPU on the node will use up to 120% (with Hyper-Threading)</div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Maximum number of mapper slots = (8 - 2) * 1.2 = 7.2 rounded down to 7</em></div>
</td></tr>
<tr><td colspan="2" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 230px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Let's apply the two-thirds rule for the reducer-to-mapper slot ratio:</div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Maximum number of reducer slots = 7 * 2/3 = 4.67, rounded up to 5</em></div>
</td></tr>
<tr><td colspan="2" style="border: 1px solid rgb(0, 0, 0); padding: 0.3em 0.5em; width: 230px;" valign="top"><div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
Let's define the number of slots for the cluster:</div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Mapper slots = 7 * 40 = 280</em></div>
<div style="color: #333333; font-family: Merriweather, Georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<em>Reducer slots = 5 * 40 = 200</em></div>
</td></tr>
</tbody></table>
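<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">The slot arithmetic in the table above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and not part of Hadoop, the mapper count is rounded down, and the reducer count is rounded to the nearest integer, which matches the worked numbers in the table.</span></div>

```java
// Illustrative sketch of the slot arithmetic; names are hypothetical, not a Hadoop API.
class SlotCalculator {

    // Mapper slots per node: (physical cores - reserved cores) * utilization factor, rounded down.
    static int mapperSlots(int physicalCores, int reservedCores, double utilizationFactor) {
        return (int) Math.floor((physicalCores - reservedCores) * utilizationFactor);
    }

    // Reducer slots per node: two-thirds of the mapper slots, rounded to the nearest integer.
    static int reducerSlots(int mapperSlots) {
        return (int) Math.round(mapperSlots * 2.0 / 3.0);
    }

    public static void main(String[] args) {
        int nodes = 40;                        // cluster size in the large data example
        int mappers = mapperSlots(8, 2, 1.2);  // 8 cores, 2 reserved, 120% utilization
        int reducers = reducerSlots(mappers);

        System.out.println("Mapper slots per node:  " + mappers);
        System.out.println("Reducer slots per node: " + reducers);
        System.out.println("Cluster mapper slots:   " + mappers * nodes);
        System.out.println("Cluster reducer slots:  " + reducers * nodes);
    }
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Changing the utilization factor or the number of reserved cores lets you compare the medium and large data scenarios quickly before touching the cluster configuration.</span></div>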
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">Block size also affects performance. The default Hadoop configuration uses 64 MB blocks; we suggest using 128 MB for a medium data context and 256 MB for a very large data context. With larger blocks, a single mapper task can process one data block (for example, 128 MB) by opening only one block. With the default 64 MB block size, two mapper tasks are needed to process the same amount of data. This can be considered a drawback, because initializing one more mapper task and opening one more file takes more time.</span></div>
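<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">To make the block-size reasoning concrete, here is a rough Java sketch that estimates how many mapper tasks a given input needs. It assumes one mapper task per block and ignores any finer details of how splits are computed; the class and method names are hypothetical.</span></div>

```java
// Rough estimate only: assumes one mapper task per HDFS block. Names are hypothetical.
class BlockSizeEstimator {
    static final long MB = 1024L * 1024L;

    // Ceiling division: a partially filled last block still needs its own mapper task.
    static long mapTasks(long inputBytes, long blockSizeBytes) {
        return (inputBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long input = 128 * MB;
        System.out.println("64 MB blocks:  " + mapTasks(input, 64 * MB) + " mapper task(s)");
        System.out.println("128 MB blocks: " + mapTasks(input, 128 * MB) + " mapper task(s)");
    }
}
```

<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">For 128 MB of input, the estimate is two mapper tasks with 64 MB blocks and one with 128 MB blocks, which is the saving described above.</span></div>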
<h1 style="font-weight: normal; margin: 20px 0px; padding: 0px; position: relative;">
<span style="font-size: x-large;">
Summary</span></h1>
<div style="color: #333333; font-family: merriweather, georgia, serif; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: large;">In this article, we learned how to size and configure a Hadoop cluster to optimize it for MapReduce.</span><br />
<span style="font-size: large;"><br /></span></div>
<h2 style="font-weight: 100; line-height: 40.04px; margin: 20px 0px; padding: 0px; position: relative;">
<span style="font-size: x-large;">
Resources for Article:</span></h2>
<hr noshade="noshade" size="1" style="background: rgb(82, 148, 193); border: none; font-size: 14px; height: 1px; margin: 0px; padding: 0px;" />
<div style="color: #333333; font-family: merriweather, georgia, serif; font-size: 1rem; line-height: 1.9rem; margin-bottom: 20px; margin-top: 0.6em; padding: 0px;">
<span style="font-size: 1.1em; font-weight: 700;"><a href="https://www.blogger.com/null" name="more" style="color: #548ed7; cursor: pointer; text-decoration: none;"></a><span style="color: black;">Further resources on this subject:</span></span></div>
<ul style="font-size: 14px; line-height: 1.4; margin: 0.5em 0px 1em; padding: 0px;">
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><a href="https://www.packtpub.com/tech/hadoop" style="color: #548ed7; cursor: pointer; text-decoration: none;">Hadoop Tech Page</a></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><a href="http://www.packtpub.com/article/hadoop-and-hdinsight-in-a-heartbeat" style="color: #548ed7; cursor: pointer; text-decoration: none;" target="_blank">Hadoop and HDInsight in a Heartbeat</a></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><a href="http://www.packtpub.com/article/securing-the-hadoop-ecosystem" style="color: #548ed7; cursor: pointer; text-decoration: none;" target="_blank">Securing the Hadoop Ecosystem</a></li>
<li style="background: url("/themes/garland/images/menu-leaf.gif") 1px 0.35em no-repeat transparent; list-style-image: none; list-style-type: none; margin: 0.15em 0px 0.15em 0.5em; padding: 0px 0px 0.2em 1.5em;"><a href="http://www.packtpub.com/article/advanced-hadoop-mapreduce-administration" style="color: #548ed7; cursor: pointer; text-decoration: none;" target="_blank">Advanced Hadoop MapReduce Administration</a></li>
</ul>
<hr noshade="noshade" size="1" style="background: rgb(82, 148, 193); border: none; font-size: 14px; height: 1px; margin: 0px; padding: 0px;" />
</div>
<div class="article-featured-block" style="background-color: white; border-top: 2px solid rgb(255, 137, 1); box-sizing: border-box; color: #494949; font-family: Ubuntu, sans-serif; font-size: 14px; overflow: hidden; padding: 45px 0px;">
<span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;">This article, written by</span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;"> </span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 1.1em; font-weight: 700;">Khaled Tannir</span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;">, the author of</span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;"> </span><a href="http://www.packtpub.com/learn-to-implement-and-use-hadoop-mapreduce-framework/book/?hadoopmapreduce-abr1/0214?utm_source=ps_hadoopmapreduce_abr1_0214&utm_medium=content&utm_campaign=paurnima" style="color: #548ed7; cursor: pointer; font-family: Merriweather, Georgia, serif; font-size: 16px; text-decoration: none;" target="_blank">Optimizing Hadoop for MapReduce</a><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;">, discusses two of the most important aspects to consider while optimizing</span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;"> </span><a href="https://www.packtpub.com/tech/hadoop" style="color: #548ed7; cursor: pointer; font-family: Merriweather, Georgia, serif; font-size: 16px; text-decoration: none;">Hadoop</a><span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;"> for MapReduce: sizing and configuring the Hadoop cluster correctly.</span><br />
</div>
<div class="article-featured-block" style="background-color: white; border-top: 2px solid rgb(255, 137, 1); box-sizing: border-box; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13.2px; overflow: hidden; padding: 45px 0px;">
<span style="color: #333333; font-family: "merriweather" , "georgia" , serif; font-size: 16px;">Source: </span><span style="color: #333333; font-family: "merriweather" , "georgia" , serif;"><a href="https://www.packtpub.com/books/content/sizing-and-configuring-your-hadoop-cluster" style="color: #e30d0d; text-decoration: none;">https://www.packtpub.com/books/content/sizing-and-configuring-your-hadoop-cluster</a></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com2tag:blogger.com,1999:blog-2182833570422175384.post-47910094516361180062016-11-26T15:59:00.001-08:002016-11-26T15:59:14.693-08:00SCALA BASICS Practice on 26 Nov 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqVk25ZjBJA3s4q4IVsdTlihwbv86-Jtbwhdbnr7SMUEhTFXSqFkI47UwLVdNGzda43GfYmw9vIhdD4ZyQAtGmSkGKeg-bgL3Q1IX1kreKTDiA1J9ccQn5PvBS1eamktPgk2nKfA4u-Lri/s1600/Scala+Coding.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqVk25ZjBJA3s4q4IVsdTlihwbv86-Jtbwhdbnr7SMUEhTFXSqFkI47UwLVdNGzda43GfYmw9vIhdD4ZyQAtGmSkGKeg-bgL3Q1IX1kreKTDiA1J9ccQn5PvBS1eamktPgk2nKfA4u-Lri/s640/Scala+Coding.png" width="640" /></a></div>
<span style="font-size: large;"><br /></span>
<br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala (Scalable Language)</span><br />
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala is a functional + object-oriented programming language</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">In Scala, every value (including numbers and functions) is an object; there are no primitive types at the language level</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Java has primitive data types / wrapper classes / objects / interfaces</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Java:</span><br />
<span style="font-size: large;">-----------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">int a = 1;</span><br />
<span style="font-size: large;">String name = "kalyan";</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Mutable / Immutable data types</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">String is immutable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">StringBuffer, StringBuilder are mutable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><data type> <variable name> = <data></span><br />
<span style="font-size: large;"><br /></span>
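<span style="font-size: large;">A small Java sketch of the mutable / immutable point above. The class and method names are only for illustration.</span><br />
<span style="font-size: large;"><br /></span>

```java
// Illustration only: String is immutable, StringBuilder is mutable.
class MutabilityDemo {

    // String methods never change the receiver; they return a new String.
    static String shout(String s) {
        return s.toUpperCase();
    }

    // StringBuilder methods change the same object in place.
    static void addSuffix(StringBuilder sb, String suffix) {
        sb.append(suffix);
    }

    public static void main(String[] args) {
        String name = "kalyan";
        String upper = shout(name);
        System.out.println(name);   // still "kalyan"
        System.out.println(upper);  // "KALYAN"

        StringBuilder builder = new StringBuilder("kalyan");
        addSuffix(builder, "@orienit");
        System.out.println(builder); // "kalyan@orienit"
    }
}
```
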
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Scala:</span><br />
<span style="font-size: large;">-----------------</span><br />
<span style="font-size: large;">val => value is immutable (we can't change the data)</span><br />
<span style="font-size: large;">var => variable is mutable (we can change the data)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val <variable name> [ : <data type> ] = <data></span><br />
<span style="font-size: large;">var <variable name> [ : <data type> ] = <data></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala supports an interactive command-line shell (REPL)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">R => Read</span><br />
<span style="font-size: large;">E => Evaluate</span><br />
<span style="font-size: large;">P => Print</span><br />
<span style="font-size: large;">L => Loop</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">To start the Scala REPL => run the 'scala' command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">orienit@kalyan:~$ scala</span><br />
<span style="font-size: large;">Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_66-internal).</span><br />
<span style="font-size: large;">Type in expressions for evaluation. Or try :help.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Scala provides type inference</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val name = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val name : String = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> name = "xyz"</span><br />
<span style="font-size: large;"><console>:12: error: reassignment to val</span><br />
<span style="font-size: large;"> name = "xyz"</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var name = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var name : String = "kalyan"</span><br />
<span style="font-size: large;">name: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> name = "xyz"</span><br />
<span style="font-size: large;">name: String = xyz</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Int = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1l</span><br />
<span style="font-size: large;">id: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Long = 1</span><br />
<span style="font-size: large;">id: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1d</span><br />
<span style="font-size: large;">id: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Double = 1</span><br />
<span style="font-size: large;">id: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1f</span><br />
<span style="font-size: large;">id: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Float = 1</span><br />
<span style="font-size: large;">id: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Char = 1</span><br />
<span style="font-size: large;">id: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Char = 2</span><br />
<span style="font-size: large;">id: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Char = 3</span><br />
<span style="font-size: large;">id: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Char = 311</span><br />
<span style="font-size: large;">id: Char = ķ</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Char = 97</span><br />
<span style="font-size: large;">id: Char = a</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : String = 97</span><br />
<span style="font-size: large;"><console>:11: error: type mismatch;</span><br />
<span style="font-size: large;"> found : Int(97)</span><br />
<span style="font-size: large;"> required: String</span><br />
<span style="font-size: large;"> var id : String = 97</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : String = "97"</span><br />
<span style="font-size: large;">id: String = 97</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : String = 97.toString</span><br />
<span style="font-size: large;">id: String = 97</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1</span><br />
<span style="font-size: large;">res0: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1</span><br />
<span style="font-size: large;">res1: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 + 2</span><br />
<span style="font-size: large;">res2: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val sum = 1 + 2</span><br />
<span style="font-size: large;">sum: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Java does not support operator overloading</span><br />
<span style="font-size: large;">C++ and Scala do; in Scala, operators such as + are ordinary methods</span><br />
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var b = 2</span><br />
<span style="font-size: large;">b: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = a + b</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = a.+(b)</span><br />
<span style="font-size: large;">c: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = a - b</span><br />
<span style="font-size: large;">c: Int = -1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = a.-(b)</span><br />
<span style="font-size: large;">c: Int = -1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">explanation:</span><br />
<span style="font-size: large;">---------------</span><br />
<span style="font-size: large;">a + b <==> a.+(b)</span><br />
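<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Since + is only a method name, a user-defined class can provide it too. The sketch below assumes a hypothetical case class Money, invented here purely to illustrate the a + b and a.+(b) equivalence.</span><br />

```scala
// Hypothetical class for illustration; not part of the original notes.
case class Money(amount: Int) {
  // "+" is a legal method name, so it is defined like any other method.
  def +(other: Money): Money = Money(amount + other.amount)
}

object OperatorDemo {
  def main(args: Array[String]): Unit = {
    val a = Money(1)
    val b = Money(2)
    println(a + b)    // infix form
    println(a.+(b))   // explicit method-call form, same result
  }
}
```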
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Importance of the semicolon (;):</span><br />
<span style="font-size: large;">------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 1 var b = 1</span><br />
<span style="font-size: large;"><console>:1: error: ';' expected but 'var' found.</span><br />
<span style="font-size: large;">var a = 1 var b = 1</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 1 ; var b = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;">b: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toLong</span><br />
<span style="font-size: large;">x: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toDouble</span><br />
<span style="font-size: large;">x: Double = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toFloat</span><br />
<span style="font-size: large;">x: Float = 1.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toString</span><br />
<span style="font-size: large;">x: String = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toByte</span><br />
<span style="font-size: large;">x: Byte = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var x = id.toChar</span><br />
<span style="font-size: large;">x: Char = ?</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">isInstanceOf and asInstanceOf :</span><br />
<span style="font-size: large;">----------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 1</span><br />
<span style="font-size: large;">id: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.isInstanceOf[Int]</span><br />
<span style="font-size: large;">res3: Boolean = true</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.isInstanceOf[Long]</span><br />
<span style="font-size: large;">res4: Boolean = false</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.asInstanceOf[Long]</span><br />
<span style="font-size: large;">res5: Long = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id : Long = 100</span><br />
<span style="font-size: large;">id: Long = 100</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.asInstanceOf[Long]</span><br />
<span style="font-size: large;">res6: Long = 100</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.asInstanceOf[Int]</span><br />
<span style="font-size: large;">res7: Int = 100</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> id.asInstanceOf[String]</span><br />
<span style="font-size: large;">java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String</span><br />
<span style="font-size: large;"> ... 32 elided</span><br />
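<span style="font-size: large;"><br /></span>
<span style="font-size: large;">One common way to avoid the ClassCastException shown above is to check and convert in a single step with pattern matching. This is a minimal sketch, not the approach used in the original notes; the names TypeCheckDemo and describe are hypothetical.</span><br />

```scala
object TypeCheckDemo {
  // Each case tests the runtime type and binds the value with that type,
  // so no separate isInstanceOf / asInstanceOf pair is needed.
  def describe(value: Any): String = value match {
    case i: Int    => s"Int: $i"
    case l: Long   => s"Long: $l"
    case s: String => s"String: $s"
    case _         => "unknown type"
  }

  def main(args: Array[String]): Unit = {
    println(describe(1))
    println(describe(100L))
    println(describe("hi"))
    println(describe(1.5))
  }
}
```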
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">IF / IF-ELSE / IF-ELSEIF</span><br />
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(true) { println("kalyan") }</span><br />
<span style="font-size: large;">kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(false) { println("kalyan") }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var id = 5</span><br />
<span style="font-size: large;">id: Int = 5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(id == 5) { println("kalyan") }</span><br />
<span style="font-size: large;">kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(id != 5) { println("kalyan") }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(id > 4) { println("kalyan") }</span><br />
<span style="font-size: large;">kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(id > 4) { println("kalyan") } else { println("xyz")}</span><br />
<span style="font-size: large;">kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(id > 6) { println("kalyan") } else { println("xyz")}</span><br />
<span style="font-size: large;">xyz</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 5</span><br />
<span style="font-size: large;">a: Int = 5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var b = "hi"</span><br />
<span style="font-size: large;">b: String = hi</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = true</span><br />
<span style="font-size: large;">c: Boolean = true</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if(a > 5) {</span><br />
<span style="font-size: large;"> println(a)</span><br />
<span style="font-size: large;">} else if (b.contains("a")) {</span><br />
<span style="font-size: large;"> println(b)</span><br />
<span style="font-size: large;">} else if (!c) {</span><br />
<span style="font-size: large;"> println(c)</span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;"> println("Not found")</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> if(a > 5) {</span><br />
<span style="font-size: large;"> | println(a)</span><br />
<span style="font-size: large;"> | } else if (b.contains("a")) {</span><br />
<span style="font-size: large;"> | println(b)</span><br />
<span style="font-size: large;"> | } else if (!c) {</span><br />
<span style="font-size: large;"> | println(c)</span><br />
<span style="font-size: large;"> | } else {</span><br />
<span style="font-size: large;"> | println("Not found")</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">Not found</span><br />
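<span style="font-size: large;"><br /></span>
<span style="font-size: large;">In Scala, if / else is an expression, so the same chain can return a value instead of printing it. A minimal sketch, assuming a hypothetical helper named classify that takes the three variables as parameters:</span><br />

```scala
object IfExpressionDemo {
  // The whole if / else-if / else chain evaluates to one String.
  def classify(a: Int, b: String, c: Boolean): String =
    if (a > 5) a.toString
    else if (b.contains("a")) b
    else if (!c) c.toString
    else "Not found"

  def main(args: Array[String]): Unit = {
    println(classify(5, "hi", true))
    println(classify(6, "hi", true))
  }
}
```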
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Arrays in Scala:</span><br />
<span style="font-size: large;">------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = Array(1,2,3,4,5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums : Array[Int] = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = Array(5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = new Array(5)</span><br />
<span style="font-size: large;">nums: Array[Nothing] = Array(null, null, null, null, null)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = new Array[Int](5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(0, 0, 0, 0, 0)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var nums = Array(1,2,3,4,5)</span><br />
<span style="font-size: large;">nums: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> nums[0]</span><br />
<span style="font-size: large;"><console>:1: error: identifier expected but integer literal found.</span><br />
<span style="font-size: large;">nums[0]</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> nums(0)</span><br />
<span style="font-size: large;">res17: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> nums(1)</span><br />
<span style="font-size: large;">res18: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> nums(1) = 11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> nums</span><br />
<span style="font-size: large;">res20: Array[Int] = Array(1, 11, 3, 4, 5)</span><br />
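<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Array access uses parentheses because nums(i) is shorthand for the apply method, and nums(i) = v is shorthand for update. The sketch below spells out both forms; ArrayAccessDemo and modified are hypothetical names used only for this illustration.</span><br />

```scala
object ArrayAccessDemo {
  def modified(): Array[Int] = {
    // A val is enough here: the reference is fixed, the elements are not.
    val nums = Array(1, 2, 3, 4, 5)
    nums(1) = 11          // shorthand for nums.update(1, 11)
    nums.update(2, 33)    // the explicit form of the same operation
    nums
  }

  def main(args: Array[String]): Unit = {
    val nums = modified()
    println(nums(0))         // shorthand for nums.apply(0)
    println(nums.apply(1))   // the explicit form of element access
    println(nums.mkString(", "))
  }
}
```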
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Collections in Scala:</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">1. mutable collection => scala.collection.mutable</span><br />
<span style="font-size: large;">2. immutable collection => scala.collection.immutable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10</span><br />
<span style="font-size: large;">res22: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10 by 2</span><br />
<span style="font-size: large;">res23: scala.collection.immutable.Range = Range(1, 3, 5, 7, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10 by 3</span><br />
<span style="font-size: large;">res24: scala.collection.immutable.Range = Range(1, 4, 7, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10 by 4</span><br />
<span style="font-size: large;">res25: scala.collection.immutable.Range = Range(1, 5, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 10</span><br />
<span style="font-size: large;">res26: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 10</span><br />
<span style="font-size: large;">res27: scala.collection.immutable.Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9)</span><br />
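<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The difference between to (inclusive) and until (exclusive) matters most when indexing, because valid array indices stop at length - 1. A minimal sketch, with RangeDemo and indicesOf as hypothetical names:</span><br />

```scala
object RangeDemo {
  val inclusive = (1 to 10).toList        // ends at 10
  val exclusive = (1 until 10).toList     // ends at 9
  val stepped   = (1 to 10 by 3).toList   // 1, 4, 7, 10

  // "until" fits zero-based indexing: 0 until length covers every valid index.
  def indicesOf(xs: Array[Int]): List[Int] = (0 until xs.length).toList

  def main(args: Array[String]): Unit = {
    println(inclusive)
    println(exclusive)
    println(stepped)
    println(indicesOf(Array(10, 20, 30)))
  }
}
```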
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">For loop in Scala:</span><br />
<span style="font-size: large;">------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(1 to 10) println(x) </span><br />
<span style="font-size: large;"><console>:1: error: '<-' expected but ')' found.</span><br />
<span style="font-size: large;">for(1 to 10) println(x)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 10) println(x) </span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">8</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">10</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 10) if(x % 2 == 1) println(x) </span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 10 if x % 2 == 1) println(x) </span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 10 if x % 2 == 1 and x > 5) println(x) </span><br />
<span style="font-size: large;"><console>:14: error: value and is not a member of Boolean</span><br />
<span style="font-size: large;"> for(x <- 1 to 10 if x % 2 == 1 and x > 5) println(x)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 10 if x % 2 == 1 && x > 5) println(x) </span><br />
<span style="font-size: large;">7</span><br />
<span style="font-size: large;">9</span><br />
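<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The guarded loop above only prints. Adding yield makes the same loop build a collection instead, which is roughly equivalent to calling filter. A minimal sketch, with ForYieldDemo as a hypothetical name:</span><br />

```scala
object ForYieldDemo {
  // Same guard as the println version, but the matching values are collected.
  val oddsAboveFive: List[Int] =
    (for (x <- 1 to 10 if x % 2 == 1 && x > 5) yield x).toList

  // The same selection written with filter directly.
  val viaFilter: List[Int] =
    (1 to 10).filter(x => x % 2 == 1 && x > 5).toList

  def main(args: Array[String]): Unit = {
    println(oddsAboveFive)
    println(viaFilter)
  }
}
```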
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Collections in Scala:</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">1. mutable collection => scala.collection.mutable</span><br />
<span style="font-size: large;">2. immutable collection => scala.collection.immutable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var arr = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">var list = List[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">var set = Set[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">var seq = Seq[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">var vec = Vector[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var queue = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">var stack = Stack[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var arr = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">arr: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">list: List[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var set = Set[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">set: scala.collection.immutable.Set[Int] = Set(5, 1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var seq = Seq[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">seq: Seq[Int] = List(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var vec = Vector[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">vec: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var queue = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><console>:11: error: not found: value Queue</span><br />
<span style="font-size: large;"> var queue = Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var queue = scala.collection.immutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">queue: scala.collection.immutable.Queue[Int] = Queue(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var queue = scala.collection.mutable.Queue[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">queue: scala.collection.mutable.Queue[Int] = Queue(1, 2, 3, 4, 5)</span><br />
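<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Queue is not visible by default, which is why the unqualified call fails. Importing the mutable and immutable packages keeps the choice explicit at each use. A minimal sketch, with QueueDemo and its two helpers as hypothetical names:</span><br />

```scala
import scala.collection.immutable
import scala.collection.mutable

object QueueDemo {
  // Immutable queue: enqueue returns a new queue, the original is unchanged.
  def immutableExample(): (List[Int], List[Int]) = {
    val original = immutable.Queue(1, 2, 3)
    val extended = original.enqueue(4)
    (original.toList, extended.toList)
  }

  // Mutable queue: enqueue and dequeue modify the same instance.
  def mutableExample(): (Int, List[Int]) = {
    val queue = mutable.Queue(1, 2, 3)
    queue.enqueue(4)
    val first = queue.dequeue()
    (first, queue.toList)
  }

  def main(args: Array[String]): Unit = {
    println(immutableExample())
    println(mutableExample())
  }
}
```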
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List[Int](4,5,6)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = (4 :: 5 :: 6 :: Nil)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = (4 :: (5 :: (6 :: Nil)))</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List[Int](4,5,6)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 3 :: list</span><br />
<span style="font-size: large;">list: List[Int] = List(3, 4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 3 :: list :: 7</span><br />
<span style="font-size: large;"><console>:12: error: value :: is not a member of Int</span><br />
<span style="font-size: large;"> list = 3 :: list :: 7</span><br />
<span style="font-size: large;"> ^</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List[Int](4,5,6)</span><br />
<span style="font-size: large;">list: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 3 +: list</span><br />
<span style="font-size: large;">list: List[Int] = List(3, 4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = list :+ 7</span><br />
<span style="font-size: large;">list: List[Int] = List(3, 4, 5, 6, 7)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list = 2 +: list :+ 8</span><br />
<span style="font-size: large;">list: List[Int] = List(2, 3, 4, 5, 6, 7, 8)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list1 = List[Int](4,5,6)</span><br />
<span style="font-size: large;">list1: List[Int] = List(4, 5, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list2 = List[Int](7,8,9)</span><br />
<span style="font-size: large;">list2: List[Int] = List(7, 8, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list3 = list1 :: list2</span><br />
<span style="font-size: large;">list3: List[Any] = List(List(4, 5, 6), 7, 8, 9)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list3 = list1 ::: list2</span><br />
<span style="font-size: large;">list3: List[Int] = List(4, 5, 6, 7, 8, 9)</span><br />
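<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Operators whose names end in a colon are invoked on the right-hand operand. That is why 3 :: list works while list :: 7 fails: the second form looks for a :: method on the Int 7. A minimal sketch of the list operators used above, with ListConsDemo as a hypothetical name:</span><br />

```scala
object ListConsDemo {
  val list1 = List(4, 5, 6)
  val list2 = List(7, 8, 9)

  val prepended = 3 :: list1       // element on the left, list on the right
  val explicit  = list1.::(3)      // the method call the line above expands to
  val joined    = list1 ::: list2  // list-to-list concatenation
  val appended  = list1 :+ 7       // append; the colon faces the collection
  val nested    = list1 :: list2   // list1 becomes a single first element

  def main(args: Array[String]): Unit = {
    println(prepended)
    println(joined)
    println(appended)
    println(nested)
  }
}
```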
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list = List(1, 1.0, true, "hi")</span><br />
<span style="font-size: large;">list: List[Any] = List(1, 1.0, true, hi)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Functions in Scala:</span><br />
<span style="font-size: large;">-------------------------</span><br />
<span style="font-size: large;">1. anonymous functions</span><br />
<span style="font-size: large;">2. named functions</span><br />
<span style="font-size: large;">3. curried functions</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Anonymous functions:</span><br />
<span style="font-size: large;">------------------------</span><br />
<span style="font-size: large;">(x : Int) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;">res33: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val add = (x : Int) => { x + 1 }</span><br />
<span style="font-size: large;">add: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res34: Int = 11</span><br />
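<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Anonymous functions are most often passed straight to collection methods such as map. The sketch below shows three equivalent spellings; AnonymousFunctionDemo and addOne are hypothetical names used for illustration.</span><br />

```scala
object AnonymousFunctionDemo {
  // A function value stored in a val, as in the REPL session above.
  val addOne = (x: Int) => x + 1

  val viaValue  = List(1, 2, 3).map(addOne)       // pass the stored function
  val viaLambda = List(1, 2, 3).map(x => x + 1)   // inline, type inferred
  val viaShort  = List(1, 2, 3).map(_ + 1)        // placeholder shorthand

  def main(args: Array[String]): Unit = {
    println(viaValue)
    println(viaLambda)
    println(viaShort)
  }
}
```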
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Named functions:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(x : Int) = { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) : Int = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) : Long = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Long</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res35: Long = 11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res36: Int = 11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { var y = x + 1 ; println(y)}</span><br />
<span style="font-size: large;">add: (x: Int)Unit</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { var y = x + 1 ; y}</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res38: Int = 11</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { var y = x + 1 ; if(y > 1) y else x}</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { var y = x + 1 ; if(y > 1) y else "not found"}</span><br />
<span style="font-size: large;">add: (x: Int)Any</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(x : Int) = { var y = x + 1 ; if(y > 1) y.toString else "not found"}</span><br />
<span style="font-size: large;">add: (x: Int)String</span><br />
<span style="font-size: large;"><br /></span>
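<span style="font-size: large;">When the two branches of an if/else return different types, the inferred result widens to Any, as the second definition above shows. One way to keep both outcomes typed is Either. This is only a sketch: the name addChecked is hypothetical and does not come from the original notes.</span><br />
<span style="font-size: large;"><br /></span>

```scala
// Hypothetical helper (not from the original notes): both branches stay typed
// instead of letting the compiler infer Any.
def addChecked(x: Int): Either[String, Int] = {
  val y = x + 1
  if (y > 1) Right(y) else Left("not found")
}

println(addChecked(10))  // Right(11)
println(addChecked(0))   // Left(not found)
```

<span style="font-size: large;">Callers can then pattern match on Left and Right instead of casting a value of type Any.</span><br />
<span style="font-size: large;"><br /></span>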
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Recursive Functions in Scala:</span><br />
<span style="font-size: large;">------------------------------------</span><br />
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> var res = 1</span><br />
<span style="font-size: large;"> for(x <- 1 to n) res = res * x</span><br />
<span style="font-size: large;"> res</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | var res = 1</span><br />
<span style="font-size: large;"> | for(x <- 1 to n) res = res * x</span><br />
<span style="font-size: large;"> | res</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res39: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(6)</span><br />
<span style="font-size: large;">res40: Int = 720</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> if(n == 1) 1</span><br />
<span style="font-size: large;"> else n * factorial(n - 1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | if(n == 1) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n - 1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res41: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(6)</span><br />
<span style="font-size: large;">res42: Int = 720</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> println(n)</span><br />
<span style="font-size: large;"> if(n == 1) 1</span><br />
<span style="font-size: large;"> else n * factorial(n - 1)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | println(n)</span><br />
<span style="font-size: large;"> | if(n == 1) 1</span><br />
<span style="font-size: large;"> | else n * factorial(n - 1)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(4)</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">res43: Int = 24</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> def fact(start: Int, end: Int) : Int = {</span><br />
<span style="font-size: large;"> println(start)</span><br />
<span style="font-size: large;"> if(start == end) end</span><br />
<span style="font-size: large;"> else start * fact(start + 1, end)</span><br />
<span style="font-size: large;"> }</span><br />
<span style="font-size: large;"> fact(1,n)</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial(n : Int) : Int = {</span><br />
<span style="font-size: large;"> | def fact(start: Int, end: Int) : Int = {</span><br />
<span style="font-size: large;"> | println(start)</span><br />
<span style="font-size: large;"> | if(start == end) end</span><br />
<span style="font-size: large;"> | else start * fact(start + 1, end)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;"> | fact(1,n)</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(4)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">res46: Int = 24</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
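<span style="font-size: large;">Note: For tail recursion, add the @tailrec annotation (import scala.annotation.tailrec); the compiler then reports an error if the recursive call is not in tail position.</span><br />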
<span style="font-size: large;"><br /></span>
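<span style="font-size: large;">The recursive factorial versions above multiply after the recursive call returns, so they are not tail-recursive. A minimal sketch of a tail-recursive variant follows, assuming an accumulator parameter. The names factorialTail and loop are chosen here for illustration, and the result is a BigInt so that inputs above 12 do not overflow Int.</span><br />
<span style="font-size: large;"><br /></span>

```scala
import scala.annotation.tailrec

// Sketch: the running product is carried in `acc`, so the recursive call
// is the last operation and the compiler can turn it into a loop.
def factorialTail(n: Int): BigInt = {
  @tailrec
  def loop(i: Int, acc: BigInt): BigInt =
    if (i <= 1) acc
    else loop(i - 1, acc * i)
  loop(n, BigInt(1))
}

println(factorialTail(5))   // 120
println(factorialTail(20))  // 2432902008176640000
```

<span style="font-size: large;">If the recursive call were changed back to something like i * loop(i - 1, acc), @tailrec would make compilation fail instead of silently using stack frames.</span><br />
<span style="font-size: large;"><br /></span>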
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Curried Functions:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;">def add(a : Int, b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(a : Int)(b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">scala> def add(a : Int, b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;">add: (a: Int, b: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10,20)</span><br />
<span style="font-size: large;">res47: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(a : Int)(b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;">add: (a: Int)(b: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)(20)</span><br />
<span style="font-size: large;">res48: Int = 30</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(a : Int)(b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def addOne(x : Int) : Int = add(x)(1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def addTwo(x : Int) : Int = add(x)(2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def addAny : Int => Int = add(10) _</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">scala> def addOne(x : Int) : Int = add(x)(1)</span><br />
<span style="font-size: large;">addOne: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def addTwo(x : Int) : Int = add(x)(2)</span><br />
<span style="font-size: large;">addTwo: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
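<span style="font-size: large;">A curried function can also be partially applied: supplying only the first argument list yields a function that waits for the second. The sketch below repeats the curried add from above; addTen is a name introduced here for illustration.</span><br />
<span style="font-size: large;"><br /></span>

```scala
def add(a: Int)(b: Int): Int = a + b

// Fix the first argument list; the trailing underscore turns the rest into a function value.
val addTen: Int => Int = add(10) _

println(addTen(5))                    // 15
// Where a function is expected, the partially applied call can be passed directly.
println(List(1, 2, 3).map(add(100)))  // List(101, 102, 103)
```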
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(a : Int, b : Int) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(a : Int, b : Int = 5) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add(a : Int = 1, b : Int = 5) : Int = { a + b }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add(a : Int = 1, b : Int = 5) : Int = { a + b }</span><br />
<span style="font-size: large;">add: (a: Int, b: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add()</span><br />
<span style="font-size: large;">res51: Int = 6</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(2)</span><br />
<span style="font-size: large;">res52: Int = 7</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(2,3)</span><br />
<span style="font-size: large;">res53: Int = 5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(b = 2)</span><br />
<span style="font-size: large;">res54: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(b = 2, a = 1)</span><br />
<span style="font-size: large;">res55: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">Java<span class="Apple-tab-span" style="white-space: pre;"> </span>=> <span class="Apple-tab-span" style="white-space: pre;"> </span>Scala</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">Interface<span class="Apple-tab-span" style="white-space: pre;"> </span>=><span class="Apple-tab-span" style="white-space: pre;"> </span>Trait</span><br />
<span style="font-size: large;">Abstract Class =><span class="Apple-tab-span" style="white-space: pre;"> </span>Abstract Class</span><br />
<span style="font-size: large;">Class<span class="Apple-tab-span" style="white-space: pre;"> </span>=> <span class="Apple-tab-span" style="white-space: pre;"> </span>Class & Case Class</span><br />
<span style="font-size: large;"><span class="Apple-tab-span" style="white-space: pre;"> </span>=><span class="Apple-tab-span" style="white-space: pre;"> </span>object</span><br />
<span style="font-size: large;"><br /></span>
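<span style="font-size: large;">The mapping above can be sketched in a few lines of Scala. All type names below (Greeter, Shape, Circle, User, Counter) are hypothetical and serve only to show one construct per row.</span><br />
<span style="font-size: large;"><br /></span>

```scala
// All names here are hypothetical illustrations.
trait Greeter {                          // Java interface -> trait (may carry method bodies)
  def name: String
  def greet(): String = s"Hello, $name"
}

abstract class Shape {                   // abstract class -> abstract class
  def area: Double
}

class Circle(r: Double) extends Shape {  // class -> class
  def area: Double = math.Pi * r * r
}

// case class: equals, hashCode, toString and copy are generated
case class User(name: String) extends Greeter

object Counter {                         // object: a singleton, used where Java would use static members
  private var count = 0
  def next(): Int = { count += 1; count }
}

println(User("kalyan").greet())            // Hello, kalyan
println(User("kalyan") == User("kalyan"))  // true
println(new Circle(1.0).area)
println(Counter.next())
```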
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;">String Interpolation:</span><br />
<span style="font-size: large;">----------------------------</span><br />
<span style="font-size: large;">var a = 1</span><br />
<span style="font-size: large;">var b = 10.5</span><br />
<span style="font-size: large;">var c = "kalyan"</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp1 = "a = $a , b = $b , c = $c"</span><br />
<span style="font-size: large;">println(exp1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp2 = s"a = $a , b = $b , c = $c"</span><br />
<span style="font-size: large;">println(exp2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp3 = s"a = $a , b = $b%.2f , c = $c"</span><br />
<span style="font-size: large;">println(exp3)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp4 = f"a = $a , b = $b%.2f , c = $c"</span><br />
<span style="font-size: large;">println(exp4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp5 = s"a = $a\nb = $b\nc = $c"</span><br />
<span style="font-size: large;">println(exp5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var exp6 = raw"a = $a\nb = $b\nc = $c"</span><br />
<span style="font-size: large;">println(exp6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var a = 1</span><br />
<span style="font-size: large;">a: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var b = 10.5</span><br />
<span style="font-size: large;">b: Double = 10.5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var c = "kalyan"</span><br />
<span style="font-size: large;">c: String = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp1 = "a = $a , b = $b , c = $c"</span><br />
<span style="font-size: large;">exp1: String = a = $a , b = $b , c = $c</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp1)</span><br />
<span style="font-size: large;">a = $a , b = $b , c = $c</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp2 = s"a = $a , b = $b , c = $c"</span><br />
<span style="font-size: large;">exp2: String = a = 1 , b = 10.5 , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp2)</span><br />
<span style="font-size: large;">a = 1 , b = 10.5 , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp3 = s"a = $a , b = $b%.2f , c = $c"</span><br />
<span style="font-size: large;">exp3: String = a = 1 , b = 10.5%.2f , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp3)</span><br />
<span style="font-size: large;">a = 1 , b = 10.5%.2f , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp4 = f"a = $a , b = $b%.2f , c = $c"</span><br />
<span style="font-size: large;">exp4: String = a = 1 , b = 10.50 , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp4)</span><br />
<span style="font-size: large;">a = 1 , b = 10.50 , c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp5 = s"a = $a\nb = $b\nc = $c"</span><br />
<span style="font-size: large;">exp5: String =</span><br />
<span style="font-size: large;">a = 1</span><br />
<span style="font-size: large;">b = 10.5</span><br />
<span style="font-size: large;">c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp5)</span><br />
<span style="font-size: large;">a = 1</span><br />
<span style="font-size: large;">b = 10.5</span><br />
<span style="font-size: large;">c = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var exp6 = raw"a = $a\nb = $b\nc = $c"</span><br />
<span style="font-size: large;">exp6: String = a = 1\nb = 10.5\nc = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> println(exp6)</span><br />
<span style="font-size: large;">a = 1\nb = 10.5\nc = kalyan</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
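<span style="font-size: large;">The s and f interpolators accept whole expressions inside ${...}, not only variable names. The short sketch below reuses a, b and c from above; exp7 and exp8 are names added here for illustration.</span><br />
<span style="font-size: large;"><br /></span>

```scala
val a = 1
val b = 10.5
val c = "kalyan"

// ${...} embeds any expression, not just a variable name.
val exp7 = s"a + 1 = ${a + 1} , c = ${c.toUpperCase}"
println(exp7)   // a + 1 = 2 , c = KALYAN

// f checks format specifiers at compile time: %5d needs an integer, %8.3f a floating-point value.
val exp8 = f"a = $a%5d , b = $b%8.3f"
println(exp8)
```

<span style="font-size: large;">Swapping the specifiers (for example $c%5d on a String) is rejected by the compiler rather than failing at run time.</span><br />
<span style="font-size: large;"><br /></span>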
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">---------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<br /></div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com1tag:blogger.com,1999:blog-2182833570422175384.post-75572910562373951712016-11-17T16:08:00.000-08:002016-11-17T16:08:30.947-08:00Zeppelin Visualization: Kalyan Spark Streaming + Phoenix + Flume + Play Project<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<span style="font-size: large;">============================================================</span><br />
<b><span style="font-size: large;"> Kalyan Spark + Phoenix + Flume + Play Project : Zeppelin Visualization</span></b><br />
<span style="font-size: large;">============================================================</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><span style="font-size: large;">1. Open new Terminal</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">2. Start Zeppelin using the command below</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">zeppelin-daemon.sh start</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">3. Open the link below in a browser</span><br />
<span style="font-size: large;"><br /></span><a href="http://localhost:8080/"><span style="font-size: large;">http://localhost:8080/</span></a><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">4. Create a new Notebook named `Kalyan Spark Streaming Phoenix Project`</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">5. Follow the below screenshot</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhds6k_icVn1JVajMdcMA39rQtpeExblOVddLiMbCXoFe0EyKvnNz82MTkaiHLf4muBPCLRXH2b8m_u-CR-rk0o1E3Eylly3mJHGBHXiyFln97CkdcEslYD20aMjCtA3TNOTUTaqoTJeszb/s640/Zeppelin+Project+1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">6. Run the paragraphs</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6TS4hdL7KIf_qhLKUpzTCnp2RMvQprguUyQTlWlcKHx1-JWqvTFrbv_MNELNOHoSbRaYaqb2m-MJ1zJu-3cYmaOrEqGHOP-SraV609MmRB5-aUFn27-WKfMJP9JCfxhowjcBfpoU5_mgq/s640/Run+All+the+Paragraphs.png" width="640" /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjS6peTSF1DLNmuhBU5jb4ez6sCleG8VK45XF3vUiRzS9-hOHSROqPZnb_2-yegpD4K4dPvV7msh6IL4qpL20TzHbZN8OiC2ntZUWt6Ty7nghUFTinZpXOhRTZ4tfLUc5AhSlCUdcvMCnM/s640/Ok+Screen.png" width="640" /></span><br />
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZwSPq8oNZAWFiz-5R_k7g-RuQTSjDqiOfqO8teR7NlHfxKioXAAg4NKBXGV0QHixNKTQYTgCLaCDQHBNeYpHHoGwmhtVwGTvcXUqe5NUljPraEiZ1b4sSmSYzXq2SkQM3OxiDVuKdKqXk/s640/Zeppelin+Phoenix+Project+1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">7. Execute below command</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">%sql</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">select * from productlog</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDRMrJwu0fr8f0sFCt9Am5aIhmNvrQovVwqGpWzxo-3C04IvxotEuCw8Ir3NgCkIsrPExUhZkmDWYOQALTvHzdxyT8u0aOsCOmjsr9jp5hgwc3nqk3TYCkBE2x5iSeNZ8bYuhyphenhyphentwNRngFk/s640/Zeppelin+Phoenix+Project+2.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">8. Execute below command</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">%sql</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">select userid, count(*) as cnt from productlog group by userid</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhagEPDn-q-TDgrSZZ6QaWdZlerSKODzuuGZ-WCkRUojCgwRNSjkkiGdhqTL5AHKSPgclzyo7KhLlp018FewlCjbhZQ09dl82p6DnkBe1MQOGy36zccOs-HT9tDo7PEjbLrp1d7GO4IBK8H/s640/Zeppelin+Phoenix+Project+3.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;">9. Check the below graphs</span><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"> </span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjUhuT1vmWwhgENYp59akFf05JqfkFIMpWPx_ae_R02owN7tKs6knyQp9F3xz6-9w4mGdN-SZr9HtLbMHyI6hGnKZEst6Eb3d-2CXE74Ifm2PAsnhVHhvJFJ6EtrmYGI65NLo_T_rtc8A/s1600/Zeppelin+Phoenix+Project+4.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjUhuT1vmWwhgENYp59akFf05JqfkFIMpWPx_ae_R02owN7tKs6knyQp9F3xz6-9w4mGdN-SZr9HtLbMHyI6hGnKZEst6Eb3d-2CXE74Ifm2PAsnhVHhvJFJ6EtrmYGI65NLo_T_rtc8A/s640/Zeppelin+Phoenix+Project+4.png" width="640" /></span></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRHkjicm-F3cAaHmb3cI-_Z7M_uKTbEugkTmB8lN-Hii7cWt7Er8zDjknRimwyR7WgQI6WtXIb5tBRpx9gH42avhmKmRIv4pU_leWFTvpnnEJuRvB2nYztwEg7_8DAOAHJirs98K2XgsGM/s1600/Zeppelin+Phoenix+Project+5.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRHkjicm-F3cAaHmb3cI-_Z7M_uKTbEugkTmB8lN-Hii7cWt7Er8zDjknRimwyR7WgQI6WtXIb5tBRpx9gH42avhmKmRIv4pU_leWFTvpnnEJuRvB2nYztwEg7_8DAOAHJirs98K2XgsGM/s640/Zeppelin+Phoenix+Project+5.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaa3pKR81sAzcl4f7t-mWcKKCDxHOqzrIM_k61q-JQWxzG9U340Qm0RCgpxBLW4gb0ZBCa4zg_wLg42ha1ge_KqRdhHxaWcAQNqM7TJoN1G8Vuxpcwp27kvNCo5aBxG_YHnBRu7xpPnovT/s1600/Zeppelin+Phoenix+Project+6.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaa3pKR81sAzcl4f7t-mWcKKCDxHOqzrIM_k61q-JQWxzG9U340Qm0RCgpxBLW4gb0ZBCa4zg_wLg42ha1ge_KqRdhHxaWcAQNqM7TJoN1G8Vuxpcwp27kvNCo5aBxG_YHnBRu7xpPnovT/s640/Zeppelin+Phoenix+Project+6.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqSrvOrZ9Ht1COQ6XwPo3MewvfTiHO_3z0nx9DBNndyDYE6UPzubUiyi22C8XDKJkU66CHlrFkpkoxkuqIpv76pYMhYixrqVgskxM6tDail6BfPRBC3SiCLibC0qsk1KPmf2_7bjX7UA0/s1600/Zeppelin+Phoenix+Project+7.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqSrvOrZ9Ht1COQ6XwPo3MewvfTiHO_3z0nx9DBNndyDYE6UPzubUiyi22C8XDKJkU66CHlrFkpkoxkuqIpv76pYMhYixrqVgskxM6tDail6BfPRBC3SiCLibC0qsk1KPmf2_7bjX7UA0/s640/Zeppelin+Phoenix+Project+7.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span><span style="font-size: large;"><br /></span><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLVgFm35-sJaVPRxoJRZxmiMCjFATPxQ5T5juFog5mwwm1kdnd-0rKmZFckfrb2Rff4nOcLrDJ1Y84kMml1ApEVxPl6Dj-zzn3zy2jXkUPdw7qyv6U1tbM0fYra2nFm3EHPGswCPijXXZE/s640/Zeppelin+Phoenix+Project+8.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com3tag:blogger.com,1999:blog-2182833570422175384.post-633949748424499882016-11-17T15:44:00.000-08:002016-11-17T16:05:31.132-08:00Kalyan Spark Streaming + Phoenix + Flume + Play Project<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">=====================================================</span><br />
<b><span style="font-size: large;"> Kalyan Spark + Phoenix + Flume + Play Project</span></b><br />
<span style="font-size: large;">=====================================================</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. Generate Sample Users</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Generate Sample Product Logs</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. Use `<b>Flume</b>` to transfer changes in the `<b>product.log</b>` file to `<b>Phoenix</b>`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">4. Use `<b>Spark Streaming</b>` to listen to the `<b>Flume</b>` data</span><br />
<span style="font-size: large;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<span style="font-size: large;">5. Save the streaming data into `<b>Phoenix</b>` for analytics</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">6. Provide real-time analysis of the `<b>Phoenix</b>` data using UI tools</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">7. You can still add your own features</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">8. Visualize data in UI</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">======================================================</span><br />
<b><span style="font-size: large;"> Execute the below operations</span></b><br />
<span style="font-size: large;">======================================================</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. Execute the below `<b>Phoenix Operations</b>`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Start Hadoop, HBase and Phoenix</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">DROP TABLE IF EXISTS users;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">CREATE TABLE users("userid" bigint PRIMARY KEY, "username" varchar, "password" varchar, "email" varchar, "country" varchar, "state" varchar, "city" varchar, "dt" varchar);</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">DROP TABLE IF EXISTS productlog;</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">CREATE TABLE IF NOT EXISTS productlog("userid" bigint not null, "username" varchar, "email" varchar, "product" varchar not null, "transaction" varchar, "country" varchar, "state" varchar, "city" varchar, "dt" varchar not null CONSTRAINT pk PRIMARY KEY ("userid", "product", "dt"));</span><br />
<span style="font-size: large;"><br /></span>
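<span style="font-size: large;">Because the columns above are declared in double quotes, Phoenix treats their names as case-sensitive, so later statements must quote them as well. The sketch below only builds a parameterized UPSERT string for the productlog table. The helper upsertSql and the value productLogColumns are hypothetical and are not part of the project code.</span><br />
<span style="font-size: large;"><br /></span>

```scala
// Hypothetical helper: builds a parameterized Phoenix UPSERT for the productlog table above.
// Column names are double-quoted because the DDL created them as case-sensitive lowercase names.
val productLogColumns = Seq("userid", "username", "email", "product",
  "transaction", "country", "state", "city", "dt")

def upsertSql(table: String, columns: Seq[String]): String = {
  val cols = columns.map(c => "\"" + c + "\"").mkString(", ")
  val params = columns.map(_ => "?").mkString(", ")
  s"UPSERT INTO $table ($cols) VALUES ($params)"
}

println(upsertSql("productlog", productLogColumns))
```

<span style="font-size: large;">The resulting string could be handed to a JDBC PreparedStatement opened through the Phoenix driver. That part is left out here because it needs a running cluster.</span><br />
<span style="font-size: large;"><br /></span>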
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Generate Sample Users by running `<b>GenerateUsers.scala</b>` code</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXopXVfmP9TByFo2vtBk5talo0Ubls2MVUsR8hI_lhbyZRQSKPDRgTSvVEnu1ausFj2ZrC8mGoXNLRkGMtUhx0jsulFkDZ89ykVtkZTA-sEk_cf8KMdRQkyptkSdArXXqa4c2-dBV0PSfi/s640/Kalyan+Spark+Phoenix+Project+2.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. Execute the below `<b>Flume Operations</b>`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Create `<b>kalyan-spark-project</b>` folder in `<b>$FLUME_HOME</b>` folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Copy `<b>exec-avro.conf</b>` file into `<b>$FLUME_HOME/kalyan-spark-project</b>` folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Execute the command below to start the Flume agent</span><br />
<span style="font-size: large;"><br /></span>
<b><span style="font-size: large;">$FLUME_HOME/bin/flume-ng agent -n agent --conf $FLUME_HOME/conf -f $FLUME_HOME/kalyan-spark-project/exec-avro.conf -Dflume.root.logger=DEBUG,console</span></b><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">4. Generate Sample Logs by running `<b>GenerateProductLog.scala</b>`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO9HjkpvZyaLYTNfWTSCKfBAhSXlOd6I4mh1G9yBooKEAUz4NojMQLthKzAlE2j80TygNsbWxMtt-cCzjKkbiebpGjsocoHq6uaiznbfuqPt1QHS5N3i7bfeiLWP7BqEtsKdsj10acfMjw/s640/Kalyan+Spark+Phoenix+Project+1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">5. Execute the below `<b>Spark Operations</b>`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Stream sample logs from the Flume sink using `<b>Spark Streaming</b>` by running `<b>KalyanSparkFlumeStreaming.scala</b>` with the arguments below</span><br />
<span style="font-size: large;"><br /></span>
<b><span style="font-size: large;">2 localhost 1234 false</span></b><br />
<span style="font-size: large;"><b><br /></b>
<img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5z-Om2fwFfbFZP3wT482uw_HKCREfF7VfdXU_jdcTqLlJzn0aJueQPk0x7gBnd-Bk2F0l1lBv5mBb51Jik68mgwRe_WQ8Ijoz_BybPxwY1iIieuXThyphenhyphen9iy7qr_9lzDsUPHTpiuXirC4Gi/s640/Kalyan+Spark+Phoenix+Project+3.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Stream Sample Logs From Kafka Producer using `<b>Spark Streaming</b>` by running `<b>KalyanSparkKafkaStreaming.scala</b>` and pass the below arguments</span><br />
<span style="font-size: large;"><br /></span>
<b><span style="font-size: large;">2 topic1,topic2 localhost:9092 false</span></b><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghGVLtcwgxvuVd0irbVkxMcS6nLTXf2wm0dTuSdb6zEXvCQ65EOK21rLB47dhxJQz8cnB8ZCwos63fTin797ro2bhmB8ElKIV_5e8t-aiY08Wv6BSE4xul8Qc4QaN3qLIdKOuT5xpY9uOV/s640/Kalyan+Spark+Phoenix+Project+4.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">6. Use Zeppelin for UI</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">=============================================================</span><br />
<b><span style="font-size: large;"> Kalyan Spark + Phoenix + Flume + Play Project : Visualization</span></b><br />
<span style="font-size: large;">=============================================================</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. Open new Terminal</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Change the directory using below command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">cd /home/orienit/spark/workspace/kalyan-spark-streaming-phoenix-project</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. Run the Project using below command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">activator run</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHnBQT3h2H3Ez6SOrjvYdfV_8Jv0sYa23Gwb68IUq4HnCGlbPxel-mV1gWyNHZwYgVCa3hIJIRSvV9GYnqKysC4l1UWMvCAd5MkF2dO_JUdW6TSaSjFzb9TeLTxHLeBaEixAq2ceuQizVF/s640/play+project.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">4. Open the browser with below link : Home Page</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:9000/"><span style="font-size: large;">http://localhost:9000/</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5SdXZbNcn_m8zluZZ5cEEipaJFAifS1fN3w80OwU1aZ1suTuXCwp7pkBGOi-DrOBLvOC4tanEiFxoMRTRV5Ed1-TTFSdDJfBtdpz9SMSt9F1M2wRUXDh_Knbs19yxIXS0axdM0OAYpA8z/s640/Home+Page.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">5. Open the browser with below link : Users Page</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:9000/kalyan/users"><span style="font-size: large;">http://localhost:9000/kalyan/users</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfH3aOrHg4tvVp-5btJmzmuyUQsqJseRJF0elCeDI1AKCh6UQAurMqHguH2ab98oKOxCZOTPB3gOeDD-kHcVgrnc8cDv7SeolmCQhcNqMXu0WSy0-PW13PftmFG9jx9rlODG-yTA-sEmSn/s640/users+page.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">6. Open the browser with below link : Different Graphs</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:9000/kalyan/home"><span style="font-size: large;">http://localhost:9000/kalyan/home</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiLdJcq9MK9bAggovlfZ5ThAkoHoyO_weRLOII2VN1lu4dOlwvjfE4z8Bdq7NPZaJsBjGuT7TGJgxjKm4sk30iATr1hLIslQPzwwbt0KclHeNADdm31VnI8Bbhrf_ABGkl5DoitXK3SoOP/s640/Different+Graphs.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">7. Open the browser with below link : Products Page 1</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:9000/kalyan/graphs/products"><span style="font-size: large;">http://localhost:9000/kalyan/graphs/products</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHc9COvr7Lz8JpnN0-bdpzdLZrmLxaS2qtCPWPN-RWfDBLUfmbonrrvQIjruQ4PK98XBtJBbmKcmGWvXyxHrSzkCy9v3WmHEYU_6UqJLyGGG5jkER2dymzYGMIempmzwp3fsLxwF5-TTWz/s640/product+page1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">8. Open the browser with below link : Products Page 2</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:9000/kalyan/graphs/products"><span style="font-size: large;">http://localhost:9000/kalyan/graphs/products</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_PEth5fdCaoFjNPQn9XjU7fNzQ-KqiADIkKVWIXep9d48qGGuWf_li1VSCtO8RmvqjTtW_UPddo_Q3WMDJ1ZRf5mo2Ho-yhG7zSIi_Y9HXH2uuB5NVh9CJM_o6EY6V2i9gVMlUFA90x8Q/s640/product+page2.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">============================================================</span><br />
<b><span style="font-size: large;"> Kalyan Spark + Phoenix + Flume + Play Project : Zeppelin Visualization</span></b><br />
<span style="font-size: large;">============================================================</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">1. Open new Terminal</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. Run the Zeppelin Project using below command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">zeppelin-daemon.sh start</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">3. Open the browser using below link</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:8080/"><span style="font-size: large;">http://localhost:8080/</span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">4. Create a new Notebook named `Kalyan Spark Streaming Phoenix Project`</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">5. Follow the below screenshot</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhds6k_icVn1JVajMdcMA39rQtpeExblOVddLiMbCXoFe0EyKvnNz82MTkaiHLf4muBPCLRXH2b8m_u-CR-rk0o1E3Eylly3mJHGBHXiyFln97CkdcEslYD20aMjCtA3TNOTUTaqoTJeszb/s640/Zeppelin+Project+1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">6. Run the paragraphs</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6TS4hdL7KIf_qhLKUpzTCnp2RMvQprguUyQTlWlcKHx1-JWqvTFrbv_MNELNOHoSbRaYaqb2m-MJ1zJu-3cYmaOrEqGHOP-SraV609MmRB5-aUFn27-WKfMJP9JCfxhowjcBfpoU5_mgq/s640/Run+All+the+Paragraphs.png" width="640" /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjS6peTSF1DLNmuhBU5jb4ez6sCleG8VK45XF3vUiRzS9-hOHSROqPZnb_2-yegpD4K4dPvV7msh6IL4qpL20TzHbZN8OiC2ntZUWt6Ty7nghUFTinZpXOhRTZ4tfLUc5AhSlCUdcvMCnM/s640/Ok+Screen.png" width="640" /></span><br />
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZwSPq8oNZAWFiz-5R_k7g-RuQTSjDqiOfqO8teR7NlHfxKioXAAg4NKBXGV0QHixNKTQYTgCLaCDQHBNeYpHHoGwmhtVwGTvcXUqe5NUljPraEiZ1b4sSmSYzXq2SkQM3OxiDVuKdKqXk/s640/Zeppelin+Phoenix+Project+1.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">7. Execute below command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">%sql</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">select * from prouctlog</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDRMrJwu0fr8f0sFCt9Am5aIhmNvrQovVwqGpWzxo-3C04IvxotEuCw8Ir3NgCkIsrPExUhZkmDWYOQALTvHzdxyT8u0aOsCOmjsr9jp5hgwc3nqk3TYCkBE2x5iSeNZ8bYuhyphenhyphentwNRngFk/s640/Zeppelin+Phoenix+Project+2.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">8. Execute below command</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">%sql</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">select userid, count(*) as cnt from prouctlog group by userid</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhagEPDn-q-TDgrSZZ6QaWdZlerSKODzuuGZ-WCkRUojCgwRNSjkkiGdhqTL5AHKSPgclzyo7KhLlp018FewlCjbhZQ09dl82p6DnkBe1MQOGy36zccOs-HT9tDo7PEjbLrp1d7GO4IBK8H/s640/Zeppelin+Phoenix+Project+3.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">9. Check the below graphs</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"> </span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjUhuT1vmWwhgENYp59akFf05JqfkFIMpWPx_ae_R02owN7tKs6knyQp9F3xz6-9w4mGdN-SZr9HtLbMHyI6hGnKZEst6Eb3d-2CXE74Ifm2PAsnhVHhvJFJ6EtrmYGI65NLo_T_rtc8A/s1600/Zeppelin+Phoenix+Project+4.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjUhuT1vmWwhgENYp59akFf05JqfkFIMpWPx_ae_R02owN7tKs6knyQp9F3xz6-9w4mGdN-SZr9HtLbMHyI6hGnKZEst6Eb3d-2CXE74Ifm2PAsnhVHhvJFJ6EtrmYGI65NLo_T_rtc8A/s640/Zeppelin+Phoenix+Project+4.png" width="640" /></span></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRHkjicm-F3cAaHmb3cI-_Z7M_uKTbEugkTmB8lN-Hii7cWt7Er8zDjknRimwyR7WgQI6WtXIb5tBRpx9gH42avhmKmRIv4pU_leWFTvpnnEJuRvB2nYztwEg7_8DAOAHJirs98K2XgsGM/s1600/Zeppelin+Phoenix+Project+5.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRHkjicm-F3cAaHmb3cI-_Z7M_uKTbEugkTmB8lN-Hii7cWt7Er8zDjknRimwyR7WgQI6WtXIb5tBRpx9gH42avhmKmRIv4pU_leWFTvpnnEJuRvB2nYztwEg7_8DAOAHJirs98K2XgsGM/s640/Zeppelin+Phoenix+Project+5.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaa3pKR81sAzcl4f7t-mWcKKCDxHOqzrIM_k61q-JQWxzG9U340Qm0RCgpxBLW4gb0ZBCa4zg_wLg42ha1ge_KqRdhHxaWcAQNqM7TJoN1G8Vuxpcwp27kvNCo5aBxG_YHnBRu7xpPnovT/s1600/Zeppelin+Phoenix+Project+6.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjaa3pKR81sAzcl4f7t-mWcKKCDxHOqzrIM_k61q-JQWxzG9U340Qm0RCgpxBLW4gb0ZBCa4zg_wLg42ha1ge_KqRdhHxaWcAQNqM7TJoN1G8Vuxpcwp27kvNCo5aBxG_YHnBRu7xpPnovT/s640/Zeppelin+Phoenix+Project+6.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqSrvOrZ9Ht1COQ6XwPo3MewvfTiHO_3z0nx9DBNndyDYE6UPzubUiyi22C8XDKJkU66CHlrFkpkoxkuqIpv76pYMhYixrqVgskxM6tDail6BfPRBC3SiCLibC0qsk1KPmf2_7bjX7UA0/s1600/Zeppelin+Phoenix+Project+7.png"><span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqSrvOrZ9Ht1COQ6XwPo3MewvfTiHO_3z0nx9DBNndyDYE6UPzubUiyi22C8XDKJkU66CHlrFkpkoxkuqIpv76pYMhYixrqVgskxM6tDail6BfPRBC3SiCLibC0qsk1KPmf2_7bjX7UA0/s640/Zeppelin+Phoenix+Project+7.png" width="640" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLVgFm35-sJaVPRxoJRZxmiMCjFATPxQ5T5juFog5mwwm1kdnd-0rKmZFckfrb2Rff4nOcLrDJ1Y84kMml1ApEVxPl6Dj-zzn3zy2jXkUPdw7qyv6U1tbM0fYra2nFm3EHPGswCPijXXZE/s640/Zeppelin+Phoenix+Project+8.png" width="640" /></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com4tag:blogger.com,1999:blog-2182833570422175384.post-79201836632494237822016-10-30T17:45:00.000-07:002016-11-08T00:55:34.388-08:00Control Structures : If Else Expressions : Day 6 Learnings<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;">Scala’s <b>built-in control structures</b> are <b>if, while, for, try, and match expressions</b>. They provide everything their imperative equivalents provide, but because all of them (except while loops) result in a value, they support a functional approach as well. <br /><br /><b><u><span style="color: red;"> If..Else Expression Blocks </span></u></b></span><br />
<span style="font-size: large;"><span style="color: red;"><b><u><br /></u></b></span> The If..Else conditional expression is a classic programming construct for choosing a branch of code based on whether an expression resolves to <b>true</b> or <b>false</b>. </span><br />
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;">In many languages this takes the form of an “if .. else if .. else” block, which <b>starts with</b> an “<b>if</b>,” continues with <b>zero to many </b>“<b>else if </b>” sections, and <b>ends</b> with a final “<b>else</b>” catch-all statement. <br /><br />As a matter of practice you can write these same “if .. else if .. else” blocks in Scala and they will work just as you have experienced them in Java and other languages. As a matter of formal syntax, however, Scala only supports a single “<b>if</b> ” and optional “<b>else</b>” block, and does not recognise the “<b>else if </b>” block as a single construct. <br /><br />So how do “else if ” blocks still work correctly in Scala? Because “if .. else” blocks are based on expression blocks, and expression blocks can be easily nested, an “if .. else if ..else” expression is equivalent to a nested “if .. else { if .. else }” expression. </span></div>
<div>
<span style="font-size: large;"><br /></span></div>
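<div>
<span style="font-size: large;">The nesting described above can be checked directly. This is a minimal sketch for the Scala REPL; the function names gradeChained and gradeNested and the score thresholds are hypothetical, chosen only for illustration. </span></div>

```scala
// Hypothetical example: the names and thresholds are for illustration only.

// Written the familiar way, as an "if .. else if .. else" chain.
def gradeChained(score: Int): String =
  if (score >= 90) "A"
  else if (score >= 75) "B"
  else "C"

// Written the way Scala parses it: the second if..else is nested
// inside the else branch of the first one.
def gradeNested(score: Int): String =
  if (score >= 90) "A"
  else {
    if (score >= 75) "B"
    else "C"
  }

val samples = List(95, 80, 40)
val chained = samples.map(gradeChained)
val nested  = samples.map(gradeNested)

// Both forms give the same result for every sample.
println(chained == nested)
```

<div>
<span style="font-size: large;">Both functions should return the same grade for any input, since the chained form is only a more compact way of writing the nested one. </span></div>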
<div>
<span style="font-size: large;">Logically this is exactly the same as an “if .. else if .. else” block, and as a matter of syntax Scala recognises the second “if else” as a nested expression of the outer “if .. else” block. <br /><br />Let’s start exploring actual “if ” and “if .. else” blocks by looking at the syntax for the simple “if ” block. <br /><br /><b><u><span style="color: red;"> If Expressions </span></u></b></span><br />
<span style="font-size: large;"><span style="color: red;"><b><u><br /></u></b></span><span style="color: purple;"> Syntax: Using an If Expression <br /><br /> if (<Boolean expression>) <expression></span></span><br />
<span style="font-size: large;"><span style="color: purple;"></span><br />The term <b>Boolean expression </b>here indicates an <b>expression</b> that will return either <b>true</b> or <b>false</b> . <br /><br />Here is a simple if block that prints a notice if the Boolean expression is true: <br /><br /><span style="color: purple;"> scala> if ( 47 % 3 > 0 ) println("Not a multiple of 3") <br />Not a multiple of 3 </span></span><br />
<span style="font-size: large;"><span style="color: purple;"><br /></span> Of course 47 isn’t a multiple of 3, so the Boolean expression was true and the println was triggered. </span></div>
<div>
<span style="font-size: large;"><br />Although an if block can act as an expression, it is better suited for statements like this one. The problem with using if blocks as expressions is that they only conditionally return a value. If the Boolean expression returns false, what do you expect the if block to return? </span></div>
<div>
<span style="font-size: large;"><br /></span><span style="color: purple; font-size: large;">scala> val result = if ( false ) "what does this return?" </span></div>
<div>
<span style="font-size: large;"><span style="color: purple;">result: Any = () </span></span><br />
<span style="font-size: large;"><span style="color: purple;"><br /></span></span></div>
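<div>
<span style="font-size: large;">The missing value can also be made explicit. The following is a small sketch, not the only way to handle it; the names missing, label and maybe are hypothetical. It contrasts an if without an else, an if with an else, and an Option for the case where having no value is a real possibility. </span></div>

```scala
// With no else branch, a false condition yields the Unit value.
val missing = if (false) "what does this return?"
println(missing)   // the Unit value is printed as ()

// With an else branch both outcomes are Strings, so the result is a String.
val label: String = if (false) "yes" else "no"
println(label)

// When "no value" is a legitimate outcome, Option states that explicitly.
val maybe: Option[String] = if (false) Some("found") else None
println(maybe)
```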
<div>
<span style="font-size: large;">The type of the result value in this example is unspecified so the compiler used type inference to determine the most appropriate type. Either a String or Unit could have been returned, so the compiler chose the root class Any . This is the one class common to both String (which extends AnyRef ) and to Unit (which extends AnyVal ). Unlike the solitary “if ” block, the “if .. else” block is well suited to working with expressions. <br /><br /><br /><b><u><span style="color: red;"> If-Else Expressions </span></u></b></span><br />
<span style="font-size: large;"><span style="color: red;"><b><u><br /></u></b></span><span style="color: purple;"> Syntax: If .. Else Expressions </span></span></div>
<div>
<span style="font-size: large;"><span style="color: purple;"><br />if (<Boolean expression>) <expression> <br />else <expression> </span><br /><br /><span style="color: red;"> Here is an example: </span></span><br />
<span style="font-size: large;"><span style="color: red;"><br /></span><span style="color: purple;"> scala> val x = 10; val y = 20 <br />x: Int = 10 <br />y: Int = 20 <br /><br /> scala> val max = if (x > y) x else y <br />max: Int = 20 </span><br /><br />You can see that the x and y values make up the entirety of the if and else expressions. The resulting value is assigned to max , which we and the Scala compiler know will be an Int because both expressions have return values of type Int . <br /><br /><b><span style="color: red;">Note: </span></b>Some wonder why Scala doesn’t have a <b>ternary expression</b> (popular in C and Java) where the punctuation characters ? and : act as a one-line if and else expression. It should be clear from this example that Scala doesn’t really need it because its if and else blocks can fit compactly on a single line (and, unlike in C and Java, they are already an expression). <br /><br />Using a single expression without an expression block in if..else expressions works well if everything fits on one line. When your if..else expression doesn’t easily fit on a single line, however, consider using expression blocks to make your code more readable. if expressions without an else should always use curly braces, because they tend to be statements that create side effects. <br /><br /> if..else blocks are a simple and common way to write conditional logic. There are other, more elegant ways to do so in Scala, however, using match expressions. <br /><br /><style type="text/css">
</style></span><br />
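<div>
<span style="font-size: large;">As a sketch of the advice above, here is the same comparison written once on a single line and once with expression blocks. The x and y values are taken from the example; the description value and its wording are hypothetical additions. </span></div>

```scala
val x = 10
val y = 20

// Single-line form: each branch is one expression.
val max = if (x > y) x else y

// Expression-block form: the last expression in each block is the block's value,
// so the whole if..else still results in a String.
val description = if (x > y) {
  val diff = x - y
  s"x is larger by $diff"
} else {
  val diff = y - x
  s"y is larger by $diff"
}

println(max)
println(description)
```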
<div>
<div>
<span style="font-size: large;"><b><i><span style="color: red;"><u>Note:</u> Sample examples on if..else</span></i></b></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<div>
<span style="color: purple; font-size: large;"><b><i>If Expression in Scala</i></b></span></div>
<div>
<span style="font-size: large;"><i>scala> if (exp) println("yes")</i></span></div>
</div>
<div>
<span style="font-size: large;"><i><br /></i></span></div>
<div>
<span style="color: purple; font-size: large;"><b><i>Multi-line If Expression</i></b></span></div>
<div>
<span style="font-size: large;"><i></i></span><br />
<div>
<span style="font-size: large;"><i>scala> if (exp) {</i></span></div>
<span style="font-size: large;"><i>
<div>
| println("Line one")</div>
<div>
| println("Line two")</div>
<div>
| }</div>
</i></span></div>
<div>
<span style="font-size: large;"><i><br /></i></span></div>
<div>
<div>
<span style="color: purple; font-size: large;"><b><i>Multi-line Else Expression</i></b></span></div>
<div>
<i><span style="color: purple; font-size: large;"></span></i></div>
<span style="font-size: large;"></span><br />
<div>
<span style="font-size: large;"><i>scala> if (exp) {</i></span></div>
<span style="font-size: large;">
<div>
<i> | println("Hello")</i></div>
<div>
<i> | } else {</i></div>
<div>
<i> | println("Line one")</i></div>
<div>
<i> | println("Line two")</i></div>
<div>
<i> | }</i></div>
<div>
<br /></div>
<div>
<div style="font-size: medium;">
<span style="color: purple; font-size: large;"><b><i>Multiple If-Else and Else-If Expression</i></b></span></div>
<div>
<i>scala> if (exp1) {</i></div>
</div>
</span></div>
</div>
<div>
<span style="font-size: large;"></span><br />
<div>
<span style="font-size: large;"><i> | println("Line one")</i></span></div>
<span style="font-size: large;">
<div>
<i> | println("Line two")</i></div>
<div>
<i> | } else if (exp2) {</i></div>
<div>
<i> | println("Line one")</i></div>
<div>
<i> | println("Line two")</i></div>
<div>
<i> | } else if (exp3) {</i></div>
<div>
<i> | println("Line one")</i></div>
<div>
<i> | println("Line two")</i></div>
<div>
<i> | } else {</i></div>
<div>
<i> | println("Line one")</i></div>
<div>
<i> | println("Line two")</i></div>
<div>
<i> | }</i></div>
<div>
<br /></div>
</span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<br /></div>
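<div>
<span style="font-size: large;">The templates above use the placeholders exp, exp1, exp2 and exp3, which must be replaced by real Boolean expressions before they will run. A minimal runnable version, assuming a hypothetical hour value and made-up greetings, could look like this: </span></div>

```scala
// Hypothetical input value; any Int can be used here.
val hour = 14

// Conditions are checked from top to bottom and the first true one wins.
val greeting =
  if (hour < 12) {
    "Good morning"
  } else if (hour < 17) {
    "Good afternoon"
  } else if (hour < 21) {
    "Good evening"
  } else {
    "Good night"
  }

println(greeting)
```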
</div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com2tag:blogger.com,1999:blog-2182833570422175384.post-40990897787239639102016-10-27T22:36:00.004-07:002016-10-27T22:36:38.735-07:00SPARK BASICS DAY 2 Practice on 28 Oct 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;"><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">Spark Day 2 Practice:</span><br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;" /><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">==============================================</span></span><br />
<br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">To work with Spark SQL + Hive :</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">1. copy "hive-site.xml" from "$HIVE_HOME/conf" folder to "$SPARK_HOME/conf"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">2. copy "mysql connector jar" from "$HIVE_HOME/lib" folder to "$SPARK_HOME/lib"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">hive (kalyan)> select course, count(*) from kalyan.student group by course;</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">OK</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">cse<span class="Apple-tab-span" style="white-space: pre;"> </span>7</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">mech<span class="Apple-tab-span" style="white-space: pre;"> </span>7</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">Time taken: 22.567 seconds, Fetched: 2 row(s)</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">sqlContext.sql("select course, count(*) from kalyan.student group by course").show()</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select course, count(*) from kalyan.student group by course").show()</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|course|_c1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| mech| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| cse| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select year, count(*) from kalyan.student group by year").show()</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|year|_c1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 1| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 2| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select course, count(*) from kalyan.student group by course")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">res7: org.apache.spark.sql.DataFrame = [course: string, _c1: bigint]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">hive (kalyan)> describe kalyan.student;</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">OK</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">name <span class="Apple-tab-span" style="white-space: pre;"> </span>string <span class="Apple-tab-span" style="white-space: pre;"> </span>student name </span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">id <span class="Apple-tab-span" style="white-space: pre;"> </span>int <span class="Apple-tab-span" style="white-space: pre;"> </span>student id </span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">course <span class="Apple-tab-span" style="white-space: pre;"> </span>string <span class="Apple-tab-span" style="white-space: pre;"> </span>student course </span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">year <span class="Apple-tab-span" style="white-space: pre;"> </span>int <span class="Apple-tab-span" style="white-space: pre;"> </span>student year </span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">Time taken: 0.349 seconds, Fetched: 4 row(s)</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select * from kalyan.student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">res8: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val df = sqlContext.sql("select * from kalyan.student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">df: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| arun| 1| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 2| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|naveen| 4| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|prasad| 6| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sudha| 7| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| ravi| 1| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raju| 2| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| roja| 3| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 4| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 6| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| madhu| 7| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.select("name","id").show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| arun| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|naveen| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| venki| 5|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|prasad| 6|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sudha| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| ravi| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raju| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| roja| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 6|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| madhu| 7|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.filter(df("year") > 1).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|prasad| 6| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sudha| 7| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 4| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 6| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| madhu| 7| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.filter(df("year") > 1).filter(df("id") === 5).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.filter(df("year") > 1).where(df("id") === 5).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.filter(df("year") > 1).where(df("id") === 5).select("name","id").show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|venki| 5|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+-----+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
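<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">To check the chained filter / where / select result above without a cluster, the same logic can be modelled with plain Scala collections. This is only a sketch: the Student case class and the students list are hypothetical stand-ins for the Hive table (they are not part of the Spark API), and the rows are copied from the df.show output.</span><br />

```scala
// Hypothetical stand-in for one row of kalyan.student (not part of the Spark API).
case class Student(name: String, id: Int, course: String, year: Int)

// Rows copied from the df.show output in this post.
val students = Seq(
  Student("arun", 1, "cse", 1), Student("sunil", 2, "cse", 1),
  Student("raj", 3, "cse", 1), Student("naveen", 4, "cse", 1),
  Student("venki", 5, "cse", 2), Student("prasad", 6, "cse", 2),
  Student("sudha", 7, "cse", 2), Student("ravi", 1, "mech", 1),
  Student("raju", 2, "mech", 1), Student("roja", 3, "mech", 1),
  Student("anil", 4, "mech", 2), Student("rani", 5, "mech", 2),
  Student("anvith", 6, "mech", 2), Student("madhu", 7, "mech", 2)
)

// filter(year > 1) followed by where(id === 5): chained predicates are ANDed.
val secondYearId5 = students.filter(_.year > 1).filter(_.id == 5)

// select("name", "id"): keep only two fields of each remaining row.
val nameAndId = secondYearId5.map(s => (s.name, s.id))

println(nameAndId) // prints List((venki,5), (rani,5))
```

<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">The snippet can be pasted into any Scala REPL. It mirrors the point of the transcript: where is just another name for filter on a DataFrame, so both chains return the same two rows.</span><br />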
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val gd = df.groupBy(df("year"), df("course"))</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">gd: org.apache.spark.sql.GroupedData = org.apache.spark.sql.GroupedData@cab0abd</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> gd.count.show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+-----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|year|course|count|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+-----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 1| mech| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 1| cse| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 2| mech| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 2| cse| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+-----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
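<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">The groupBy(year, course) counts above can be cross-checked the same way with plain Scala collections. Again a sketch under assumptions: Student and students are hypothetical stand-ins for the table, with rows copied from the df.show output, and this is not how Spark itself executes the aggregation.</span><br />

```scala
// Hypothetical stand-in for one row of kalyan.student (not part of the Spark API).
case class Student(name: String, id: Int, course: String, year: Int)

// Rows copied from the df.show output in this post.
val students = Seq(
  Student("arun", 1, "cse", 1), Student("sunil", 2, "cse", 1),
  Student("raj", 3, "cse", 1), Student("naveen", 4, "cse", 1),
  Student("venki", 5, "cse", 2), Student("prasad", 6, "cse", 2),
  Student("sudha", 7, "cse", 2), Student("ravi", 1, "mech", 1),
  Student("raju", 2, "mech", 1), Student("roja", 3, "mech", 1),
  Student("anil", 4, "mech", 2), Student("rani", 5, "mech", 2),
  Student("anvith", 6, "mech", 2), Student("madhu", 7, "mech", 2)
)

// Group rows by the (year, course) key, then count the rows in each group.
val counts: Map[(Int, String), Int] =
  students
    .groupBy(s => (s.year, s.course))
    .map { case (key, rows) => key -> rows.size }

// Print in sorted key order; Spark gives no ordering guarantee without an ORDER BY.
counts.toSeq.sorted.foreach { case ((year, course), n) =>
  println(s"$year $course $n")
}
```

<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">The four (year, course) groups come out as 4, 3, 3 and 4 rows, matching the gd.count.show table.</span><br />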
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> df.registerTempTable("student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select * from student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">res36: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select * from student").show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| arun| 1| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 2| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|naveen| 4| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|prasad| 6| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sudha| 7| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| ravi| 1| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raju| 2| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| roja| 3| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 4| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 6| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| madhu| 7| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql("select year, course, count(*) from student group by year, course").show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|year|course|_c2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 1| mech| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 1| cse| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 2| mech| 4|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| 2| cse| 3|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+----+------+---+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val prop = new java.util.Properties</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">prop.setProperty("driver","com.mysql.jdbc.Driver")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">prop.setProperty("user","root")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">prop.setProperty("password","hadoop")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val jdbcDF = sqlContext.read.jdbc("jdbc:mysql://localhost:3306/kalyan", "student", prop)</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val jdbcDF = sqlContext.read.jdbc("jdbc:mysql://localhost:3306/kalyan", "student", prop)</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">jdbcDF: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val hiveDF = sqlContext.sql("select * from kalyan.student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val hiveDF = sqlContext.sql("select * from kalyan.student")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">hiveDF: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> hiveDF.show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| arun| 1| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 2| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|naveen| 4| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| venki| 5| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|prasad| 6| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sudha| 7| cse| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| ravi| 1| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raju| 2| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| roja| 3| mech| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 4| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| rani| 5| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 6| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| madhu| 7| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val jsonDF = sqlContext.read.json("file:///home/orienit/spark/input/student.json")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val parquetDF = sqlContext.read.parquet("file:///home/orienit/spark/input/student.parquet")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val jsonDF = sqlContext.read.json("file:///home/orienit/spark/input/student.json")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">jsonDF: org.apache.spark.sql.DataFrame = [course: string, id: bigint, name: string, year: bigint]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> jsonDF.show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|course| id| name|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 1| anil|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 5|anvith|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 6| dev|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 3| raj|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 4| sunil|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 2|venkat|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val parquetDF = sqlContext.read.parquet("file:///home/orienit/spark/input/student.parquet")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">SLF4J: Defaulting to no-operation (NOP) logger implementation</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">parquetDF: org.apache.spark.sql.DataFrame = [name: string, id: int, course: string, year: int]</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> parquetDF.show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 1| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 5|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| dev| 6|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 4|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|venkat| 2| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">From RDBMS <span class="Apple-tab-span" style="white-space: pre;"> </span>=> jdbcDF<span class="Apple-tab-span" style="white-space: pre;"> </span>=> create table => jdbcstudent</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">From HIVE <span class="Apple-tab-span" style="white-space: pre;"> </span>=> hiveDF<span class="Apple-tab-span" style="white-space: pre;"> </span>=> create table => hivestudent</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">From JSON <span class="Apple-tab-span" style="white-space: pre;"> </span>=> jsonDF<span class="Apple-tab-span" style="white-space: pre;"> </span>=> create table => jsonstudent</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">From PARQUET <span class="Apple-tab-span" style="white-space: pre;"> </span>=> parquetDF<span class="Apple-tab-span" style="white-space: pre;"> </span>=> create table => parquetstudent</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">jdbcDF.registerTempTable("jdbcstudent")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">hiveDF.registerTempTable("hivestudent")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">jsonDF.registerTempTable("jsonstudent")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">parquetDF.registerTempTable("parquetstudent")</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val query1 = "select jdbcstudent.*, hivestudent.* from jdbcstudent join hivestudent on jdbcstudent.name == hivestudent.name"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">sqlContext.sql(query1).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val query1 = "select jdbcstudent.*, hivestudent.* from jdbcstudent join hivestudent on jdbcstudent.name == hivestudent.name"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">query1: String = select jdbcstudent.*, hivestudent.* from jdbcstudent join hivestudent on jdbcstudent.name == hivestudent.name</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql(query1).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| name| id|course|year| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| raj| 3| spark|2016| raj| 3| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|anvith| 5|hadoop|2015|anvith| 6| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| sunil| 4|hadoop|2015| sunil| 2| cse| 1|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| anil| 1| spark|2016| anil| 4| mech| 2|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">val query2 = "select jsonstudent.*, parquetstudent.* from jsonstudent join parquetstudent on jsonstudent.name == parquetstudent.name"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">sqlContext.sql(query2).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> val query2 = "select jsonstudent.*, parquetstudent.* from jsonstudent join parquetstudent on jsonstudent.name == parquetstudent.name"</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">query2: String = select jsonstudent.*, parquetstudent.* from jsonstudent join parquetstudent on jsonstudent.name == parquetstudent.name</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">scala> sqlContext.sql(query2).show</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|course| id| name|year| name| id|course|year|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 1| anil|2016| anil| 1| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 5|anvith|2015|anvith| 5|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 6| dev|2015| dev| 6|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 3| raj|2016| raj| 3| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">|hadoop| 4| sunil|2015| sunil| 4|hadoop|2015|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">| spark| 2|venkat|2016|venkat| 2| spark|2016|</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">+------+---+------+----+------+---+------+----+</span><br />
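<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">query2 is an inner join on name. The following is only a plain-Scala sketch of what such a join computes, using the six rows shown above. The Student case class and the JoinSketch object are hypothetical names introduced for illustration, and no Spark is involved, so nothing here is distributed.</span><br />

```scala
// Illustration only: local Lists stand in for the jsonstudent and
// parquetstudent temp tables. Student and JoinSketch are made-up names.
object JoinSketch {
  case class Student(name: String, id: Int, course: String, year: Int)

  // Rows copied from the jsonDF / parquetDF output shown above.
  val jsonStudents: List[Student] = List(
    Student("anil", 1, "spark", 2016),
    Student("anvith", 5, "hadoop", 2015),
    Student("dev", 6, "hadoop", 2015),
    Student("raj", 3, "spark", 2016),
    Student("sunil", 4, "hadoop", 2015),
    Student("venkat", 2, "spark", 2016)
  )
  val parquetStudents: List[Student] = jsonStudents

  // Inner join: keep every pair of rows whose names are equal.
  val joined: List[(Student, Student)] =
    for {
      j <- jsonStudents
      p <- parquetStudents
      if j.name == p.name
    } yield (j, p)

  def main(args: Array[String]): Unit =
    joined.foreach(println)
}
```

<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">This nested loop compares every row with every other row, which is fine for six rows; Spark picks its own join strategy for real tables.</span><br />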
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;">----------------------------------------------------------</span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: large;"><br /></span>
<br /></div>
SPARK BASICS DAY 1 Practice on 27 Oct 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;"><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">Spark Day 1 Practice:</span><br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;" /><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">==============================================</span></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">SparkContext (sc) => the main entry point to Spark</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD => Resilient Distributed Dataset</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD Features:</span><br />
<span style="font-size: large;">------------------</span><br />
<span style="font-size: large;">-> Immutable</span><br />
<span style="font-size: large;">-> Lazily evaluated</span><br />
<span style="font-size: large;">-> Cacheable</span><br />
<span style="font-size: large;">-> Type inferred</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">list <- (1,2,3,4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">RDD operations:</span><br />
<span style="font-size: large;">--------------------</span><br />
<span style="font-size: large;">Transformation:</span><br />
<span style="font-size: large;">---------------</span><br />
<span style="font-size: large;">old rdd => new rdd</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(x) -> { x + 1 }</span><br />
<span style="font-size: large;">f(list) <- (2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">f(x) -> { x * x }</span><br />
<span style="font-size: large;">f(list) <- (1,4,9,16)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Action:</span><br />
<span style="font-size: large;">---------------</span><br />
<span style="font-size: large;">rdd => result</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">min(list) -> 1</span><br />
<span style="font-size: large;">max(list) -> 4</span><br />
<span style="font-size: large;">sum(list) -> 10</span><br />
<span style="font-size: large;">avg(list) -> 2.5</span><br />
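<span style="font-size: large;">The list notation above can be tried with plain Scala collections before touching Spark. This is a minimal sketch, assuming no Spark at all: the object name TransformVsAction is made up for illustration, and a local List stands in for the RDD, so nothing here is lazy or distributed.</span><br />

```scala
// Illustration only: a local List stands in for an RDD.
object TransformVsAction {
  val list: List[Int] = List(1, 2, 3, 4)

  // "Transformation" style: old collection => new collection
  val plusOne: List[Int] = list.map(x => x + 1)
  val squared: List[Int] = list.map(x => x * x)

  // "Action" style: collection => single result
  val minimum: Int = list.min
  val maximum: Int = list.max
  val total: Int = list.sum
  val average: Double = list.sum.toDouble / list.size

  def main(args: Array[String]): Unit = {
    println(plusOne)
    println(squared)
    println(s"min=$minimum max=$maximum sum=$total avg=$average")
  }
}
```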
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark Supports Scala, Java, Python and R </span><br />
<span style="font-size: large;">-------------------------------------------</span><br />
<span style="font-size: large;">Scala + Spark <span class="Apple-tab-span" style="white-space: pre;"> </span>=> spark-shell</span><br />
<span style="font-size: large;">Python + Spark <span class="Apple-tab-span" style="white-space: pre;"> </span>=> pyspark</span><br />
<span style="font-size: large;">R + Spark <span class="Apple-tab-span" style="white-space: pre;"> </span>=> SparkR</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Spark with Scala examples:</span><br />
<span style="font-size: large;">--------------------------------------------</span><br />
<span style="font-size: large;">Create RDD in 2 ways:</span><br />
<span style="font-size: large;">------------------------</span><br />
<span style="font-size: large;">1. collection ( list / set / seq / ..)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">2. data sets (text / csv / tsv / json / ...)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val rdd1 = sc.parallelize(List(1,2,3,4))</span><br />
<span style="font-size: large;">rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Transformations on RDD:</span><br />
<span style="font-size: large;">----------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.foreach(println)</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.map(x => x + 1).foreach(println)</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;"><br /></span>
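<span style="font-size: large;">Running the same statement again can print a different order, because foreach runs on the partitions in parallel:</span><br />
<span style="font-size: large;">scala> rdd1.map(x => x + 1).foreach(println)</span><br />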
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">How to get the number of partitions:</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">scala> rdd1.getNumPartitions</span><br />
<span style="font-size: large;">res7: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd1.partitions.length</span><br />
<span style="font-size: large;">res8: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Create an RDD with a specific number of partitions:</span><br />
<span style="font-size: large;">----------------------------------------------</span><br />
<span style="font-size: large;">scala> val rdd2 = sc.parallelize(List(1,2,3,4), 2)</span><br />
<span style="font-size: large;">rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.getNumPartitions</span><br />
<span style="font-size: large;">res9: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.partitions.length</span><br />
<span style="font-size: large;">res10: Int = 2</span><br />
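<span style="font-size: large;">To build some intuition for what "2 partitions" means for List(1,2,3,4), here is a plain-Scala sketch that cuts a list into contiguous, near-equal slices. The helper sliceEvenly and the object PartitionSketch are hypothetical names, and the slicing rule is an assumption that only approximates how parallelize might split a local collection. In spark-shell, rdd2.glom.collect shows the real contents of each partition.</span><br />

```scala
// Illustration only: sliceEvenly is a made-up helper, not a Spark API.
object PartitionSketch {
  // Cut `data` into `numSlices` contiguous slices of near-equal size.
  def sliceEvenly[T](data: List[T], numSlices: Int): List[List[T]] = {
    require(numSlices > 0, "numSlices must be positive")
    (0 until numSlices).toList.map { i =>
      // Long arithmetic avoids overflow for large lists.
      val start = (i.toLong * data.length / numSlices).toInt
      val end = ((i + 1).toLong * data.length / numSlices).toInt
      data.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    println(sliceEvenly(List(1, 2, 3, 4), 2))
    println(sliceEvenly(List(1, 2, 3, 4), 4))
  }
}
```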
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Use collect to preserve the order:</span><br />
<span style="font-size: large;">-----------------------------------------</span><br />
<span style="font-size: large;">Note: Avoid collect on large data sets (for example in a production environment), because it copies the entire RDD to the driver.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.collect.foreach(println)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;"><br /></span>
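<span style="font-size: large;">Note: in the statements below, map and filter run on the local Array returned by collect, not on the RDD. rdd2.map(x => x + 1).collect gives the same result while keeping the transformation on the RDD.</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.collect.map(x => x + 1).foreach(println)</span><br />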
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.collect.map(x => x * x).foreach(println)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">9</span><br />
<span style="font-size: large;">16</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.collect.filter(x => x % 2 == 0).foreach(println)</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Actions on RDD:</span><br />
<span style="font-size: large;">----------------------------</span><br />
<span style="font-size: large;">scala> rdd2.min</span><br />
<span style="font-size: large;">res25: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.max</span><br />
<span style="font-size: large;">res26: Int = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.sum</span><br />
<span style="font-size: large;">res27: Double = 10.0</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> rdd2.count</span><br />
<span style="font-size: large;">res28: Long = 4</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Create an RDD from a text file:</span><br />
<span style="font-size: large;">---------------------------------</span><br />
<span style="font-size: large;">scala> val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput")</span><br />
<span style="font-size: large;">fileRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at textFile at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.getNumPartitions</span><br />
<span style="font-size: large;">res29: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.partitions.length</span><br />
<span style="font-size: large;">res30: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Create an RDD from a text file with a specific number of partitions:</span><br />
<span style="font-size: large;">-------------------------------------------------------------</span><br />
<span style="font-size: large;">scala> val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;">fileRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at textFile at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.getNumPartitions</span><br />
<span style="font-size: large;">res31: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.partitions.length</span><br />
<span style="font-size: large;">res32: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Word Count Problem in Spark</span><br />
<span style="font-size: large;">---------------------------------------</span><br />
<span style="font-size: large;">scala> val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;">fileRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at textFile at <console>:27</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.collect.foreach(println)</span><br />
<span style="font-size: large;">I am going</span><br />
<span style="font-size: large;">to hyd</span><br />
<span style="font-size: large;">I am learning</span><br />
<span style="font-size: large;">hadoop course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.flatMap(line => line).collect.foreach(println)</span><br />
<span style="font-size: large;">scala> fileRdd.flatMap(line => line.split("")).collect.foreach(println)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">I</span><br />
<br />
<span style="font-size: large;">a</span><br />
<span style="font-size: large;">m</span><br />
<br />
<span style="font-size: large;">g</span><br />
<span style="font-size: large;">o</span><br />
<span style="font-size: large;">i</span><br />
<span style="font-size: large;">n</span><br />
<span style="font-size: large;">g</span><br />
<span style="font-size: large;">.....</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> fileRdd.flatMap(line => line.split(" ")).collect.foreach(println) </span><br />
<span style="font-size: large;">I</span><br />
<span style="font-size: large;">am</span><br />
<span style="font-size: large;">going</span><br />
<span style="font-size: large;">to</span><br />
<span style="font-size: large;">hyd</span><br />
<span style="font-size: large;">I</span><br />
<span style="font-size: large;">am</span><br />
<span style="font-size: large;">learning</span><br />
<span style="font-size: large;">hadoop</span><br />
<span style="font-size: large;">course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordsRdd = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;">wordsRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[21] at flatMap at <console>:29</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordsRdd.collect.foreach(println)</span><br />
<span style="font-size: large;">I</span><br />
<span style="font-size: large;">am</span><br />
<span style="font-size: large;">going</span><br />
<span style="font-size: large;">to</span><br />
<span style="font-size: large;">hyd</span><br />
<span style="font-size: large;">I</span><br />
<span style="font-size: large;">am</span><br />
<span style="font-size: large;">learning</span><br />
<span style="font-size: large;">hadoop</span><br />
<span style="font-size: large;">course</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val countsRdd = wordsRdd.map(word => (word,1))</span><br />
<span style="font-size: large;">countsRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[22] at map at <console>:31</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> countsRdd.collect.foreach(println)</span><br />
<span style="font-size: large;">(I,1)</span><br />
<span style="font-size: large;">(am,1)</span><br />
<span style="font-size: large;">(going,1)</span><br />
<span style="font-size: large;">(to,1)</span><br />
<span style="font-size: large;">(hyd,1)</span><br />
<span style="font-size: large;">(I,1)</span><br />
<span style="font-size: large;">(am,1)</span><br />
<span style="font-size: large;">(learning,1)</span><br />
<span style="font-size: large;">(hadoop,1)</span><br />
<span style="font-size: large;">(course,1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordCountRdd = countsRdd.reduceByKey( _ + _)</span><br />
<span style="font-size: large;">wordCountRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[23] at reduceByKey at <console>:33</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val wordCountRdd = countsRdd.reduceByKey((a,b) => a + b)</span><br />
<span style="font-size: large;">wordCountRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[24] at reduceByKey at <console>:33</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> wordCountRdd.collect.foreach(println)</span><br />
<span style="font-size: large;">(learning,1)</span><br />
<span style="font-size: large;">(hadoop,1)</span><br />
<span style="font-size: large;">(am,2)</span><br />
<span style="font-size: large;">(hyd,1)</span><br />
<span style="font-size: large;">(I,2)</span><br />
<span style="font-size: large;">(to,1)</span><br />
<span style="font-size: large;">(going,1)</span><br />
<span style="font-size: large;">(course,1)</span><br />
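<span style="font-size: large;">The three steps above (flatMap, map, reduceByKey) can be mirrored with plain Scala collections to check the logic without a cluster. This is a sketch under stated assumptions: the four input lines are hard-coded from the demoinput output shown earlier, WordCountSketch is a made-up name, and since Scala collections have no reduceByKey, groupBy followed by a sum is swapped in for it.</span><br />

```scala
// Illustration only: a local List stands in for fileRdd.
object WordCountSketch {
  val lines: List[String] =
    List("I am going", "to hyd", "I am learning", "hadoop course")

  // Step 1: split every line into words (like flatMap on the RDD).
  val words: List[String] = lines.flatMap(line => line.split(" ").toList)

  // Step 2: pair every word with the count 1 (like map on the RDD).
  val pairs: List[(String, Int)] = words.map(word => (word, 1))

  // Step 3: add up the 1s per word. groupBy + sum replaces reduceByKey here.
  val wordCounts: Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    wordCounts.foreach(println)
}
```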
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Solutions:</span><br />
<span style="font-size: large;">-------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Solution1:</span><br />
<span style="font-size: large;">-------------</span><br />
<span style="font-size: large;">val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordsRdd = fileRdd.flatMap(line => line.split(" "))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val countsRdd = wordsRdd.map(word => (word,1))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordCountRdd = countsRdd.reduceByKey( _ + _)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">wordCountRdd.saveAsTextFile("file:///home/orienit/work/output/wordcount-op")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Solution2:</span><br />
<span style="font-size: large;">-------------</span><br />
<span style="font-size: large;">val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val wordCountRdd = fileRdd.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey( _ + _)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">wordCountRdd.saveAsTextFile("file:///home/orienit/work/output/wordcount-op")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Solution3:</span><br />
<span style="font-size: large;">-------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">sc.textFile("file:///home/orienit/work/input/demoinput", 1).flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey( _ + _).saveAsTextFile("file:///home/orienit/work/output/wordcount-op")</span><br />
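<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The word counts can be cross-checked without Spark using standard shell tools. This is only a sketch: the two sample lines are an assumption chosen to be consistent with the counts printed earlier (the real `demoinput` file is not shown here), and `word_count` is a hypothetical helper name. Unlike `split(" ")`, it drops the empty tokens produced by repeated spaces.</span><br />

```shell
#!/usr/bin/env bash
# word_count: read text on stdin, put each space-separated word on its own line,
# then count identical words. Output format: "<word> <count>", sorted by word.
word_count() {
  tr ' ' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
}

# Assumed sample input (hypothetical contents for demoinput).
printf 'I am going to hyd\nI am learning hadoop course\n' | word_count
```

<span style="font-size: large;">The pipeline mirrors the Spark steps: `tr` plays the role of `flatMap`, and `sort | uniq -c` plays the role of `map` plus `reduceByKey`.</span><br />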
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Grep Job using Spark:</span><br />
<span style="font-size: large;">------------------------------------</span><br />
<span style="font-size: large;">val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val filterRdd = fileRdd.filter(line => line.contains("am"))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">filterRdd.saveAsTextFile("file:///home/orienit/work/output/grep-op")</span><br />
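<span style="font-size: large;"><br /></span>
<span style="font-size: large;">For comparison, the same filter can be sketched with the shell `grep` command that the job is named after. `grep_job` is a hypothetical helper name and the sample lines are assumptions; `-F` is used because `contains` matches literal text, not a regular expression.</span><br />

```shell
#!/usr/bin/env bash
# grep_job: keep only the lines on stdin that contain the literal text "$1".
# -F treats the pattern as a fixed string, like String.contains in the Spark job.
grep_job() {
  grep -F -- "$1" || true   # "|| true": finding no matching line is not an error here
}

# Assumed sample input; only the first two lines contain "am".
printf 'I am going to hyd\nI am learning hadoop course\nhello world\n' | grep_job am
```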
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Sed Job using Spark:</span><br />
<span style="font-size: large;">------------------------------------</span><br />
<span style="font-size: large;">val fileRdd = sc.textFile("file:///home/orienit/work/input/demoinput", 1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val mapRdd = fileRdd.map(line => line.replace("am", "at"))</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">mapRdd.saveAsTextFile("file:///home/orienit/work/output/sed-op")</span><br />
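<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The shell `sed` equivalent looks like this. It is a sketch with an assumed sample line, and `sed_job` is a hypothetical helper name. The `g` flag is needed because `replace` in the Spark job changes every occurrence on a line, not just the first.</span><br />

```shell
#!/usr/bin/env bash
# sed_job: replace every occurrence of "am" with "at" on each input line.
sed_job() {
  sed 's/am/at/g'
}

# Assumed sample input.
printf 'I am going to hyd\n' | sed_job
```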
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<br /></div>
Twitter Data Sentiment Analysis Using Pig (published 2016-10-26, updated 2016-10-27)
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;"><b><u>Pre-Requisites of Twitter Data + Pig + Sentiment Analysis Project:</u></b></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">hadoop-2.6.0</span><br />
<span style="font-size: large;">pig-0.15.0</span><br />
<span style="font-size: large;">java-1.7</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><b><u>NOTE: Make sure that all the above components are installed</u></b></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><b><u>Twitter Data + Pig + Sentiment Analysis Project Download Links:</u></b></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">`hadoop-2.6.0.tar.gz` ==> <a href="https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz">link</a></span><br />
<span style="font-size: large;">`pig-0.15.0.tar.gz` ==> <a href="http://mirror.fibergrid.in/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz">link</a></span><br />
<span style="font-size: large;">`sentimentanalysis-pig.jar` ==> <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxbjY3TThCeTBrcFE">link</a></span><br />
<span style="font-size: large;">`tweets` ==> <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxdnN1dHFqbm1xOVU">link</a></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">1. Create a `<b>sentimentanalysis</b>` folder on your machine</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>command: mkdir ~/sentimentanalysis</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCPtrju90E3BEEj7DffX5e3fsEwZ0AUBd9-SjTXqCLWr-ty-WdF5sRC3J6HMkdotqvceW3MzBBTwHUZweFqn2eKrA8iiQN01kV44rdwOy5LrTP3qLXwERZrxL2jRxub4CYRcoUzfB8RxXk/s1600/create+a+folder.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCPtrju90E3BEEj7DffX5e3fsEwZ0AUBd9-SjTXqCLWr-ty-WdF5sRC3J6HMkdotqvceW3MzBBTwHUZweFqn2eKrA8iiQN01kV44rdwOy5LrTP3qLXwERZrxL2jRxub4CYRcoUzfB8RxXk/s640/create+a+folder.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">2. Download the sample tweets (or collect Twitter data using Flume) for the sentiment analysis and copy the file to the '<b>~/sentimentanalysis</b>' folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;">Note: Download sample tweets <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxdnN1dHFqbm1xOVU">link</a></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdaKMmGS7kowkWEt1-iONt9q6p4AC74vNjxReRGrtB58qIiil21yV86c3VnrF1lsx__b11gV2zixhRiv6TFydItZsIZRGckD56EzLsVGtevlak5Ppbc8sO1KZn21pbuzVB9duIHA9VfsXn/s1600/tweets.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdaKMmGS7kowkWEt1-iONt9q6p4AC74vNjxReRGrtB58qIiil21yV86c3VnrF1lsx__b11gV2zixhRiv6TFydItZsIZRGckD56EzLsVGtevlak5Ppbc8sO1KZn21pbuzVB9duIHA9VfsXn/s640/tweets.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;"><b>Example: Sample Tweets</b></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">i am learning hadoop course</span><br />
<span style="font-size: large;">i am good in hadoop</span><br />
<span style="font-size: large;">i am learning hadoop</span><br />
<span style="font-size: large;">i am not feeling well</span><br />
<span style="font-size: large;">why we need bigdata </span><br />
<span style="font-size: large;">i am not happy with rdbms </span><br />
<span style="font-size: large;">ravi is not working today </span><br />
<span style="font-size: large;">india got the world cup </span><br />
<span style="font-size: large;">learn hadoop from kalyan blog </span><br />
<span style="font-size: large;">learn spark from kalyan blog </span><br />
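<span style="font-size: large;"><br /></span>
<span style="font-size: large;">If the download link is unavailable, the sample file can be recreated by hand. The sketch below writes the ten tweets listed above into a `tweets` file; `WORK_DIR` is a hypothetical variable that defaults to the folder from step 1.</span><br />

```shell
#!/usr/bin/env bash
# WORK_DIR is a hypothetical variable; it defaults to the folder created in step 1.
WORK_DIR="${WORK_DIR:-$HOME/sentimentanalysis}"
mkdir -p "$WORK_DIR"

# Write the ten sample tweets, one per line, to the `tweets` file.
cat > "$WORK_DIR/tweets" <<'EOF'
i am learning hadoop course
i am good in hadoop
i am learning hadoop
i am not feeling well
why we need bigdata
i am not happy with rdbms
ravi is not working today
india got the world cup
learn hadoop from kalyan blog
learn spark from kalyan blog
EOF

# Print the number of lines written.
wc -l < "$WORK_DIR/tweets"
```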
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">3. Verify the file contents using the <b>cat</b> command</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>command: cat ~/sentimentanalysis/tweets</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIyRp2yMCa96vlTQWWsc4IMgx9fNPzXLoM8QaecOGNC6UXc1Uqwt2DQmXVd3ExpbqwdfBBz6jX5ymFkebHAOemxDH47yu6CDuXM_r00Jp4yDfZwOU-vcu3lgWYctuSnvDtyjJMHykuif_6/s1600/cat+command.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIyRp2yMCa96vlTQWWsc4IMgx9fNPzXLoM8QaecOGNC6UXc1Uqwt2DQmXVd3ExpbqwdfBBz6jX5ymFkebHAOemxDH47yu6CDuXM_r00Jp4yDfZwOU-vcu3lgWYctuSnvDtyjJMHykuif_6/s640/cat+command.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">4. Start Hadoop using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>command: start-all.sh</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8y6imu7la3srCMjHBApNGz-kjpjDMBjVHN8jH7UCY7bZZrGbFZbmVjZULyiJEoKfXK8Uk9TxdefRiEApDrZGDdFc3TB6Nb0xgKKK8ARmJEXvmrCqVs9PReuGEgqC1ol8IfqICTiI9OyT3/s1600/start+the+hadoop.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8y6imu7la3srCMjHBApNGz-kjpjDMBjVHN8jH7UCY7bZZrGbFZbmVjZULyiJEoKfXK8Uk9TxdefRiEApDrZGDdFc3TB6Nb0xgKKK8ARmJEXvmrCqVs9PReuGEgqC1ol8IfqICTiI9OyT3/s640/start+the+hadoop.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
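<span style="color: purple; font-size: large;">5. Verify that Hadoop is running using the "<b>jps</b>" command</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>command: jps</i></span><br />
<span style="font-size: large;"><br /></span>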
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8y6imu7la3srCMjHBApNGz-kjpjDMBjVHN8jH7UCY7bZZrGbFZbmVjZULyiJEoKfXK8Uk9TxdefRiEApDrZGDdFc3TB6Nb0xgKKK8ARmJEXvmrCqVs9PReuGEgqC1ol8IfqICTiI9OyT3/s1600/start+the+hadoop.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8y6imu7la3srCMjHBApNGz-kjpjDMBjVHN8jH7UCY7bZZrGbFZbmVjZULyiJEoKfXK8Uk9TxdefRiEApDrZGDdFc3TB6Nb0xgKKK8ARmJEXvmrCqVs9PReuGEgqC1ol8IfqICTiI9OyT3/s640/start+the+hadoop.png" /></span></a><br />
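<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The check can also be scripted. The sketch below assumes a single-node (pseudo-distributed) Hadoop 2.x setup, where `start-all.sh` is expected to start the five daemons listed in the loop; `check_hadoop_daemons` is a hypothetical helper name that reads `jps` output on standard input.</span><br />

```shell
#!/usr/bin/env bash
# check_hadoop_daemons: read `jps` output ("<pid> <name>" per line) on stdin and
# report every expected daemon that is missing. Returns 1 if any is missing.
check_hadoop_daemons() {
  local jps_output daemon missing=0
  jps_output="$(cat)"
  # Assumed daemon list for a pseudo-distributed Hadoop 2.x installation.
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the whole second column so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}

# Run against the live JVM list only when jps is available.
if command -v jps >/dev/null 2>&1; then
  jps | check_hadoop_daemons || true
fi
```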
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">6. Open the NameNode web page in a browser using the URL below</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:50070/dfshealth.jsp"><span style="color: red; font-size: large;">http://localhost:50070/dfshealth.jsp</span></a><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimOqkN5j5X6r7q7SZ5TY5puOzimzoGSbE0Wz1J7e5E6zma5gDtyH9gpGE7BkWGDBzbeHjOP0hzvViQ1ZZ6dGOVTRjHOBo8Ptgz-wGxrtpY5KprlleNrk2pWsy5oTkblSLSayU7KTYHyWuq/s1600/browser+data.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimOqkN5j5X6r7q7SZ5TY5puOzimzoGSbE0Wz1J7e5E6zma5gDtyH9gpGE7BkWGDBzbeHjOP0hzvViQ1ZZ6dGOVTRjHOBo8Ptgz-wGxrtpY5KprlleNrk2pWsy5oTkblSLSayU7KTYHyWuq/s640/browser+data.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">7. Load the sample <b>tweets</b> into <b>HDFS</b></span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -mkdir -p /kalyan/sentimentanalysis/pig/input</i></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm5ge-WStZeU21IWcJ0BnPBYld0xOyxE70u_B3Jxjg1rzSWSnh9M54VIeF2BK7DnF6higwLxMKh8VG945D7QsUFcsJsHLczQcWVAMQGK_zp1Lhqlq7sBRJrBcKzZiuOa_wZUH8wPzZaqQu/s1600/create+input+folder.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm5ge-WStZeU21IWcJ0BnPBYld0xOyxE70u_B3Jxjg1rzSWSnh9M54VIeF2BK7DnF6higwLxMKh8VG945D7QsUFcsJsHLczQcWVAMQGK_zp1Lhqlq7sBRJrBcKzZiuOa_wZUH8wPzZaqQu/s640/create+input+folder.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_bYYHdhFsVkLTysx6VgmvInXAGti6OJoYkue02BwWNZ9kVpAXID4Ejg_fsGMrNbYIUbdTjxAJreTbzQYANI9npex5bTSfOho5XRATKGYnQ_GuUVD2VbkrF2_r1fwSOnPOZUISl03hoqQy/s1600/input+folder.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_bYYHdhFsVkLTysx6VgmvInXAGti6OJoYkue02BwWNZ9kVpAXID4Ejg_fsGMrNbYIUbdTjxAJreTbzQYANI9npex5bTSfOho5XRATKGYnQ_GuUVD2VbkrF2_r1fwSOnPOZUISl03hoqQy/s640/input+folder.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -put ~/sentimentanalysis/tweets /kalyan/sentimentanalysis/pig/input</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4BcEmtvDhAkFgXBA6tHNtSxG_OtI9hbEQ3AIpYB2deo-z6rjKv3Ye0sK4pl3AB1MXE3rThyqnAzU1ia5IwEt_DPnujnPWMy3V8WItj48YZ_GePTfBkyhASNPBRaE3MCUyqklaTJQgrG9z/s1600/put+command.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4BcEmtvDhAkFgXBA6tHNtSxG_OtI9hbEQ3AIpYB2deo-z6rjKv3Ye0sK4pl3AB1MXE3rThyqnAzU1ia5IwEt_DPnujnPWMy3V8WItj48YZ_GePTfBkyhASNPBRaE3MCUyqklaTJQgrG9z/s640/put+command.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpi3K91rr9WSANIGEM_Cw7NWcll4xMAEEfxk1xGEJm6foVq1XWhBNaswXSaZLw1Hsn7EYE-n2CcK5EtDbqtrb_0Ob2zL0iaR1ed3_XYVTntwYID-3WKuUaBv6gnpE-Ila1rZrmDlTUSpoQ/s1600/file+inside+the+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpi3K91rr9WSANIGEM_Cw7NWcll4xMAEEfxk1xGEJm6foVq1XWhBNaswXSaZLw1Hsn7EYE-n2CcK5EtDbqtrb_0Ob2zL0iaR1ed3_XYVTntwYID-3WKuUaBv6gnpE-Ila1rZrmDlTUSpoQ/s640/file+inside+the+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqOsap8WT353S3N7E5pBzQ_GvqR7N1D0TJuRq7nXmX_szcch2_gRyzsr1KV4ECeqK3mu1A8F-BlqEqKhXnM0Sb1ftU3MJDvKIfZVUFoCiLe0VRM31c70BQBL_txT7kAPq1uSJJkwrLcH8H/s1600/sample+tweets.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqOsap8WT353S3N7E5pBzQ_GvqR7N1D0TJuRq7nXmX_szcch2_gRyzsr1KV4ECeqK3mu1A8F-BlqEqKhXnM0Sb1ftU3MJDvKIfZVUFoCiLe0VRM31c70BQBL_txT7kAPq1uSJJkwrLcH8H/s640/sample+tweets.png" /></span></a><br />
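<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The HDFS commands from this step can be collected into one small script. This is a sketch only: `run`, `load_tweets`, and `DRY_RUN` are hypothetical names, and the script prints the commands by default instead of executing them. The added `-ls` and `-cat` calls simply confirm that the upload worked. Note that `-put` fails if the file already exists in HDFS.</span><br />

```shell
#!/usr/bin/env bash
HDFS_INPUT=/kalyan/sentimentanalysis/pig/input
LOCAL_TWEETS="$HOME/sentimentanalysis/tweets"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# load_tweets: create the HDFS folder, upload the file, then list and print it.
load_tweets() {
  run hadoop fs -mkdir -p "$HDFS_INPUT"
  run hadoop fs -put "$LOCAL_TWEETS" "$HDFS_INPUT"
  run hadoop fs -ls "$HDFS_INPUT"
  run hadoop fs -cat "$HDFS_INPUT/tweets"
}

# Dry run by default so the sketch is safe to try; set DRY_RUN=0 on a running cluster.
DRY_RUN="${DRY_RUN:-1}"
load_tweets
```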
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">8. Start Pig in either <b>local mode</b> or <b>mapreduce mode</b> (mapreduce mode is used here because the tweets are in HDFS)</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>command: pig -x mapreduce</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLkONlUH10gBrZZLcjdVX8MuIuqcP1a-geMifmPUuKqpOysL3ZRtBBcP7CdgmOkpFy24Vw3gic5LOFu5DHF-p79E6TbH4xr1PcMfvw1FFvNlNfCJjYVCjnKWkNq_kBKL4kUL9Pw0NI5D4a/s1600/run+pig+in+mapreduce+mode.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLkONlUH10gBrZZLcjdVX8MuIuqcP1a-geMifmPUuKqpOysL3ZRtBBcP7CdgmOkpFy24Vw3gic5LOFu5DHF-p79E6TbH4xr1PcMfvw1FFvNlNfCJjYVCjnKWkNq_kBKL4kUL9Pw0NI5D4a/s640/run+pig+in+mapreduce+mode.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">9. Load the sample tweets into the Pig `<b>tweets</b>` bag</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>tweets = load '/kalyan/sentimentanalysis/pig/input' AS (tweet : chararray);</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLOclM6IFzKgiiqtGuP07RljSRyhBUbhIErnVvkSPZ6iEMxYQcWwGRO_bskVxJBJ4MKJkkXcdPZm59RJr24FEs2buz09ArMvGpcsBWTbGRWrJPx8dxoupFQPhNO1n8fuXnnK2CGooM6j-w/s1600/load+the+data+into+pig+bag.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLOclM6IFzKgiiqtGuP07RljSRyhBUbhIErnVvkSPZ6iEMxYQcWwGRO_bskVxJBJ4MKJkkXcdPZm59RJr24FEs2buz09ArMvGpcsBWTbGRWrJPx8dxoupFQPhNO1n8fuXnnK2CGooM6j-w/s640/load+the+data+into+pig+bag.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">10. Display the data in the Pig `<b>tweets</b>` bag</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>dump tweets;</i></span><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUe6eFquVYvIfY2rElnufvyFD_Gsj5rAYLqClr5Jj06UL5YGwMNtr6BWO8D0bzBkyDQYR3KxD97V3ZOcBe0R_MZBQ2QrIFkAn59xw_ew99S82hRR0GhiKD_aAN6X_dy7zSa7Iei6FFZW-7/s1600/display+data+in+bag.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUe6eFquVYvIfY2rElnufvyFD_Gsj5rAYLqClr5Jj06UL5YGwMNtr6BWO8D0bzBkyDQYR3KxD97V3ZOcBe0R_MZBQ2QrIFkAn59xw_ew99S82hRR0GhiKD_aAN6X_dy7zSa7Iei6FFZW-7/s640/display+data+in+bag.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">11. Download the `<b>sentimentanalysis-pig.jar</b>` file and copy it to the '~/sentimentanalysis' folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>Note: Download sentimentanalysis-pig.jar <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxbjY3TThCeTBrcFE">link</a></i></span><br />
<div>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_hzyAWuEJs4yE59Jy6iA6bKHv991_Fv6DnAVn-lF8rwUv5eGOigzYUsbPZyXkHLP3HW4-YMkakAjmB8psY-Xz1v5NSArfPSmMIYzgXKkqEWq2Q_RUnlGnJMevVHzQMr_5c73Afz8HWUvj/s1600/sentiment+analysis+jar+file+in+pig.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_hzyAWuEJs4yE59Jy6iA6bKHv991_Fv6DnAVn-lF8rwUv5eGOigzYUsbPZyXkHLP3HW4-YMkakAjmB8psY-Xz1v5NSArfPSmMIYzgXKkqEWq2Q_RUnlGnJMevVHzQMr_5c73Afz8HWUvj/s640/sentiment+analysis+jar+file+in+pig.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">12. Load the `<b>sentimentanalysis-pig.jar</b>` into HDFS</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -put ~/sentimentanalysis/sentimentanalysis-pig.jar /kalyan/sentimentanalysis/pig</i></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK7mKALvNb-77K7HV0NcCEsSHTf7T_sC9CTji-dZ5DMOw-mjyfdHaq2i1lK29rBuQ3_HfckedkxbxTJWWULqWzsnDUnOwhAbK5vZEFQK1BQs_v9sjLoGvjbXRGaqACQmYUXyM1Tqc8LwD8/s1600/put+command+for+pig+jar.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK7mKALvNb-77K7HV0NcCEsSHTf7T_sC9CTji-dZ5DMOw-mjyfdHaq2i1lK29rBuQ3_HfckedkxbxTJWWULqWzsnDUnOwhAbK5vZEFQK1BQs_v9sjLoGvjbXRGaqACQmYUXyM1Tqc8LwD8/s640/put+command+for+pig+jar.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdIketf8WnrvWnkyILjvPjOvuH04u5TY4cczSefjI2ZdfacW4aNT6kLLWcBdVuOCT8Y21P9IG-q6iBKL0EY8jqlGGbI8WYjmxbQiTSiZiWNBx0_BRl5-6XsTzjik1l7fJ1yYtl0ZuiVymo/s1600/pig+jar+file+in+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdIketf8WnrvWnkyILjvPjOvuH04u5TY4cczSefjI2ZdfacW4aNT6kLLWcBdVuOCT8Y21P9IG-q6iBKL0EY8jqlGGbI8WYjmxbQiTSiZiWNBx0_BRl5-6XsTzjik1l7fJ1yYtl0ZuiVymo/s640/pig+jar+file+in+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">13. Add the `<b>sentimentanalysis-pig.jar</b>` file to the Pig class path using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>REGISTER <PATH OF THE JAR FILE>;</i></span><br />
<span style="color: red; font-size: large;"><i><br /></i></span>
<span style="color: red; font-size: large;"><i>REGISTER hdfs://localhost:8020/kalyan/sentimentanalysis/pig/sentimentanalysis-pig.jar;</i></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMaWFNUym3ohAzNDReDqPZYhhbhwEdJm_MKRmDti-39FgFppLKFPGAxjr1NXdazRFpTmQIBE4RgRJ_YNvWpxr8C-4kHW17tcxXa-PWJmoC-w7rL4-cTg4b234PdWd43WrvNmRtRZ5zSB2h/s1600/register+path+of+the+jar+file.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMaWFNUym3ohAzNDReDqPZYhhbhwEdJm_MKRmDti-39FgFppLKFPGAxjr1NXdazRFpTmQIBE4RgRJ_YNvWpxr8C-4kHW17tcxXa-PWJmoC-w7rL4-cTg4b234PdWd43WrvNmRtRZ5zSB2h/s640/register+path+of+the+jar+file.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">14. Define the <b>sentiment</b> <b>function</b> in pig</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>DEFINE <function name> <UDF CLASS NAME WITH PACKAGE>();</i></span><br />
<span style="color: red; font-size: large;"><i><br /></i></span>
<span style="color: red; font-size: large;"><i>DEFINE sentiment com.orienit.kalyan.sentimentanalysis.pig.udf.SentimentUdf();</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-mg4ZATmWEIhLTPklyl_g8zTZxj5ufXbI7_FoEqUkPNH6e6l4xI2f_7WbEQqcl2Q3O6BfSspMw4r_vzeOYENFHMal5-2ldBZW_ZTUZ0XmCCQ_KfNI344KisWkoyQsjyPNkaDcvNxd8ajv/s1600/create+a+udf+fucntion+in+pig.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-mg4ZATmWEIhLTPklyl_g8zTZxj5ufXbI7_FoEqUkPNH6e6l4xI2f_7WbEQqcl2Q3O6BfSspMw4r_vzeOYENFHMal5-2ldBZW_ZTUZ0XmCCQ_KfNI344KisWkoyQsjyPNkaDcvNxd8ajv/s640/create+a+udf+fucntion+in+pig.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">15. Analyse the <b>tweets</b> with the sentiment function using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>sentimenttweets1 = FOREACH tweets GENERATE tweet, sentiment(tweet) as sentiment;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyTbqq62ZHeiWdpeWd2PytHPsPK-f1lSnhinSzqpUdRv2xlMAHVonPH1GAYm262kD1fAvgm4FgyndWJKR9zriV22yAoWs2s9B1gEDkqmUSSC1_1v5dw7_Qwgo7484LrgCjAdYMfKIePbm0/s1600/analyse+tweets+using+pig.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyTbqq62ZHeiWdpeWd2PytHPsPK-f1lSnhinSzqpUdRv2xlMAHVonPH1GAYm262kD1fAvgm4FgyndWJKR9zriV22yAoWs2s9B1gEDkqmUSSC1_1v5dw7_Qwgo7484LrgCjAdYMfKIePbm0/s640/analyse+tweets+using+pig.png" /></span></a><br />
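<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The internals of `SentimentUdf` are not shown in this post. To give a feel for what a function of this shape does (one tweet in, one score of 1, 0, or -1 out), here is a toy stand-in. Everything in it is an assumption: the word lists, the scoring rule, and the `score_tweet` name are hypothetical and may differ from the real UDF.</span><br />

```shell
#!/usr/bin/env bash
# Hypothetical word lists; the real SentimentUdf may use different data and logic.
POSITIVE_WORDS="good happy"
NEGATIVE_WORDS="not bad"

# score_tweet: read one tweet per line on stdin and print "<tweet><TAB><score>".
# Toy rule: any negative word gives -1, otherwise any positive word gives 1, otherwise 0.
score_tweet() {
  awk -v plist="$POSITIVE_WORDS" -v nlist="$NEGATIVE_WORDS" '
    BEGIN {
      np = split(plist, p, " "); for (i = 1; i <= np; i++) positive[p[i]] = 1
      nn = split(nlist, n, " "); for (i = 1; i <= nn; i++) negative[n[i]] = 1
    }
    {
      has_pos = 0; has_neg = 0
      for (i = 1; i <= NF; i++) {
        if ($i in positive) has_pos = 1
        if ($i in negative) has_neg = 1
      }
      score = has_neg ? -1 : (has_pos ? 1 : 0)
      printf "%s\t%d\n", $0, score
    }'
}

printf 'i am good in hadoop\ni am not feeling well\nwhy we need bigdata\n' | score_tweet
```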
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">16. Display the data in the Pig `<b>sentimenttweets1</b>` bag</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>dump sentimenttweets1;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiD_AGhrfg9fcqiwrHoKQCnT4UtHdRAZmKo_FZFfaClTHJ19gCiLbT68MAzvMkVdCPI9wzSYyqtQW5-SeNiEnCfg35qiCxy_gWo7sdr-uSVX-bXE1Pr2BIVv4YMmFjVlZS2Un7vTfyvhLmX/s1600/result+in+sentiment.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiD_AGhrfg9fcqiwrHoKQCnT4UtHdRAZmKo_FZFfaClTHJ19gCiLbT68MAzvMkVdCPI9wzSYyqtQW5-SeNiEnCfg35qiCxy_gWo7sdr-uSVX-bXE1Pr2BIVv4YMmFjVlZS2Un7vTfyvhLmX/s640/result+in+sentiment.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">17. Store the `<b>sentimenttweets1</b>` result in an <b>HDFS</b> folder</span><br />
<br />
<span style="color: red; font-size: large;"><i>STORE sentimenttweets1 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets1';</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfPmgmKmbQBGaNfWvvIqlchwHiL_NTlDU2iXB1Q9Wbs1KX8AT6saxfQmfyL7rxEPhYL_FJeQ6wxTHlugsYJJbzfJqPKnmaq3JSCYyZD0ZvNGSkWRxK9D08Smk1pDexOCs7rZL9BVZgS-vA/s1600/store+the+sentiment+data+into+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfPmgmKmbQBGaNfWvvIqlchwHiL_NTlDU2iXB1Q9Wbs1KX8AT6saxfQmfyL7rxEPhYL_FJeQ6wxTHlugsYJJbzfJqPKnmaq3JSCYyZD0ZvNGSkWRxK9D08Smk1pDexOCs7rZL9BVZgS-vA/s640/store+the+sentiment+data+into+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgudsxx-Z5auJ_mqUq8oil58UMyTfaRDW6T2-XC1JT_-390Fm2eRNTaeuiEUXsQ7WW6Quu4fKYdhVmQ87lZvLT2R8JubdwIfX7YCMDDnT-TQYUv4Iz6GkXUn1THTdc0VcmeayBo5ZTC67hP/s1600/disply+sentiment+data+in+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgudsxx-Z5auJ_mqUq8oil58UMyTfaRDW6T2-XC1JT_-390Fm2eRNTaeuiEUXsQ7WW6Quu4fKYdhVmQ87lZvLT2R8JubdwIfX7YCMDDnT-TQYUv4Iz6GkXUn1THTdc0VcmeayBo5ZTC67hP/s640/disply+sentiment+data+in+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<br />
<span style="color: purple; font-size: large;">18. Label the `<b>tweets</b>` from the `<b>sentimenttweets1</b>` bag using a CASE statement</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>sentimenttweets2 = FOREACH sentimenttweets1 GENERATE tweet, (</i></span><br />
<span style="color: red; font-size: large;"><i>CASE</i></span><br />
<span style="color: red; font-size: large;"><i>WHEN sentiment == 1 THEN 'positive'</i></span><br />
<span style="color: red; font-size: large;"><i>WHEN sentiment == 0 THEN 'neutral'</i></span><br />
<span style="color: red; font-size: large;"><i>WHEN sentiment == -1 THEN 'negative'</i></span><br />
<span style="color: red; font-size: large;"><i>END</i></span><br />
<span style="color: red; font-size: large;"><i>);</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzsUheeYK0zrTVhk1u5oKW3IuG0lehoRhVKgxVTbsEHm1o4RFdFyA-67eaAHZQE4oAlZOoMm1iQxIp4ZyqHZpXtQUb2GVUFQUaVsE0K7T_-txOzS6QXfJ_poulqlOCwThS2Ug9hGLjDgbq/s1600/case+statement+in+pig.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzsUheeYK0zrTVhk1u5oKW3IuG0lehoRhVKgxVTbsEHm1o4RFdFyA-67eaAHZQE4oAlZOoMm1iQxIp4ZyqHZpXtQUb2GVUFQUaVsE0K7T_-txOzS6QXfJ_poulqlOCwThS2Ug9hGLjDgbq/s640/case+statement+in+pig.png" /></span></a><br />
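<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The same score-to-label mapping can be sketched outside Pig, for example to post-process the tab-separated files written by STORE in step 17. `label_sentiment` is a hypothetical helper name. As with a CASE that has no ELSE branch, any other score yields an empty label.</span><br />

```shell
#!/usr/bin/env bash
# label_sentiment: read "<tweet><TAB><score>" lines on stdin and print
# "<tweet><TAB><label>", mapping 1, 0, -1 to positive, neutral, negative.
label_sentiment() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      label = ""
      if ($2 == "1") label = "positive"
      else if ($2 == "0") label = "neutral"
      else if ($2 == "-1") label = "negative"
      print $1, label
    }'
}

# Assumed sample rows in the format produced by step 17.
printf 'i am good in hadoop\t1\nwhy we need bigdata\t0\n' | label_sentiment
```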
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">19. Store the `<b>sentimenttweets2</b>` result in an <b>HDFS</b> folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>STORE sentimenttweets2 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets2';</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU2RjG_xYKTb0Wkx5BAL6QMltTqjDSMRPxYET6JCxXiN8OIdMwCaleFgpW0RiU0TDW_6MoZrpn5UWZhBiIE-x0eQDTJwrf8yfBjvn-z9viStegwKY7BuRWufzNpaAzc8_M8r3Uz784Mlt8/s1600/store+sentiment+tweets+in+to+hdfs+case+statement.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU2RjG_xYKTb0Wkx5BAL6QMltTqjDSMRPxYET6JCxXiN8OIdMwCaleFgpW0RiU0TDW_6MoZrpn5UWZhBiIE-x0eQDTJwrf8yfBjvn-z9viStegwKY7BuRWufzNpaAzc8_M8r3Uz784Mlt8/s640/store+sentiment+tweets+in+to+hdfs+case+statement.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0rENicHBv8ov-gU4jmJnd1eVomUFCuXGNcDntAcG9-piVGOvgyZnwrnYTKYUMJV_Gl1qH_eB1EjP4ir84u_p2UR-aJtZ0LBFCxWznjKP9RwkUiL9zBNo6hcZnoveaDvGy407UyUqjJiba/s1600/display+the+sentiment+tweets+in+pig.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0rENicHBv8ov-gU4jmJnd1eVomUFCuXGNcDntAcG9-piVGOvgyZnwrnYTKYUMJV_Gl1qH_eB1EjP4ir84u_p2UR-aJtZ0LBFCxWznjKP9RwkUiL9zBNo6hcZnoveaDvGy407UyUqjJiba/s640/display+the+sentiment+tweets+in+pig.png" /></span></a></div>
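<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Steps 9 to 19 can also be run as one batch job instead of typing them into the Pig shell. The sketch below writes the statements from those steps into a script file and hands it to `pig`; the `SCRIPT` variable and the `sentiment.pig` file name are assumptions. Both STORE statements fail if their output folders already exist in HDFS, so remove the old folders before re-running.</span><br />

```shell
#!/usr/bin/env bash
# SCRIPT is a hypothetical variable; the file name sentiment.pig is an assumption.
SCRIPT="${SCRIPT:-$HOME/sentimentanalysis/sentiment.pig}"
mkdir -p "$(dirname "$SCRIPT")"

# Collect the Pig statements from steps 9 to 19 into one script file.
cat > "$SCRIPT" <<'EOF'
REGISTER hdfs://localhost:8020/kalyan/sentimentanalysis/pig/sentimentanalysis-pig.jar;
DEFINE sentiment com.orienit.kalyan.sentimentanalysis.pig.udf.SentimentUdf();

tweets = LOAD '/kalyan/sentimentanalysis/pig/input' AS (tweet : chararray);

sentimenttweets1 = FOREACH tweets GENERATE tweet, sentiment(tweet) AS sentiment;
STORE sentimenttweets1 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets1';

sentimenttweets2 = FOREACH sentimenttweets1 GENERATE tweet, (
  CASE
    WHEN sentiment == 1 THEN 'positive'
    WHEN sentiment == 0 THEN 'neutral'
    WHEN sentiment == -1 THEN 'negative'
  END
);
STORE sentimenttweets2 INTO '/kalyan/sentimentanalysis/pig/sentimenttweets2';
EOF

# Run the script in mapreduce mode only when Pig is installed.
if command -v pig >/dev/null 2>&1; then
  pig -x mapreduce "$SCRIPT"
else
  echo "pig not found on PATH; wrote $SCRIPT only" >&2
fi
```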
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<br /></div>
</div>
Twitter Data Sentiment Analysis Using Hive (published 2016-10-26, updated 2016-10-26)
<div dir="ltr" style="text-align: left;" trbidi="on">
<b><span style="font-size: large;"><u>Pre-Requisites of Twitter Data + Hive + Sentiment Analysis Project:</u></span></b><br />
<span style="font-size: large;">hadoop-2.6.0</span><br />
<span style="font-size: large;">hive-1.2.1</span><br />
<span style="font-size: large;">java-1.7</span><br />
<span style="font-size: large;"><br /></span>
<b><span style="font-size: large;"><u>NOTE: Make sure that all the above components are installed</u></span></b><br />
<b><span style="font-size: large;"><u><br /></u></span></b>
<b><span style="font-size: large;"><u>Twitter Data + Hive + Sentiment Analysis Project Download Links:</u></span></b><br />
<div>
<span style="font-size: large;">`hadoop-2.6.0.tar.gz` ==> <a href="https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz">link</a></span><br />
<span style="font-size: large;">`apache-hive-1.2.1-bin.tar.gz` ==> <a href="http://mirror.fibergrid.in/apache/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz">link</a></span><br />
<span style="font-size: large;">`sentimentanalysis-hive.jar` ==> <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxTVd5T1luRVhoQm8">link</a></span><br />
<span style="font-size: large;">`tweets` ==> <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxZVRjN1g1X0VRY3M">link</a></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">1. Create a `<b>sentimentanalysis</b>` folder on your machine</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><b>command:</b> mkdir ~/sentimentanalysis</span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIaLIwch5HRTlJ9EKfFBgvTTGi2gkOEOhd2PWgy4kJDv6brx_hdRFOEHOV_Ptyy-RL0aVMyPNLgQVUH3rlPjs5EgzXPE7hUHLwPY6pBjzd5qgZB704-kxWjHiSkXtnuIgucKn7jCKb2G4C/s1600/create+a+folder.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIaLIwch5HRTlJ9EKfFBgvTTGi2gkOEOhd2PWgy4kJDv6brx_hdRFOEHOV_Ptyy-RL0aVMyPNLgQVUH3rlPjs5EgzXPE7hUHLwPY6pBjzd5qgZB704-kxWjHiSkXtnuIgucKn7jCKb2G4C/s640/create+a+folder.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">2. Download the sample tweets, or collect Twitter data using Flume, and copy the file to the '<b>~/sentimentanalysis</b>' folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><b>Note: </b>Download sample tweets <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxZVRjN1g1X0VRY3M">link</a></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkDSNo2mr7Dx9LAk5ieW2qLW7jhDyFpHCuddBwopcJPClOyzuFO6c5lFPxYtFt4XYmMcbQsgaGFiLjLIDo18gu2lcdGH0iUbJaKuuMG6sdxBd8mqH2FCFgOAXhhEDOMTe-B3xoGtJqkT6O/s1600/tweets.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkDSNo2mr7Dx9LAk5ieW2qLW7jhDyFpHCuddBwopcJPClOyzuFO6c5lFPxYtFt4XYmMcbQsgaGFiLjLIDo18gu2lcdGH0iUbJaKuuMG6sdxBd8mqH2FCFgOAXhhEDOMTe-B3xoGtJqkT6O/s640/tweets.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<b><span style="color: purple; font-size: large;">Example: Sample Tweets</span></b><br />
<b><span style="font-size: large;"><br /></span></b>
<span style="font-size: large;">i am learning hadoop course</span><br />
<span style="font-size: large;">i am good in hadoop</span><br />
<span style="font-size: large;">i am learning hadoop</span><br />
<span style="font-size: large;">i am not feeling well</span><br />
<span style="font-size: large;">why we need bigdata</span><br />
<span style="font-size: large;">i am not happy with rdbms</span><br />
<span style="font-size: large;">ravi is not working today</span><br />
<span style="font-size: large;">india got the world cup </span><br />
<span style="font-size: large;">learn hadoop from kalyan blog </span><br />
<span style="font-size: large;">learn spark from kalyan blog </span><br />
<span style="font-size: large;"><br /></span>
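<span style="font-size: large;">If the download link is unavailable, the sample file can also be created by hand. The sketch below writes the ten sample tweets listed above to `~/sentimentanalysis/tweets`; the `TWEETS_DIR` variable is an assumption added here so the target folder can be overridden.</span><br />

```shell
# Folder from step 1; TWEETS_DIR is a hypothetical override, not part of the original steps.
TWEETS_DIR="${TWEETS_DIR:-$HOME/sentimentanalysis}"
mkdir -p "$TWEETS_DIR"

# One tweet per line, matching the sample listing above.
cat > "$TWEETS_DIR/tweets" <<'EOF'
i am learning hadoop course
i am good in hadoop
i am learning hadoop
i am not feeling well
why we need bigdata
i am not happy with rdbms
ravi is not working today
india got the world cup
learn hadoop from kalyan blog
learn spark from kalyan blog
EOF

# Sanity check: print the number of lines written.
wc -l < "$TWEETS_DIR/tweets"
```

<span style="font-size: large;"><br /></span>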
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">3. Verify the file contents using the <b>cat</b> command</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><b>command: </b>cat ~/sentimentanalysis/tweets</span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfNXwuBOq_OETICsrnpMme1q9aEePoum4vAhEYH-vWG1Jap85lpWw8xbaUzDeGt6xmPcyzBnQ6_aXxtdgq5h_6NcicbHj3ds_syj6RRn-oaL8rUvpourLRWiGEFeqV07DbXtVZIJco9HQM/s1600/cat+command.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfNXwuBOq_OETICsrnpMme1q9aEePoum4vAhEYH-vWG1Jap85lpWw8xbaUzDeGt6xmPcyzBnQ6_aXxtdgq5h_6NcicbHj3ds_syj6RRn-oaL8rUvpourLRWiGEFeqV07DbXtVZIJco9HQM/s640/cat+command.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">4. Start Hadoop using the command below</span><br />
<span style="font-size: large;"><br /></span></div>
<div>
<span style="color: red; font-size: large;"><b>command:</b> start-all.sh</span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJUGukou19TvLpfmnzXjWXNx_68xsA0KHwc7p7sEJxW3vsFQpRyL1E5Hm_lVXs8RdoBc7Pc6MgDZmnqC_nwexBBA3yx8c1RXDSQHJvaH2B3aJzm0CoWcbp4DJMS7010F9PPZhb3hCQfsh2/s1600/start+the+hadoop.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJUGukou19TvLpfmnzXjWXNx_68xsA0KHwc7p7sEJxW3vsFQpRyL1E5Hm_lVXs8RdoBc7Pc6MgDZmnqC_nwexBBA3yx8c1RXDSQHJvaH2B3aJzm0CoWcbp4DJMS7010F9PPZhb3hCQfsh2/s640/start+the+hadoop.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">5. Verify that Hadoop is running using the "<b>jps</b>" command</span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJUGukou19TvLpfmnzXjWXNx_68xsA0KHwc7p7sEJxW3vsFQpRyL1E5Hm_lVXs8RdoBc7Pc6MgDZmnqC_nwexBBA3yx8c1RXDSQHJvaH2B3aJzm0CoWcbp4DJMS7010F9PPZhb3hCQfsh2/s1600/start+the+hadoop.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJUGukou19TvLpfmnzXjWXNx_68xsA0KHwc7p7sEJxW3vsFQpRyL1E5Hm_lVXs8RdoBc7Pc6MgDZmnqC_nwexBBA3yx8c1RXDSQHJvaH2B3aJzm0CoWcbp4DJMS7010F9PPZhb3hCQfsh2/s640/start+the+hadoop.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
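<span style="font-size: large;">The jps listing should include the HDFS and YARN daemons. As a rough check, and assuming a single-node Hadoop 2.x setup where start-all.sh launches NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, the sketch below reports any daemon that is missing. The `missing_daemons` function name is hypothetical.</span><br />

```shell
# Daemons that start-all.sh normally launches on a single-node Hadoop 2.x setup (assumption).
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Reads `jps` output on stdin and prints every expected daemon that is missing.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in $EXPECTED_DAEMONS; do
    # jps prints "<pid> <ClassName>"; compare the class name in column 2 exactly,
    # so that NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Only call jps when it is on the PATH.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

<span style="font-size: large;">An empty result means all five daemons were found; any name printed points to a daemon that did not start.</span><br />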
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">6. Open the NameNode web UI in a browser using the URL below</span><br />
<span style="font-size: large;"><br /></span>
<a href="http://localhost:50070/dfshealth.jsp"><span style="font-size: large;">http://localhost:50070/dfshealth.jsp</span></a><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmUV4tP3AEB_UbZb_6ClwOkNPlR_4laA4MTNc-8pEc-G4BjQ0qxaRgnHCneFbehIKPQZ5ZObVkIqQHThSsb_LAwJ4u5n8gX8cdFHEZWiNJyP0liB71v1_bI2L7Es-lJmq9wA_1FHUTjDe/s1600/browser+data.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmUV4tP3AEB_UbZb_6ClwOkNPlR_4laA4MTNc-8pEc-G4BjQ0qxaRgnHCneFbehIKPQZ5ZObVkIqQHThSsb_LAwJ4u5n8gX8cdFHEZWiNJyP0liB71v1_bI2L7Es-lJmq9wA_1FHUTjDe/s640/browser+data.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span><span style="color: purple; font-size: large;">7. Load the sample <b>tweets</b> into <b>HDFS</b></span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -mkdir -p /kalyan/sentimentanalysis/hive/input</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu5HvFlWvXiFGGO5IUYKz9HQQdacdnG7uV3PgrltuL7roWQ3m9jwI3MLw5X75VVkGuqxv3fUy17JKbte9TAR6XcPXGa_JSZrGRP3DE5JNOG4HVLGJXCom_ZEwtFQu6P6CxprP_wr2hLLT-/s1600/create+a+folder+in+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu5HvFlWvXiFGGO5IUYKz9HQQdacdnG7uV3PgrltuL7roWQ3m9jwI3MLw5X75VVkGuqxv3fUy17JKbte9TAR6XcPXGa_JSZrGRP3DE5JNOG4HVLGJXCom_ZEwtFQu6P6CxprP_wr2hLLT-/s640/create+a+folder+in+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -put ~/sentimentanalysis/tweets /kalyan/sentimentanalysis/hive/input</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhDgzvoBWzUTcSisEDlAV9B-fUT3U0HOtgLRE07RN9fnNe4EOHm2sPNRgOTFkEUAW6fvf4LqFz01uKHsqPVsSVF_yvWmEa6cOB9cCxBFGsHe_SUH4F0OZQT5Jx5v04hd8aqo8EhV_2ceAH/s1600/put+the+data+in+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhDgzvoBWzUTcSisEDlAV9B-fUT3U0HOtgLRE07RN9fnNe4EOHm2sPNRgOTFkEUAW6fvf4LqFz01uKHsqPVsSVF_yvWmEa6cOB9cCxBFGsHe_SUH4F0OZQT5Jx5v04hd8aqo8EhV_2ceAH/s640/put+the+data+in+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIFSH0MoejhiYkclRYtJKBjvwmdiEmR16c9eUeFku67U2dvjq9fdKZF2_BZhnRCF9-OVc-dgoJMt-p7TlzEvybHN4F9cKoxkxQ6d4BOZODhlNXMvgGNKc07-9RIYubqY6mo9G-FqYM_tq1/s1600/verify+in+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIFSH0MoejhiYkclRYtJKBjvwmdiEmR16c9eUeFku67U2dvjq9fdKZF2_BZhnRCF9-OVc-dgoJMt-p7TlzEvybHN4F9cKoxkxQ6d4BOZODhlNXMvgGNKc07-9RIYubqY6mo9G-FqYM_tq1/s640/verify+in+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikDg7aX4vaczPeNOY97DsZTtQHBQNGwHZIqztHL4b8lvQjCeT7nzTaDKAqa69YycXaXF19W_AL7HqEl7v34uvZo6BkKw9Guy5otwf-14dIRZQXz8X4GNDmptPRyVJLDGTxcE7L0TQ5UpyH/s1600/read+the+hdfs+data.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikDg7aX4vaczPeNOY97DsZTtQHBQNGwHZIqztHL4b8lvQjCeT7nzTaDKAqa69YycXaXF19W_AL7HqEl7v34uvZo6BkKw9Guy5otwf-14dIRZQXz8X4GNDmptPRyVJLDGTxcE7L0TQ5UpyH/s640/read+the+hdfs+data.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
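<span style="font-size: large;">The last two screenshots show the upload being checked in HDFS. A minimal sketch of that check, assuming the `hadoop` command is on the PATH and the paths from this step are used (the `HDFS_INPUT` variable is hypothetical), could look like this:</span><br />

```shell
# HDFS folder created in this step.
HDFS_INPUT=/kalyan/sentimentanalysis/hive/input

if command -v hadoop >/dev/null 2>&1; then
  # List the input folder; the tweets file should appear here.
  hadoop fs -ls "$HDFS_INPUT"
  # Print the uploaded file to compare it with the local copy.
  hadoop fs -cat "$HDFS_INPUT/tweets"
else
  echo "hadoop is not on the PATH; skipping the HDFS checks" >&2
fi
```

<span style="font-size: large;"><br /></span>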
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">8. Create the <b>kalyan</b> database in Hive using the commands below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>CREATE DATABASE IF NOT EXISTS kalyan;</i></span><br />
<span style="color: red;"><span style="font-size: large;"><i><br /></i></span>
<span style="font-size: large;"><i>USE kalyan;</i></span></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitUHWrQ_2Gxj3fY7KkuB4RffiJCT4Hq3eGAoYuevE6H5Jjfnpp2vsMsXH0ewVNGF4EsRyt6I7gsZgqt-z4xiTWBI7j2KuVbt1i3Pt-1rNv05hQDGlDHF770L8nrqBOHtbd-7DKWsnX6bza/s1600/create+database.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitUHWrQ_2Gxj3fY7KkuB4RffiJCT4Hq3eGAoYuevE6H5Jjfnpp2vsMsXH0ewVNGF4EsRyt6I7gsZgqt-z4xiTWBI7j2KuVbt1i3Pt-1rNv05hQDGlDHF770L8nrqBOHtbd-7DKWsnX6bza/s640/create+database.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<br />
<span style="color: purple; font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">9. Create an external <b>tweets</b> table in Hive on top of the sample tweets in HDFS</span><br />
<div>
<span style="color: purple; font-size: large;"><br /></span></div>
<span style="font-size: large;"></span>
<span style="color: red; font-size: large;"><i>CREATE EXTERNAL TABLE IF NOT EXISTS kalyan.tweets (tweet string) LOCATION '/kalyan/sentimentanalysis/hive/input';</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNhx0SFI9MN_TswzLyObfbtXSaUJ9XOuAyihwD7fwMOyPTZ4sfczIjnM5U-cYcFIwzfdiYp50sQ-UG-8VOlMqlzP8zbTbGdh11SGIyhRmmFYCOT-SLXRrmbVXRz8qsiGdztSQTLXekIYVw/s1600/display+table+data.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNhx0SFI9MN_TswzLyObfbtXSaUJ9XOuAyihwD7fwMOyPTZ4sfczIjnM5U-cYcFIwzfdiYp50sQ-UG-8VOlMqlzP8zbTbGdh11SGIyhRmmFYCOT-SLXRrmbVXRz8qsiGdztSQTLXekIYVw/s640/display+table+data.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">10. Display the <b>tweets</b> table data in <b>Hive</b> using a SELECT query</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>SELECT * FROM kalyan.tweets;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNhx0SFI9MN_TswzLyObfbtXSaUJ9XOuAyihwD7fwMOyPTZ4sfczIjnM5U-cYcFIwzfdiYp50sQ-UG-8VOlMqlzP8zbTbGdh11SGIyhRmmFYCOT-SLXRrmbVXRz8qsiGdztSQTLXekIYVw/s1600/display+table+data.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNhx0SFI9MN_TswzLyObfbtXSaUJ9XOuAyihwD7fwMOyPTZ4sfczIjnM5U-cYcFIwzfdiYp50sQ-UG-8VOlMqlzP8zbTbGdh11SGIyhRmmFYCOT-SLXRrmbVXRz8qsiGdztSQTLXekIYVw/s640/display+table+data.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">11. Download the `<b>sentimentanalysis-hive.jar</b>` file and copy it to the '<b>~/sentimentanalysis</b>' folder</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><b>Note: </b>Download sentimentanalysis-hive.jar <a href="https://drive.google.com/open?id=0B9ji3P-nUjrxTVd5T1luRVhoQm8">link</a></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggtOlMrvjFkh_nt7Ss1udJjqIychOXTMKrhq9Zq0j09dUhRv2f0m70NydNeXsbgYlhW80fFZX_hx8BP4YMiztXPyDvuedz7Y-JJNrTjoZcM-BgFUq_eErSo0uV-GPHCyEdyspYx-s9Uy_9/s1600/hive+sentiment+analysis+jar.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggtOlMrvjFkh_nt7Ss1udJjqIychOXTMKrhq9Zq0j09dUhRv2f0m70NydNeXsbgYlhW80fFZX_hx8BP4YMiztXPyDvuedz7Y-JJNrTjoZcM-BgFUq_eErSo0uV-GPHCyEdyspYx-s9Uy_9/s640/hive+sentiment+analysis+jar.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">12. Load the `<b>sentimentanalysis-hive.jar</b>` file into HDFS</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>hadoop fs -put ~/sentimentanalysis/sentimentanalysis-hive.jar /kalyan/sentimentanalysis/hive</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk9JBUcS0ebVrUEKEqWD7FW2A_wygLmoNqA5rz-yoojEtni5YjxkMBKh5RomVqRffEZvScoC4NWrBBZLhe8rVdy6jxXZbbQiIVNwJ_H-CtvWt4mDu3n9FtfZMm-7OJqtcvGdMvECk1jbvW/s1600/put+hive+jar.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk9JBUcS0ebVrUEKEqWD7FW2A_wygLmoNqA5rz-yoojEtni5YjxkMBKh5RomVqRffEZvScoC4NWrBBZLhe8rVdy6jxXZbbQiIVNwJ_H-CtvWt4mDu3n9FtfZMm-7OJqtcvGdMvECk1jbvW/s640/put+hive+jar.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipvH-XF_DOBlq49flqcBMJGNcwj0pydNBnb7ZS221pt7ex1xVk-IZ4mBEwqkhAkQPZo3tSj1tktlz4kdC4PsQu9Ehj7XGM7aaUMtHpINRhLl60dPu1908r21nNmN0yYk08he0DB6bhBCv9/s1600/verify+hive+jar.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipvH-XF_DOBlq49flqcBMJGNcwj0pydNBnb7ZS221pt7ex1xVk-IZ4mBEwqkhAkQPZo3tSj1tktlz4kdC4PsQu9Ehj7XGM7aaUMtHpINRhLl60dPu1908r21nNmN0yYk08he0DB6bhBCv9/s640/verify+hive+jar.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">13. Add the jar file to the Hive classpath using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>ADD JAR &lt;PATH OF THE JAR FILE&gt;;</i></span><br />
<span style="color: red;"><span style="font-size: large;"><br /></span>
<span style="font-size: large;"><i>ADD JAR hdfs://localhost:8020/kalyan/sentimentanalysis/hive/sentimentanalysis-hive.jar;</i></span></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgik-Btyjf0s9M_rblTpMVeW6Dgyvu9gcSoU-WqufNKA5-M-s1qojEgEgiDJJIN_klg7YuV71dPcpXqFANmYxV9teTsGIYZ5yqPduC_6nxrnBskAYMM1vygO7CGkLpORq2u4z-rcBZ1zf0r/s1600/add+jar+from+hdfs.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgik-Btyjf0s9M_rblTpMVeW6Dgyvu9gcSoU-WqufNKA5-M-s1qojEgEgiDJJIN_klg7YuV71dPcpXqFANmYxV9teTsGIYZ5yqPduC_6nxrnBskAYMM1vygO7CGkLpORq2u4z-rcBZ1zf0r/s640/add+jar+from+hdfs.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
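<span style="font-size: large;">The `hdfs://localhost:8020` prefix used above depends on the `fs.defaultFS` value of your cluster. Assuming the `hdfs` command is on the PATH, the sketch below reads that value and builds the jar URI from it; the `jar_uri` helper and the `JAR_PATH` variable are hypothetical.</span><br />

```shell
# HDFS location of the jar uploaded in step 12.
JAR_PATH=/kalyan/sentimentanalysis/hive/sentimentanalysis-hive.jar

# Joins a file system URI (for example hdfs://localhost:8020) with the jar path.
jar_uri() {
  # ${1%/} drops one trailing slash so the result has no double slash.
  printf '%s%s\n' "${1%/}" "$JAR_PATH"
}

if command -v hdfs >/dev/null 2>&1; then
  # fs.defaultFS holds the default file system URI configured in core-site.xml.
  jar_uri "$(hdfs getconf -confKey fs.defaultFS)"
else
  # Fall back to the address used in this post.
  jar_uri "hdfs://localhost:8020"
fi
```

<span style="font-size: large;">The printed URI is the value to pass to ADD JAR.</span><br />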
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">14. Define the sentiment function in Hive</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: #990000; font-size: large;"><b>Hive supports both temporary and permanent functions:</b></span><br />
<span style="font-size: large;"><br /></span>
<span style="color: orange; font-size: large;">i. Create a temporary function using the command below</span><br />
<span style="font-size: large;"><i><br /></i></span>
<span style="color: red; font-size: large;"><i>CREATE TEMPORARY FUNCTION &lt;function name&gt; AS 'UDF CLASS NAME WITH PACKAGE';</i></span><br />
<span style="color: red;"><span style="font-size: large;"><i><br /></i></span>
<span style="font-size: large;"><i>CREATE TEMPORARY FUNCTION sentiment AS 'com.orienit.kalyan.sentimentanalysis.hive.udf.SentimentUdf';</i></span></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoYgTEd4hx8JjYz-UIdgvydef6rwJQ0va9Ipq98ST8xNWDCQcfsMTAlLRfXs8ou8rs2vqtowS-86nYXNAhkYHXyQOvrCtxuzpMGIHzs7ZzXrR5AEe5QCRR2nSIE2tPU3iMNa3_ewmjJYyB/s1600/create+a+hive+temporary+function.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoYgTEd4hx8JjYz-UIdgvydef6rwJQ0va9Ipq98ST8xNWDCQcfsMTAlLRfXs8ou8rs2vqtowS-86nYXNAhkYHXyQOvrCtxuzpMGIHzs7ZzXrR5AEe5QCRR2nSIE2tPU3iMNa3_ewmjJYyB/s640/create+a+hive+temporary+function.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="color: orange; font-size: large;">ii. Create a permanent function using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>CREATE FUNCTION &lt;db name&gt;.&lt;function name&gt; AS 'UDF CLASS NAME WITH PACKAGE' USING JAR '&lt;PATH OF THE JAR FILE&gt;';</i></span><br />
<span style="color: red;"><span style="font-size: large;"><i><br /></i></span>
<span style="font-size: large;"><i>CREATE FUNCTION kalyan.</i><i>sentiment</i><i> AS '</i><i>com.orienit.kalyan.sentimentanalysis.hive.udf.SentimentUdf</i><i>' USING JAR '</i><i>hdfs://localhost:8020/kalyan/sentimentanalysis/hive/sentimentanalysis-hive.jar</i><i>';</i></span></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhD8s8NvEmoK991K_LljoK6nQVLUZolXkzVHr6mKb72hA2Hn8bcCjZVBkkR2CMA-1Oo9nwtmnadtME28smaNwxXEXfhWVzVADi5E79TmyAglO5gs5gZdBJTQSRDRDGE2eMBd6GL8T-CNUQa/s1600/add+jar+file+in+hive+classpath.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhD8s8NvEmoK991K_LljoK6nQVLUZolXkzVHr6mKb72hA2Hn8bcCjZVBkkR2CMA-1Oo9nwtmnadtME28smaNwxXEXfhWVzVADi5E79TmyAglO5gs5gZdBJTQSRDRDGE2eMBd6GL8T-CNUQa/s640/add+jar+file+in+hive+classpath.png" width="640" /></a></div>
<br />
<br />
<br />
<span style="color: purple; font-size: large;">15. Verify that the function is registered in Hive using the command below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>SHOW FUNCTIONS;</i></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf-xE4vWKRDXR7j-2cLC5XufD4WUGe7iguPt8KG3LXaYVPR-Xbnu0zec1mk5ax1-LgOV-CH4umgbhQw6apYsXRoTQdkwn6mK25Kr-CTb4mj1iKzb7iNNosloBbVSw3Hp0D6tUFhFOgQOpu/s1600/sentiment+function+in+hive.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf-xE4vWKRDXR7j-2cLC5XufD4WUGe7iguPt8KG3LXaYVPR-Xbnu0zec1mk5ax1-LgOV-CH4umgbhQw6apYsXRoTQdkwn6mK25Kr-CTb4mj1iKzb7iNNosloBbVSw3Hp0D6tUFhFOgQOpu/s640/sentiment+function+in+hive.png" /></a></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmBzIRUwf5P2TMyAdoF5uccbDUiPvGOecGbf9K_9jlkU_3g53eWORcqK5mwb9lL9H3AYItHl9gabH6EhEIZpfy2naaN1cJ80x9lmE6cOjcOSozuPdgE-tT0puubsQR0a8ct2sZbgEht_MR/s1600/verify+fucntion+in+hive.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmBzIRUwf5P2TMyAdoF5uccbDUiPvGOecGbf9K_9jlkU_3g53eWORcqK5mwb9lL9H3AYItHl9gabH6EhEIZpfy2naaN1cJ80x9lmE6cOjcOSozuPdgE-tT0puubsQR0a8ct2sZbgEht_MR/s640/verify+fucntion+in+hive.png" width="640" /></a></div>
<span style="font-size: large;"><br /></span>
<span style="color: purple; font-size: large;">16. Describe the function in Hive using the commands below</span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>DESCRIBE FUNCTION EXTENDED &lt;function name&gt;;</i></span><br />
<span style="color: red;"><span style="font-size: large;"><i><br /></i></span>
<span style="font-size: large;"><i>DESCRIBE FUNCTION EXTENDED sentiment;</i></span></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWXnAVDxZKLwbmnSHXHd1XFOczlxri7FTi9liKICeH1U252w3DHJMOU8LW3kc-iI5nDh1fgil0188PG8u8Jnh6Yr9SZc2N-mt6EvtTADQIMvrufR8z1sfssdV0pA7zBmhn28AUbdlRh8Xj/s1600/describe+sentiment+function.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWXnAVDxZKLwbmnSHXHd1XFOczlxri7FTi9liKICeH1U252w3DHJMOU8LW3kc-iI5nDh1fgil0188PG8u8Jnh6Yr9SZc2N-mt6EvtTADQIMvrufR8z1sfssdV0pA7zBmhn28AUbdlRh8Xj/s640/describe+sentiment+function.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<br />
<span style="color: red; font-size: large;"><i>DESCRIBE FUNCTION EXTENDED kalyan.sentiment;</i></span><br />
<div>
<span style="font-size: large;"><i><br /></i></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcvF1DkA4AE66omjN5oYZL8vOTfok0WKDevdAlfSTKMO_8PeLP50KZGqU36V00OApvyroCmsQWqybO7QSP8vdT-29eGOH-yKz_YJzfTqlnMVPWBcS0LhghCEX9jp6DNuFGFKCC5w3kItpG/s1600/describe+fucntion.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcvF1DkA4AE66omjN5oYZL8vOTfok0WKDevdAlfSTKMO_8PeLP50KZGqU36V00OApvyroCmsQWqybO7QSP8vdT-29eGOH-yKz_YJzfTqlnMVPWBcS0LhghCEX9jp6DNuFGFKCC5w3kItpG/s640/describe+fucntion.png" width="640" /></a></div>
<div>
<span style="color: purple; font-size: large;">17. Analyse the tweets with the </span><b style="color: purple; font-size: x-large;">sentiment</b><span style="color: purple; font-size: large;"> function using the query below</span></div>
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>SELECT tweet, sentiment(tweet) FROM kalyan.tweets;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKtyyxmBNiD8EU1JCTeD3sfeLn4FIObHvU1Vp8gg_TlDvOZ-k36u11jp2nECH9Jx0AmjH5t6cw47xHSuNIjUZk01cKCwpDNLX0HIDHTXEK2hcwKc6lxpb00n_hFcbHQQuN39bbSvLM9M6Z/s1600/sentiment+query.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKtyyxmBNiD8EU1JCTeD3sfeLn4FIObHvU1Vp8gg_TlDvOZ-k36u11jp2nECH9Jx0AmjH5t6cw47xHSuNIjUZk01cKCwpDNLX0HIDHTXEK2hcwKc6lxpb00n_hFcbHQQuN39bbSvLM9M6Z/s640/sentiment+query.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /><span style="color: purple;">18. Create the 'sentimenttweets' table in Hive using the command below</span></span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>CREATE TABLE IF NOT EXISTS kalyan.sentimenttweets (tweet string, sentiment int) LOCATION '/kalyan/sentimentanalysis/hive/output';</i></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5TsBykJndqKNnHLwJQx2TytoQ63c9VfqkiJb6toeyWh3Lv35srCp33WpC2NJNuqNF1cbfz96f2VOeoMKB5ItPO_B0RFQb_P3H4UJULTByLvrBwc7Miv2wahLB9FroRcbSJKl8g8cPbSQ-/s1600/create+sentiment+table.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5TsBykJndqKNnHLwJQx2TytoQ63c9VfqkiJb6toeyWh3Lv35srCp33WpC2NJNuqNF1cbfz96f2VOeoMKB5ItPO_B0RFQb_P3H4UJULTByLvrBwc7Miv2wahLB9FroRcbSJKl8g8cPbSQ-/s640/create+sentiment+table.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /><span style="color: purple;">19. Insert the scored tweets into the `</span></span><span style="color: purple;"><span style="font-size: large;">sentimenttweets</span><span style="font-size: large;">` table</span></span><br />
<span style="font-size: large;"><i><br /></i></span>
<span style="color: red; font-size: large;"><i>INSERT OVERWRITE TABLE kalyan.sentimenttweets SELECT tweet, sentiment(tweet) FROM kalyan.tweets;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8CRs5UWi2QNwIgs9cvmj7wiSsNln0iXAWTTJBGcKmnvuNbZ8em7hq_2yABy_ON8eKqC7dMfOUZsnmPfVtUlBIGNVppENWP4fxVNdE9VxWENtXRR_TzyuD7epcIgVVtJyVjGZJBfUz2_qJ/s1600/do+sentiment+on+tweets.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8CRs5UWi2QNwIgs9cvmj7wiSsNln0iXAWTTJBGcKmnvuNbZ8em7hq_2yABy_ON8eKqC7dMfOUZsnmPfVtUlBIGNVppENWP4fxVNdE9VxWENtXRR_TzyuD7epcIgVVtJyVjGZJBfUz2_qJ/s640/do+sentiment+on+tweets.png" /></span></a><br />
<span style="font-size: large;"><br /></span><span style="color: purple;"><span style="font-size: large;"><br /></span></span><br />
<span style="color: purple;"><span style="font-size: large;">20. Retrieve the scored tweets from the `</span><i style="font-size: x-large;">sentimenttweets</i><span style="font-size: large;">` table</span></span><br />
<span style="font-size: large;"><br /></span>
<span style="color: red; font-size: large;"><i>SELECT tweet, sentiment FROM kalyan.sentimenttweets;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT0OM3361_qEcrffnRBcE_Vi4AMe7zHYLLKTIIUdCQ38moVyACn8VZY3xTDx-vqcNvUjSxUWN1XOq4K0gKiEFZyap_typMmD8FIs35s4Jb8zRSnIYelD_FK8HztNT2EAy-zxxVvXExmDYV/s1600/sentiemnt+tweets.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT0OM3361_qEcrffnRBcE_Vi4AMe7zHYLLKTIIUdCQ38moVyACn8VZY3xTDx-vqcNvUjSxUWN1XOq4K0gKiEFZyap_typMmD8FIs35s4Jb8zRSnIYelD_FK8HztNT2EAy-zxxVvXExmDYV/s640/sentiemnt+tweets.png" /></span></a><br />
<span style="font-size: large;"><br /></span>
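<span style="font-size: large;">A temporary function exists only for the Hive session that created it, so steps 13 to 20 are easiest to repeat from a single script. The sketch below collects the statements from those steps into one file and runs it with `hive -f` when the `hive` command is available; the file name `sentiment.hql` and the `HQL_FILE` variable are assumptions.</span><br />

```shell
# Hypothetical script location; override HQL_FILE to write it elsewhere.
HQL_FILE="${HQL_FILE:-$HOME/sentimentanalysis/sentiment.hql}"
mkdir -p "$(dirname "$HQL_FILE")"

# Statements copied from steps 13, 14, 18, 19 and 20 of this post.
cat > "$HQL_FILE" <<'EOF'
ADD JAR hdfs://localhost:8020/kalyan/sentimentanalysis/hive/sentimentanalysis-hive.jar;
CREATE TEMPORARY FUNCTION sentiment AS 'com.orienit.kalyan.sentimentanalysis.hive.udf.SentimentUdf';
CREATE TABLE IF NOT EXISTS kalyan.sentimenttweets (tweet string, sentiment int) LOCATION '/kalyan/sentimentanalysis/hive/output';
INSERT OVERWRITE TABLE kalyan.sentimenttweets SELECT tweet, sentiment(tweet) FROM kalyan.tweets;
SELECT tweet, sentiment FROM kalyan.sentimenttweets;
EOF

# Run the whole script in one Hive session so the temporary function stays registered.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
fi
```

<span style="font-size: large;"><br /></span>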
<span style="font-size: large;"><br /></span><br />
<span style="color: purple;"><span style="font-size: large;">21. Retrieve the scored tweets from the `</span><i style="font-size: x-large;">sentimenttweets</i><span style="font-size: large;">` table as text labels using a <b>case statement</b></span></span><br />
<div>
<span style="color: purple;"><span style="font-size: large;"><br /></span></span></div>
<span style="font-size: large;"></span>
<span style="color: red; font-size: large;"><i>SELECT tweet, </i></span><br />
<span style="color: red; font-size: large;"><i>case </i></span><br />
<span style="color: red; font-size: large;"><i>when sentiment = 1 then "positive" </i></span><br />
<span style="color: red; font-size: large;"><i>when sentiment = 0 then "neutral" </i></span><br />
<span style="color: red; font-size: large;"><i>when sentiment = -1 then "negative" </i></span><br />
<span style="color: red; font-size: large;"><i>end</i></span><br />
<span style="color: red; font-size: large;"><i>FROM kalyan.sentimenttweets;</i></span><br />
<span style="font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYT1Y0PgpxZPGu4R0CiT3PurbOuKDSdLBb_0g9btnKcpLX1MDHtBrHxTLJt6SwPsjPLuxarNaojIWs6tpZ3vRNjjc8LH_FsKgeMbEqgjzlUxxe5kgutysS7oY6TT0vWWWp_CBDqMom6CID/s1600/case+statement+in+hive.png"><span style="font-size: large;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYT1Y0PgpxZPGu4R0CiT3PurbOuKDSdLBb_0g9btnKcpLX1MDHtBrHxTLJt6SwPsjPLuxarNaojIWs6tpZ3vRNjjc8LH_FsKgeMbEqgjzlUxxe5kgutysS7oY6TT0vWWWp_CBDqMom6CID/s640/case+statement+in+hive.png" /></span></a><br />
</div>
</div>
Anonymoushttp://www.blogger.com/profile/01685371156177399870noreply@blogger.com5tag:blogger.com,1999:blog-2182833570422175384.post-85106936673264785092016-10-24T23:25:00.002-07:002016-10-27T16:31:25.124-07:00SCALA BASICS DAY 2 Practice on 25 Oct 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-size: large;"><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">Scala Day 2 Practice:</span><br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;" /><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">==============================================</span></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><b>Functions in Scala:</b></span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">1. anonymous functions</span><br />
<span style="font-size: large;">2. named functions</span><br />
<span style="font-size: large;">3. curried functions</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">( x : Int ) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val add : Int => Int = ( x : Int ) => { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(10)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val add = ( x : Int ) => { x + 1 }</span><br />
<span style="font-size: large;">add: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> ( x : Int ) => { x + 1 }</span><br />
<span style="font-size: large;">res2: Int => Int = <function1></span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)</span><br />
<span style="font-size: large;">res3: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res4: Int = 11</span><br />
<span style="font-size: large;"><br /></span>
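<span style="font-size: large;">A minimal sketch of where anonymous functions usually show up in practice: stored in a val with an explicit function type, written with the placeholder shorthand, or passed straight to a higher-order method such as map. The object name AnonymousFunctionSketch and the val names addShort and incremented are made up for illustration.</span><br />

```scala
object AnonymousFunctionSketch {
  // Function value with an explicit function type Int => Int
  val add: Int => Int = (x: Int) => x + 1

  // Placeholder syntax: `_ + 1` is shorthand for x => x + 1
  val addShort: Int => Int = _ + 1

  // Anonymous function passed directly to a higher-order method
  val incremented: List[Int] = List(1, 2, 3).map(x => x + 1)
}
```

<span style="font-size: large;">In the REPL, AnonymousFunctionSketch.add(1) and AnonymousFunctionSketch.addShort(1) should both give 2.</span><br />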
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add( x : Int ) = { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add( x : Int ) : Int = { x + 1 }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add( x : Int ) = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add( x : Int ) : Int = { x + 1 }</span><br />
<span style="font-size: large;">add: (x: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)</span><br />
<span style="font-size: large;">res5: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(10)</span><br />
<span style="font-size: large;">res6: Int = 11</span><br />
<span style="font-size: large;"><br /></span>
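<span style="font-size: large;">A named function defined with def is a method, not a function value, but Scala converts it automatically wherever a function type is expected. The sketch below illustrates that conversion; the names NamedFunctionSketch, addFn and result are hypothetical.</span><br />

```scala
object NamedFunctionSketch {
  // Named function (method) with an explicit return type
  def add(x: Int): Int = x + 1

  // The method is converted to a function value because Int => Int is expected
  val addFn: Int => Int = add

  // A method can also be passed where a function argument is required
  val result: List[Int] = List(1, 10).map(add)
}
```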
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val id = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">if( id == 1) {</span><br />
<span style="font-size: large;"> println("condition match")</span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;"> println("condition doesn't match")</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def validate(id : Int) = {</span><br />
<span style="font-size: large;">if( id == 1) {</span><br />
<span style="font-size: large;"> println("condition match")</span><br />
<span style="font-size: large;">} else {</span><br />
<span style="font-size: large;"> println("condition doesn't match")</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">validate(id)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def validate(id : Int) = {</span><br />
<span style="font-size: large;"> | if( id == 1) {</span><br />
<span style="font-size: large;"> | println("condition match")</span><br />
<span style="font-size: large;"> | } else {</span><br />
<span style="font-size: large;"> | println("condition doesn't match")</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">validate: (id: Int)Unit</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> validate(id)</span><br />
<span style="font-size: large;">condition match</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val id = 2</span><br />
<span style="font-size: large;">id: Int = 2</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> validate(id)</span><br />
<span style="font-size: large;">condition doesn't match</span><br />
<span style="font-size: large;"><br /></span>
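<span style="font-size: large;">The validate function above returns Unit because it only prints. Since if/else is an expression in Scala, a variant can return the message instead and leave printing to the caller. This is a small sketch of that idea, not a replacement for the practice code; the object name ValidateSketch is an assumption.</span><br />

```scala
object ValidateSketch {
  // if/else evaluates to a value, so the function can return the message
  def validate(id: Int): String =
    if (id == 1) "condition match" else "condition doesn't match"
}
```

<span style="font-size: large;">Usage: println(ValidateSketch.validate(1)) prints the message, and the returned String can also be tested or reused elsewhere.</span><br />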
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial( n : Int ) : Int = {</span><br />
<span style="font-size: large;"> if ( n <= 1 ) 1</span><br />
<span style="font-size: large;"> else n * factorial( n - 1 )</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial( n : Int ) : Int = {</span><br />
<span style="font-size: large;"> | if ( n <= 1 ) 1</span><br />
<span style="font-size: large;"> | else n * factorial( n - 1 )</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(1)</span><br />
<span style="font-size: large;">res14: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res15: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(6)</span><br />
<span style="font-size: large;">res16: Int = 720</span><br />
<span style="font-size: large;"><br /></span>
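<span style="font-size: large;">The recursive factorial returns Int, which overflows for inputs above 12, and each call adds a stack frame. A hedged sketch of one common alternative is a tail-recursive helper with a BigInt accumulator. The names FactorialSketch and loop are illustrative and not part of the original practice session.</span><br />

```scala
import scala.annotation.tailrec

object FactorialSketch {
  def factorial(n: Int): BigInt = {
    require(n >= 0, "n must be non-negative")

    // @tailrec asks the compiler to verify the recursion compiles to a loop
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * i)

    loop(n, BigInt(1))
  }
}
```

<span style="font-size: large;">FactorialSketch.factorial(5) gives 120 as before, while FactorialSketch.factorial(20) stays exact because of BigInt.</span><br />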
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 5</span><br />
<span style="font-size: large;">res17: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 5 by 2</span><br />
<span style="font-size: large;">res18: scala.collection.immutable.Range = Range(1, 3, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 to 5 by 3</span><br />
<span style="font-size: large;">res19: scala.collection.immutable.Range = Range(1, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 5</span><br />
<span style="font-size: large;">res20: scala.collection.immutable.Range = Range(1, 2, 3, 4)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> 1 until 5 by 2</span><br />
<span style="font-size: large;">res21: scala.collection.immutable.Range = Range(1, 3)</span><br />
<span style="font-size: large;"><br /></span>
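<span style="font-size: large;">Ranges behave like any other sequence, so they can be converted to a List, summed, or stepped backwards. A short sketch, with the object name RangeSketch and its val names chosen only for illustration:</span><br />

```scala
object RangeSketch {
  // `to` includes the upper bound, `until` excludes it
  val inclusive: List[Int] = (1 to 5).toList
  val exclusive: List[Int] = (1 until 5).toList

  // `by` sets the step; a negative step counts down
  val stepped: List[Int] = (1 to 5 by 2).toList
  val countdown: List[Int] = (5 to 1 by -1).toList

  // Ranges support the usual collection operations
  val total: Int = (1 to 5).sum
}
```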
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;">Scala Collections:</span><br />
<span style="font-size: large;">-------------------</span><br />
<span style="font-size: large;">1. scala.collection.immutable</span><br />
<span style="font-size: large;">2. scala.collection.mutable</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var arr1 = Array(1,2,3,4,5)</span><br />
<span style="font-size: large;">var arr1 = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var arr2 = Array("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">var arr2 = Array[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var list1 = List("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">var list1 = List[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var set1 = Set("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">var set1 = Set[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var seq1 = Seq("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">var seq1 = Seq[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">var vec1 = Vector("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">var vec1 = Vector[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">val any1 = List(1, 2.5, true, "anil")</span><br />
<span style="font-size: large;">val any1 = List[Any](1, 2.5, true, "anil")</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var arr1 = Array(1,2,3,4,5)</span><br />
<span style="font-size: large;">arr1: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var arr1 = Array[Int](1,2,3,4,5)</span><br />
<span style="font-size: large;">arr1: Array[Int] = Array(1, 2, 3, 4, 5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var arr2 = Array("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">arr2: Array[String] = Array(anil, venkat, raj, anvith, rohith)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var arr2 = Array[String]("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">arr2: Array[String] = Array(anil, venkat, raj, anvith, rohith)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var list1 = List("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">list1: List[String] = List(anil, venkat, raj, anvith, rohith)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> var set1 = Set("anil", "venkat", "raj", "anvith", "rohith")</span><br />
<span style="font-size: large;">set1: scala.collection.immutable.Set[String] = Set(raj, anvith, rohith, venkat, anil)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val any1 = List(1, 2.5, true, "anil")</span><br />
<span style="font-size: large;">any1: List[Any] = List(1, 2.5, true, anil)</span><br />
<span style="font-size: large;"><br /></span>
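<span style="font-size: large;">The session above only uses the default immutable collections. As a rough sketch of how the two packages differ: appending to an immutable List produces a new list, while a ListBuffer from scala.collection.mutable is changed in place. The object and val names here are assumptions made for the example.</span><br />

```scala
import scala.collection.mutable

object CollectionsSketch {
  // Immutable: `:+` returns a new list and leaves `names` unchanged
  val names: List[String] = List("anil", "venkat", "raj")
  val more: List[String] = names :+ "anvith"

  // Mutable: `+=` modifies the same buffer in place
  val buffer: mutable.ListBuffer[String] = mutable.ListBuffer("anil", "venkat", "raj")
  buffer += "anvith"

  // A Set drops duplicate elements
  val unique: Set[String] = Set("anil", "anil", "raj")
}
```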
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list1.foreach(println)</span><br />
<span style="font-size: large;">anil</span><br />
<span style="font-size: large;">venkat</span><br />
<span style="font-size: large;">raj</span><br />
<span style="font-size: large;">anvith</span><br />
<span style="font-size: large;">rohith</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> any1.foreach(println)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2.5</span><br />
<span style="font-size: large;">true</span><br />
<span style="font-size: large;">anil</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list1.foreach(x => println(x))</span><br />
<span style="font-size: large;">anil</span><br />
<span style="font-size: large;">venkat</span><br />
<span style="font-size: large;">raj</span><br />
<span style="font-size: large;">anvith</span><br />
<span style="font-size: large;">rohith</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> list1.foreach((x : String) => { println(x) })</span><br />
<span style="font-size: large;">anil</span><br />
<span style="font-size: large;">venkat</span><br />
<span style="font-size: large;">raj</span><br />
<span style="font-size: large;">anvith</span><br />
<span style="font-size: large;">rohith</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- list1) println(x)</span><br />
<span style="font-size: large;">anil</span><br />
<span style="font-size: large;">venkat</span><br />
<span style="font-size: large;">raj</span><br />
<span style="font-size: large;">anvith</span><br />
<span style="font-size: large;">rohith</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 6) println(x)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 6 by 2) println(x)</span><br />
<span style="font-size: large;">1</span><br />
<span style="font-size: large;">3</span><br />
<span style="font-size: large;">5</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 6) { if( x % 2 == 0 ) println(x) }</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> for(x <- 1 to 6 if x % 2 == 0) println(x)</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> val op = for(x <- 1 to 6 if x % 2 == 0) yield x</span><br />
<span style="font-size: large;">op: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> op.foreach(println)</span><br />
<span style="font-size: large;">2</span><br />
<span style="font-size: large;">4</span><br />
<span style="font-size: large;">6</span><br />
<span style="font-size: large;"><br /></span>
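<span style="font-size: large;">A for expression with a guard and yield gives the same result as filtering the collection, and it can also transform elements or combine several generators. The sketch below compares these forms; ForYieldSketch and its val names are hypothetical.</span><br />

```scala
object ForYieldSketch {
  // Guarded for/yield and the equivalent filter call
  val viaFor = for (x <- 1 to 6 if x % 2 == 0) yield x
  val viaFilter = (1 to 6).filter(x => x % 2 == 0)

  // yield can transform each element that passes the guard
  val squares = for (x <- 1 to 6 if x % 2 == 0) yield x * x

  // Two generators behave like nested loops
  val pairs = for (x <- 1 to 2; y <- 1 to 2) yield (x, y)
}
```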
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def factorial( n : Int ) : Int = {</span><br />
<span style="font-size: large;"> var res = 1</span><br />
<span style="font-size: large;"> for ( x <- 1 to n ) res = res * x</span><br />
<span style="font-size: large;"> res</span><br />
<span style="font-size: large;">}</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">factorial(1)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">factorial(5)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def factorial( n : Int ) : Int = {</span><br />
<span style="font-size: large;"> | var res = 1</span><br />
<span style="font-size: large;"> | for ( x <- 1 to n ) res = res * x</span><br />
<span style="font-size: large;"> | res</span><br />
<span style="font-size: large;"> | }</span><br />
<span style="font-size: large;">factorial: (n: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(1)</span><br />
<span style="font-size: large;">res34: Int = 1</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> factorial(5)</span><br />
<span style="font-size: large;">res35: Int = 120</span><br />
<span style="font-size: large;"><br /></span>
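<span style="font-size: large;">The loop version keeps its running result in a var. One possible way to write the same computation without mutable state is a fold over the range, or the built-in product method. This is a sketch under that assumption; the names FactorialFoldSketch and factorialProduct are not from the original session.</span><br />

```scala
object FactorialFoldSketch {
  // foldLeft starts at 1 and multiplies in each element of 1 to n
  def factorial(n: Int): Int =
    (1 to n).foldLeft(1)((acc, x) => acc * x)

  // product multiplies all elements; an empty range yields 1
  def factorialProduct(n: Int): Int =
    (1 to n).product
}
```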
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><b>curried functions:</b></span><br />
<span style="font-size: large;">------------------------------------------------------</span><br />
<span style="font-size: large;">def add( x : Int, y : Int ) : Int = { x + y }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1,2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def add( x : Int )( y : Int ) : Int = { x + y }</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">add(1)(2)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">def addOne(y : Int) = add(1)(y)</span><br />
<span style="font-size: large;">def addTwo(y : Int) = add(2)(y)</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add( x : Int, y : Int ) : Int = { x + y }</span><br />
<span style="font-size: large;">add: (x: Int, y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1,2)</span><br />
<span style="font-size: large;">res36: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def add( x : Int )( y : Int ) : Int = { x + y }</span><br />
<span style="font-size: large;">add: (x: Int)(y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> add(1)(2)</span><br />
<span style="font-size: large;">res37: Int = 3</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def addOne(y : Int) = add(1)(y)</span><br />
<span style="font-size: large;">addOne: (y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> def addTwo(y : Int) = add(2)(y)</span><br />
<span style="font-size: large;">addTwo: (y: Int)Int</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> addOne(5)</span><br />
<span style="font-size: large;">res38: Int = 6</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">scala> addTwo(5)</span><br />
<span style="font-size: large;">res39: Int = 7</span><br />
<span style="font-size: large;"><br /></span>
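<span style="font-size: large;">Instead of wrapping add(1)(y) in a new def, a curried function can be partially applied by supplying only the first argument list and giving the result a function type. A two-argument function value can also be turned into a curried one with the curried method of Function2. The sketch below assumes illustrative names (CurrySketch, addPair, addPairFn).</span><br />

```scala
object CurrySketch {
  // Curried definition: one parameter list per argument
  def add(x: Int)(y: Int): Int = x + y

  // Partial application: fix x and get back a function of y
  val addOne: Int => Int = add(1)
  val addTwo: Int => Int = add(2)

  // An ordinary two-argument method converted to a function value ...
  def addPair(x: Int, y: Int): Int = x + y
  val addPairFn: (Int, Int) => Int = addPair

  // ... and then to curried form via Function2.curried
  val curried: Int => Int => Int = addPairFn.curried
}
```

<span style="font-size: large;">Because addOne is a function value, it can be passed directly to map, for example List(1, 2, 3).map(CurrySketch.addOne).</span><br />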
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">-----------------------------------------------------</span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
<span style="font-size: large;"><br /></span>
</div>