---------------------------------------------------------------------------------------------------------------------------------

Mr. Kalyan, Big Data Solution Architect,

Apache Contributor, 11+ years of IT exp, 7+ years of Big Data exp,

Cloudera CCA175 Certified Consultant, IIT Kharagpur, Gold Medalist

---------------------------------------------------------------------------------------------------------------------------------

Big Data Hadoop Training In Hyderabad @ ORIENIT @ KALYAN

---------------------------------------------------------------------------------------------------------------------------------

Big Data Hadoop Course Content (Hadoop-1.x, Hadoop-2.x & Hadoop-3.x)

(Development and Administration)

---------------------------------------------------------------------------------------------------------------------------------

Introduction to Big Data and Hadoop

Big Data
- What is Big Data?
- Why are all industries talking about Big Data?
- What are the issues in Big Data?
  - Storage
    - What are the challenges for storing big data?
  - Processing
    - What are the challenges for processing big data?
- What technologies support Big Data?
  - Hadoop
  - Spark
  - Databases
    - Traditional
    - NoSQL
Hadoop
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Hadoop Use cases
- Advantages and Disadvantages of Hadoop
Importance of Different Ecosystems of Hadoop
Importance of Integration with other Big Data solutions
Big Data Real time Use Cases
Batch vs Real Time Big Data Analytics
Real Time Analytics
- Streaming Data – Storm / Kafka / Flume
- In Memory Data - Spark

HDFS (Hadoop Distributed File System)

HDFS architecture
- Name Node
  - Importance of Name Node
  - What are the roles of Name Node
  - What are the drawbacks in Name Node
- Secondary Name Node
  - Importance of Secondary Name Node
  - What are the roles of Secondary Name Node
  - What are the drawbacks in Secondary Name Node
- Data Node
  - Importance of Data Node
  - What are the roles of Data Node
  - What are the drawbacks in Data Node
Data Storage in HDFS
- How blocks are stored in Data Nodes
- How replication works in Data Nodes
- How to write the files into HDFS
- How to read the files from HDFS
HDFS Block size
- Importance of HDFS Block size
- Why is the block size so large?
- How it is related to MapReduce split size
HDFS Replication factor
- Importance of HDFS Replication factor in production environment
- Can we change the replication for a particular file or folder
- Can we change the replication for all files or folders
Accessing HDFS
- CLI (Command Line Interface) using hdfs commands
- Java Based Approach
HDFS Commands
- Importance of each command
- How to execute the command
- Explanation of HDFS admin commands
Configurations
- Can we change the existing HDFS configurations?
- Importance of configurations
How to overcome the Drawbacks in HDFS
- Name Node failures
- Secondary Name Node failures
- Data Node failures
Where does HDFS fit and where doesn't it fit?
Exploring the Apache HDFS Web UI
How to configure the Hadoop Cluster
- How to add new nodes (Commissioning)
- How to remove existing nodes (De-Commissioning)
- How to verify the Live Nodes / Dead Nodes
Hadoop-2.x / Hadoop-3.x version features
- Introduction to Namenode federation
- Introduction to Namenode High Availability with NFS
- Introduction to Namenode High Availability with QJM
Difference between Hadoop-1.x, Hadoop-2.x and Hadoop-3.x versions
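
As a rough illustration of the HDFS block size and replication factor topics above, the arithmetic can be sketched in a few lines of Python. This is only a sketch, not part of the course material: the function name hdfs_footprint is hypothetical, and the defaults assume the usual Hadoop-2.x / Hadoop-3.x settings of a 128 MB block size and a replication factor of 3 (Hadoop-1.x used 64 MB blocks by default).

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Hypothetical helper: estimate how HDFS would split and replicate a file.

    Returns (number of blocks, total block replicas, raw storage in MB).
    """
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the Data Nodes.
    replicas = blocks * replication
    # A partial last block only occupies its real size, so raw storage
    # is simply the file size multiplied by the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

With these assumed defaults, a 1000 MB file needs 8 blocks, 24 block replicas and about 3000 MB of raw cluster storage, which is why the replication factor matters so much in a production environment.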

MAPREDUCE

Map Reduce architecture
- JobTracker
  - Importance of JobTracker
  - What are the roles of JobTracker
  - What are the drawbacks in JobTracker
- TaskTracker
  - Importance of TaskTracker
  - What are the roles of TaskTracker
  - What are the drawbacks in TaskTracker
- Map Reduce Job execution flow
Data Types in Hadoop
- What are the Data types in Map Reduce
- Why these are important in Map Reduce
- Can we write custom Data Types in MapReduce
Input Formats in Map Reduce
- Text Input Format
- Key Value Text Input Format
- Sequence File Input Format
- NLine Input Format
- Importance of Input Format in Map Reduce
- How to use Input Format in Map Reduce
- How to write custom Input Formats and their Record Readers
Output Formats in Map Reduce
- Text Output Format
- Sequence File Output Format
- Importance of Output Format in Map Reduce
- How to use Output Format in Map Reduce
- How to write custom Output Formats and their Record Writers
Mapper
- What is a mapper in a Map Reduce Job
- Why do we need a mapper?
- What are the Advantages and Disadvantages of a mapper
- Writing mapper programs
Reducer
- What is a reducer in a Map Reduce Job
- Why do we need a reducer?
- What are the Advantages and Disadvantages of a reducer
- Writing reducer programs
Combiner
- What is a combiner in a Map Reduce Job
- Why do we need a combiner?
- What are the Advantages and Disadvantages of a Combiner
- Writing Combiner programs
Partitioner
- What is a Partitioner in a Map Reduce Job
- Why do we need a Partitioner?
- What are the Advantages and Disadvantages of a Partitioner
- Writing Partitioner programs
Distributed Cache
- What is Distributed Cache in a Map Reduce Job
- Importance of Distributed Cache in a Map Reduce job
- What are the Advantages and Disadvantages of Distributed Cache
- Writing Distributed Cache programs
Counters
- What is a Counter in a Map Reduce Job
- Why do we need Counters in a production environment?
- How to write Counters in Map Reduce programs
Importance of Writable and WritableComparable APIs
- How to write custom Map Reduce Keys using WritableComparable
- How to write custom Map Reduce Values using Writable
Joins
- Map Side Join
  - What is the importance of Map Side Join
  - Where it is used
- Reduce Side Join
  - What is the importance of Reduce Side Join
  - Where it is used
- What is the difference between Map Side Join and Reduce Side Join?
Compression techniques
- Importance of Compression techniques in production environment
- Compression Types
  - NONE, RECORD and BLOCK
- Compression Codecs
  - Default, Gzip, Bzip2, Snappy and LZO
- Enabling and Disabling these techniques for all the Jobs
- Enabling and Disabling these techniques for a particular Job
Map Reduce Schedulers
- FIFO Scheduler
- Capacity Scheduler
- Fair Scheduler
- Importance of Schedulers in production environment
- How to use Schedulers in production environment
Map Reduce Programming Model
- How to write the Map Reduce jobs in Java
- Running the Map Reduce jobs in local mode
- Running the Map Reduce jobs in pseudo mode
- Running the Map Reduce jobs in cluster mode
Debugging Map Reduce Jobs
- How to debug Map Reduce Jobs in Local Mode.
- How to debug Map Reduce Jobs in Remote Mode.
Data Locality
- What is Data Locality?
- Does Hadoop follow Data Locality?
Speculative Execution
- What is Speculative Execution?
- Does Hadoop use Speculative Execution?
Map Reduce Commands
- Importance of each command
- How to execute the command
- Explanation of MapReduce admin commands
Configurations
- Can we change the existing MapReduce configurations?
- Importance of configurations
Writing Unit Tests for Map Reduce Jobs
Configuring the Hadoop development environment using Eclipse
Use of Secondary Sorting and how to solve it using MapReduce
How to identify performance bottlenecks in MR jobs and tune MR jobs
Map Reduce Streaming and Pipes with examples
Exploring the MapReduce Web UI
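
The Map Reduce job execution flow listed above (map, then shuffle and sort, then reduce) can be illustrated with a small single-process Python sketch. This is only a simulation of the programming model under simple assumptions, not Hadoop code: the function names mapper, shuffle, reducer and word_count are hypothetical, and a real job would be written against the Hadoop Java API and run across the cluster.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle and sort phase: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In this sketch a combiner would simply be the reducer applied to each mapper's local output before the shuffle, which is valid for word count because addition is associative and commutative.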

YARN (Next Generation Map Reduce)

What is YARN?
What is the importance of YARN?
Where we can use YARN in real time, and the projects powered by it
What is the difference between YARN and Map Reduce
Yarn Architecture
- Importance of Resource Manager
- Importance of Node Manager
- Importance of Application Manager
- Yarn Application execution flow
Installing YARN on both Windows & Linux
Exploring the YARN Web UI
Examples on YARN

Apache PIG

Introduction to Apache Pig
Map Reduce Vs Apache Pig
SQL Vs Apache Pig
Different data types in Pig
Modes Of Execution in Pig
- Local Mode
- Map Reduce Mode
Execution Mechanism
- Grunt Shell
- Script
- Embedded
UDFs
- How to write UDFs in Pig
- How to use UDFs in Pig
- Importance of UDFs in Pig
Filters
- How to write Filters in Pig
- How to use Filters in Pig
- Importance of Filters in Pig
Load Functions
- How to write the Load Functions in Pig
- How to use the Load Functions in Pig
- Importance of Load Functions in Pig
Store Functions
- How to write the Store Functions in Pig
- How to use the Store Functions in Pig
- Importance of Store Functions in Pig
Transformations in Pig
How to write complex Pig scripts
How to integrate Pig and HBase

Apache HIVE

Hive Introduction
Hive architecture
- Driver
- Compiler
- Optimizer
- Semantic Analyzer
Hive Query Language (HiveQL)
SQL vs HiveQL
Hive Installation and Configuration
Hive DDL and DML Operations
Hive Services
- CLI
- HiveServer
- HWI
Metastore
- embedded metastore configuration
- external metastore configuration
UDFs
- How to write UDFs in Hive
- How to use UDFs in Hive
- Importance of UDFs in Hive
UDAFs
- How to use UDAFs in Hive
- Importance of UDAFs in Hive
UDTFs
- How to use UDTFs in Hive
- Importance of UDTFs in Hive
How to write complex Hive queries
What is Hive Data Model?
Partitions
- Importance of Hive Partitions in production environment
- Limitations of Hive Partitions
- How to write Partitions
Buckets
- Importance of Hive Buckets in production environment
- How to write Buckets
SerDe
- Importance of Hive SerDes in production environment
- How to write SerDe programs
How to integrate Hive and HBase
How to integrate Hive and Spark

Cloudera Impala

Introduction to Impala
Impala Examples
Hive vs Impala

Apache Zookeeper

Introduction to Zookeeper
Pseudo mode installations
Zookeeper cluster installations
Basic commands execution

Apache HBase

HBase introduction
HBase use cases
HBase basics
- Importance of Column families
- Basic CRUD operations
  - create
  - scan / get
  - put
  - delete / deleteall / drop
- Bulk loading in HBase
HBase installation
- Local mode
- Pseudo mode
- Cluster mode
HBase Architecture
- HMaster
- HRegionServer
- Zookeeper
MapReduce integration
- MapReduce over HBase

Apache Phoenix

Introduction to Phoenix
Installing Phoenix
Integrating with HBase
Comparing HBase & Phoenix
Practice on Phoenix examples

Apache Cassandra

Introduction to Cassandra
Installing Cassandra
Practice on Cassandra examples

MongoDB

Introduction to MongoDB
Installing MongoDB
Practice on MongoDB examples

Apache Sqoop

Introduction to Sqoop
MySQL client and Server Installation
Sqoop Installation
How to connect to Relational Database using Sqoop
Examples on Import and Export Sqoop commands

Apache Flume

Introduction to Flume
Flume installation
Flume Architecture
- Agent
- Sources
- Channels
- Sinks
Practice on Flume examples

Apache Kafka

Introduction to Kafka
Installing Kafka
Practice on Kafka examples

Apache Oozie

Introduction to Oozie
Oozie installation
Executing different Oozie workflow jobs
Monitoring Oozie workflow jobs

Pre-Requisites for this Course

Java basics such as OOP concepts, interfaces, classes and abstract classes (free Java classes as part of the course)
Basic SQL knowledge (free SQL classes as part of the course)
Basic Linux commands (provided in our blog)

Spark and Scala Content as part of Hadoop Course

Introduction of Scala

What is Scala?
Why Scala?
Advantages of Scala?
Using the Scala REPL (Read-Eval-Print Loop)
What is Type Inference
Interoperability between Scala and Java

Scala using Command Line

Installing Java & Scala
Interactive Scala
Writing Scala Scripts
Compiling Scala Programs

Basics of Scala

Defining Variables
Defining Functions
String Interpolation
IDE for Scala

Scala Type Less, Do More

Semicolons
Variable Declarations
Method Declarations
Type Inference
Immutability
Operators
Precedence Rules
Literals
Arrays, Lists, Maps, Tuples

Expressions and Conditionals

If expressions
If-Else expressions
For Loops
While Loops
Do-While Loops
Conditional Operators
Pattern Matching

Functional Programming in Scala

What is Functional Programming?
Different types of functions in Scala
- Anonymous functions
- Named functions
- Curried functions
Recursions

Object-Oriented Programming in Scala

How to create a Class
How to create a Case Class
How to create an Object
Constructors in Scala
Fields in Classes

Introduction to Spark

What is Spark
Why Spark
Who Uses Spark
Brief History of Spark
Storage Layers for Spark
Spark vs MapReduce
- Why Spark can be up to 100 times faster than MapReduce for in-memory workloads
Difference between Spark-1.x and Spark-2.x
Unified Stack of Spark
- Spark Core
- Spark SQL
- Spark Streaming
- Spark MLlib
- Spark GraphX
Spark Architecture explanation
- Master Slave architecture
- Spark Driver
- Workers
- Executors
Installation of Spark in different modes
- Local mode
- Pseudo mode
Introduction to the Spark Web UI
Spark Job Execution flow

Basics of Spark

Creating the Spark Context
Creating the Spark Conf
Creating the Spark Session
Caching Overview
Distributed Persistence
Deploying Applications with spark-submit

Resilient Distributed Dataset (RDD)

What is RDD
Creating RDDs
- Using collections
- Using datasets (text, csv, tsv, ...)
RDD Operations
- Transformations
- Actions
Working with Key/Value Pairs
Creating Pair RDDs
Transformations on Pair RDDs
- Aggregations
- Joins
- Sorting Data

Loading and Saving Your Data

Loading Data using RDD
Saving Data using RDD

Apache Spark SQL

What is the importance of Spark SQL
Working with Spark SQL DataSets
Working with Spark SQL DataFrames
Practice on Spark SQL Context
Practice on Spark SparkSession
Practical examples on Spark SQL
- Aggregations
- Joins
- Sorting Data
Spark SQL Integrations
- Spark and Hive integration
- Spark and RDBMS integration
Processing different files using Spark SQL
- Text
- JSON
- CSV
- TSV
- Parquet

BigData Administration topics:

Hadoop Installations (Windows & Linux)
- Local mode (hands-on installation on your laptop)
- Pseudo mode (hands-on installation on your laptop)
- Cluster mode (hands-on 40+ node cluster setup in our lab)
- Nodes Commissioning and De-commissioning in Hadoop Cluster
- Jobs Monitoring in Hadoop Cluster
- Fair Scheduler (hands-on installation on your laptop)
- Capacity Scheduler (hands-on installation on your laptop)
Hive Installations
- Local mode (hands-on installation on your laptop)
  - With internal Derby
- Cluster mode (hands-on installation on your laptop)
  - With external Derby
  - With external MySQL
- Hive Web Interface (HWI) mode (hands-on installation on your laptop)
- Hive Thrift Server mode (hands-on installation on your laptop)
- Derby Installation (hands-on installation on your laptop)
- MySQL Installation (hands-on installation on your laptop)
Pig Installations
- Local mode (hands-on installation on your laptop)
- MapReduce mode (hands-on installation on your laptop)
HBase Installations
- Local mode (hands-on installation on your laptop)
- Pseudo mode (hands-on installation on your laptop)
- Cluster mode (hands-on installation on your laptop)
  - With internal Zookeeper
  - With external Zookeeper
Zookeeper Installations
- Local mode (hands-on installation on your laptop)
- Cluster mode (hands-on installation on your laptop)
Sqoop Installations
- Sqoop installation with MySQL (hands-on installation on your laptop)
- Sqoop with Hadoop integration (hands-on installation on your laptop)
- Sqoop with Hive integration (hands-on installation on your laptop)
- Sqoop with HBase integration (hands-on installation on your laptop)
Flume Installation
- Pseudo mode (hands-on installation on your laptop)
Oozie Installation
- Pseudo mode (hands-on installation on your laptop)
Advanced Technologies Installations
- Spark
- Cassandra
- MongoDB
- Kafka
- Mahout
Cloudera Hadoop Distribution installation
HortonWorks Hadoop Distribution installation

Advanced and New technologies architectural discussions

Spark / Flink (Real time data processing)
Storm / Kafka / Flume (Real time data streaming)
Cassandra / MongoDB (NOSQL database)
Solr (Search engine)
Nutch (Web Crawler)
Lucene (Indexing data)
Mahout (Machine Learning Algorithms)
Ganglia, Nagios (Monitoring tools)
Cloudera, Hortonworks, MapR, Amazon EMR (Distributions)
How to crack the Cloudera / Hortonworks certification questions

Cloudera Distribution

Introduction to Cloudera
Cloudera Installation
Cloudera Certification details
How to use Cloudera Hadoop
What are the main differences between Cloudera and Apache Hadoop

Hortonworks Distribution

Introduction to Hortonworks
Hortonworks Installation
Hortonworks Certification details
How to use Hortonworks Hadoop
What are the main differences between Hortonworks and Apache Hadoop

Amazon EMR

Introduction to Amazon EMR and Amazon EC2
How to use Amazon EMR and Amazon EC2
Why use Amazon EMR, and its importance

Hadoop ecosystem Integrations:

Hive and Spark integration
Hive and HBase integration
Pig and HBase integration
Sqoop and RDBMS integration
HBase and Phoenix integration
Flume and Phoenix integration
Kafka and Phoenix integration

Free Big Data Workshops:

Spark & Scala
Cassandra
MongoDB
Search engine & E-commerce solutions
Big Data Analytics (R, Mahout, Spark ML)

Real Time Big Data Projects

We will be sharing weekly Big Data assignments
We will be sharing end-to-end Big Data projects
We provide Big Data project practice in our lab
We provide important recorded videos on our YouTube channel
For any information, search Google / YouTube with the keyword 'Kalyan Hadoop'

What we are offering to you:

Hadoop installation on both Windows & Linux
Free Weekly Online Hadoop Certification
Real Time Big Data projects will be shared
Free Big Data Workshops on new & advanced technologies
Hands-on MapReduce programming with 20+ programs, which will make you proficient in MapReduce both conceptually and programmatically
5 hands-on POCs will be provided (these POCs will help you become proficient in Hadoop and its ecosystem)
Hands-on practical 40+ node Hadoop cluster setup in our lab
Well-documented Hadoop material covering all the topics in the course
Well-documented Hadoop blog containing frequent interview questions with answers and the latest updates on Big Data technology
Daily discussion of Hadoop interview questions & answers
Resume preparation with POCs or projects based on your experience
