Hadoop Training in Hyderabad

  1. Introduction to Big Data and Hadoop

  • Big Data
    • What is Big Data?
    • Why are all industries talking about Big Data?
    • What are the issues in Big Data?
      • Storage
        • What are the challenges for storing big data?
      • Processing
        • What are the challenges for processing big data?
    • Which technologies support Big Data?
      • Hadoop
      • Databases
        • Traditional
        • NoSQL
  • Hadoop
    • What is Hadoop?
    • History of Hadoop
    • Why Hadoop?
    • Hadoop Use cases
    • Advantages and Disadvantages of Hadoop
  • Importance of Different Ecosystems of Hadoop
  • Importance of Integration with other Big Data solutions
  • Real-time Big Data use cases
  1. HDFS (Hadoop Distributed File System)

  • HDFS architecture
    • Name Node
      • Importance of Name Node
      • What are the roles of Name Node
      • What are the drawbacks in Name Node
    • Secondary Name Node
      • Importance of Secondary Name Node
      • What are the roles of Secondary Name Node
      • What are the drawbacks in Secondary Name Node
    • Data Node
      • Importance of Data Node
      • What are the roles of Data Node
      • What are the drawbacks in Data Node
  • Data Storage in HDFS
    • How blocks are stored in Data Nodes
    • How replication works in Data Nodes
    • How to write the files in HDFS
    • How to read the files in HDFS
  • HDFS Block size
    • Importance of HDFS Block size
    • Why is the block size so large?
    • How it is related to MapReduce split size
  • HDFS Replication factor
    • Importance of HDFS Replication factor in production environment
    • Can we change the replication for a particular file or folder?
    • Can we change the replication for all files or folders?
  • Accessing HDFS
    • CLI (Command Line Interface) using hdfs commands
    • Java Based Approach
  • HDFS Commands
    • Importance of each command
    • How to execute the command
    • Explanation of HDFS admin commands
  • Configurations
    • Can we change the existing HDFS configurations?
    • Importance of configurations
  • How to overcome the Drawbacks in HDFS
    • Name Node failures
    • Secondary Name Node failures
    • Data Node failures
  • Where does it fit and where doesn’t it fit?
  • Exploring the Apache HDFS Web UI
  • How to configure the Hadoop Cluster
    • How to add new nodes (Commissioning)
    • How to remove existing nodes (De-Commissioning)
    • How to verify the Dead Nodes
    • How to start the Dead Nodes
  • Hadoop 2.x.x version features
    • Introduction to Namenode federation
    • Introduction to Namenode High Availability with NFS
    • Introduction to Namenode High Availability with QJM
  • Difference between Hadoop 1.x.x and Hadoop 2.x.x versions
  1. MAPREDUCE

  • Map Reduce architecture
    • JobTracker
      • Importance of JobTracker
      • What are the roles of JobTracker
      • What are the drawbacks in JobTracker
    • TaskTracker
      • Importance of TaskTracker
      • What are the roles of TaskTracker
      • What are the drawbacks in TaskTracker
    • Map Reduce Job execution flow
  • Data Types in Hadoop
    • What are the Data types in Map Reduce
    • Why these are important in Map Reduce
    • Can we write custom Data Types in MapReduce
  • Input Formats in Map Reduce
    • Text Input Format
    • Key Value Text Input Format
    • Sequence File Input Format
    • NLine Input Format
    • Importance of Input Format in Map Reduce
    • How to use Input Format in Map Reduce
    • How to write custom Input Formats and their Record Readers
  • Output Formats in Map Reduce
    • Text Output Format
    • Sequence File Output Format
    • Importance of Output Format in Map Reduce
    • How to use Output Format in Map Reduce
    • How to write custom Output Formats and their Record Writers
  • Mapper
    • What is mapper in Map Reduce Job
    • Why do we need a mapper?
    • What are the Advantages and Disadvantages of mapper
    • Writing mapper programs
  • Reducer
    • What is reducer in Map Reduce Job
    • Why do we need a reducer?
    • What are the Advantages and Disadvantages of reducer
    • Writing reducer programs
  • Combiner
    • What is combiner in Map Reduce Job
    • Why do we need a combiner?
    • What are the Advantages and Disadvantages of Combiner
    • Writing Combiner programs
  • Partitioner
    • What is Partitioner in Map Reduce Job
    • Why do we need a Partitioner?
    • What are the Advantages and Disadvantages of Partitioner
    • Writing Partitioner programs
  • Distributed Cache
    • What is Distributed Cache in Map Reduce Job
    • Importance of Distributed Cache in Map Reduce job
    • What are the Advantages and Disadvantages of Distributed Cache
    • Writing Distributed Cache programs
  • Counters
    • What is Counter in Map Reduce Job
    • Why do we need Counters in a production environment?
    • How to Write Counters in Map Reduce programs
  • Importance of Writable and Writable Comparable APIs
    • How to write custom Map Reduce Keys using Writable Comparable
    • How to write custom Map Reduce Values using Writable
  • Joins
    • Map Side Join
      • What is the importance of Map Side Join
      • Where it is used
    • Reduce Side Join
      • What is the importance of Reduce Side Join
      • Where it is used
    • What is the difference between Map Side join and Reduce Side Join?
  • Compression techniques
    • Importance of Compression techniques in production environment
    • Compression Types
      • NONE, RECORD and BLOCK
    • Compression Codecs
      • Default, Gzip, Bzip2, Snappy and LZO
    • Enabling and Disabling these techniques for all the Jobs
    • Enabling and Disabling these techniques for a particular Job
  • Map Reduce Schedulers
    • FIFO Scheduler
    • Capacity Scheduler
    • Fair Scheduler
    • Importance of Schedulers in production environment
    • How to use Schedulers in production environment
  • Map Reduce Programming Model
    • How to write the Map Reduce jobs in Java
    • Running the Map Reduce jobs in local mode
    • Running the Map Reduce jobs in pseudo mode
    • Running the Map Reduce jobs in cluster mode
  • Debugging Map Reduce Jobs
    • How to debug Map Reduce Jobs in Local Mode
    • How to debug Map Reduce Jobs in Remote Mode
  • Data Locality
    • What is Data Locality?
    • Does Hadoop follow Data Locality?
  • Speculative Execution
    • What is Speculative Execution?
    • Does Hadoop follow Speculative Execution?
  • Map Reduce Commands
    • Importance of each command
    • How to execute the command
    • Explanation of MapReduce admin commands
  • Configurations
    • Can we change the existing MapReduce configurations?
    • Importance of configurations
  • Writing Unit Tests for Map Reduce Jobs
  • Configuring the Hadoop development environment using Eclipse
  • Use of Secondary Sorting and how to solve using MapReduce
  • How to Identify Performance Bottlenecks in MR jobs and tuning MR jobs.
  • Map Reduce Streaming and Pipes with examples
  • Exploring the MapReduce Web UI

  1. YARN (Next Generation Map Reduce)

  • What is YARN?
  • What is the importance of YARN?
  • Where YARN can be used in real time, and the projects it powers
  • What is the difference between YARN and Map Reduce?
  • Yarn Architecture
    1. Importance of Resource Manager
    2. Importance of Node Manager
    3. Importance of Application Manager
    4. Yarn Application execution flow
  • Installing YARN on both Windows & Linux
  • Exploring the YARN Web UI
  • Examples on YARN
  1. Apache PIG

  • Introduction to Apache Pig
  • Map Reduce Vs Apache Pig
  • SQL Vs Apache Pig
  • Different data types in Pig
  • Modes Of Execution in Pig
    • Local Mode
    • Map Reduce Mode
  • Execution Mechanism
    • Grunt Shell
    • Script
    • Embedded
  • UDFs
    • How to write UDFs in Pig
    • How to use UDFs in Pig
    • Importance of UDFs in Pig
  • Filters
    • How to write Filters in Pig
    • How to use Filters in Pig
    • Importance of Filters in Pig
  • Load Functions
    • How to write the Load Functions in Pig
    • How to use the Load Functions in Pig
    • Importance of Load Functions in Pig
  • Store Functions
    • How to use the Store Functions in Pig
    • Importance of Store Functions in Pig
  • Transformations in Pig
  • How to write complex Pig scripts
  • How to integrate Pig and HBase

  1. Apache HIVE

  • Hive Introduction
  • Hive architecture
    • Driver
    • Compiler
    • Optimizer
    • Semantic Analyzer
  • Hive Query Language (HiveQL)
  • SQL Vs HiveQL
  • Hive Installation and Configuration
  • Hive DDL and DML Operations
  • Hive Services
    • CLI
    • HiveServer
    • HWI
  • Metastore
    • Embedded metastore configuration
    • External metastore configuration
  • UDFs
    • How to write UDFs in Hive
    • How to use UDFs in Hive
    • Importance of UDFs in Hive
  • UDAFs
    • How to use UDAFs in Hive
    • Importance of UDAFs in Hive
  • UDTFs
    • How to use UDTFs in Hive
    • Importance of UDTFs in Hive
  • How to write complex Hive queries
  • What is Hive Data Model?
  • Partitions
    • Importance of Hive Partitions in production environment
    • Limitations of Hive Partitions
    • How to write Partitions
  • Buckets
    • Importance of Hive Buckets in production environment
    • How to write Buckets
  • SerDe
    • Importance of Hive SerDes in production environment
    • How to write SerDe programs
  • How to integrate Hive and HBase
  1. Cloudera Impala

  • Introduction to Impala
  • Impala Examples
  1. Apache Zookeeper

  • Introduction to Zookeeper
  • Pseudo mode installations
  • Zookeeper cluster installations
  • Basic commands execution
  1. Apache HBase

  • HBase introduction
  • HBase use cases
  • HBase basics
    • Importance of Column families
    • Basic CRUD operations
      • create
      • scan / get
      • put
      • delete / drop
    • Bulk loading in HBase
  • HBase installation
    • Local mode
    • Pseudo mode
    • Cluster mode
  • HBase Architecture
    • HMaster
    • HRegionServer
    • Zookeeper
  • MapReduce integration
    • MapReduce over HBase
  1. Apache Phoenix

  • Introduction to Phoenix
  • Installing Phoenix
  • Integrating with HBase
  • Comparing HBase & Phoenix
  • Practice on Phoenix examples
  1. Apache Cassandra

  • Introduction to Cassandra
  • Installing Cassandra
  • Practice on Cassandra examples
  1. MongoDB

  • Introduction to MongoDB
  • Installing MongoDB
  • Practice on MongoDB examples
  1. Apache Drill

  • Introduction to Drill
  • Installing Drill
  • Practice on Drill examples
  1. Apache SQOOP

  • Introduction to Sqoop
  • MySQL client and Server Installation
  • Sqoop Installation
  • How to connect to Relational Database using Sqoop
  • Examples on Import and Export Sqoop commands
  1. Apache FLUME

  • Introduction to Flume
  • Flume installation
  • Flume Architecture
    • Agent
    • Sources
    • Channels
    • Sinks
  • Practice on Flume examples
  1. Apache Kafka

  • Introduction to Kafka
  • Installing Kafka
  • Practice on Kafka examples
  1. Apache Spark

  • Introduction to Spark
  • Installing Spark
  • Spark Architecture
  • Introduction to Spark Components
    • Spark Core
    • Spark SQL
    • Spark Streaming
    • Spark MLLib
    • Spark GraphX
  • Practice on Spark examples
  • Spark and Hive integration
  1. Apache OOZIE

  • Introduction to Oozie
  • Oozie installation
  • Executing different Oozie workflow jobs
  • Monitoring Oozie workflow jobs
  1. Real Time Big Data Projects

  • We will be sharing End-to-End Big Data Projects
  • We provide Big Data Project Practice in Our Lab
  • We provide Important Recorded Videos on Our YouTube Channel
  • For more information, search Google / YouTube for the keyword 'Kalyan Hadoop'
  1. Pre-Requisites for this Course

  • Java basics such as OOP concepts, interfaces, classes and abstract classes (free Java classes as part of the course)
  • Basic SQL knowledge (free SQL classes as part of the course)
  • Basic Linux commands (provided in our blog)


Administration topics:

  • Hadoop Installations (Windows & Linux)
    • Local mode (hands-on installation on your laptop)
    • Pseudo mode (hands-on installation on your laptop)
    • Cluster mode (hands-on 40+ node cluster setup in our lab)
    • Nodes Commissioning and De-commissioning in Hadoop Cluster
    • Jobs Monitoring in Hadoop Cluster
    • Fair Scheduler (hands-on installation on your laptop)
    • Capacity Scheduler (hands-on installation on your laptop)
  • Hive Installations
    • Local mode (hands-on installation on your laptop)
      • With internal Derby
    • Cluster mode (hands-on installation on your laptop)
      • With external Derby
      • With external MySQL
    • Hive Web Interface (HWI) mode (hands-on installation on your laptop)
    • Hive Thrift Server mode (hands-on installation on your laptop)
    • Derby Installation (hands-on installation on your laptop)
    • MySQL Installation (hands-on installation on your laptop)
  • Pig Installations
    • Local mode (hands-on installation on your laptop)
    • MapReduce mode (hands-on installation on your laptop)
  • HBase Installations
    • Local mode (hands-on installation on your laptop)
    • Pseudo mode (hands-on installation on your laptop)
    • Cluster mode (hands-on installation on your laptop)
      • With internal Zookeeper
      • With external Zookeeper
  • Zookeeper Installations
    • Local mode (hands-on installation on your laptop)
    • Cluster mode (hands-on installation on your laptop)
  • Sqoop Installations
    • Sqoop installation with MySQL (hands-on installation on your laptop)
    • Sqoop with Hadoop integration (hands-on installation on your laptop)
    • Sqoop with Hive integration (hands-on installation on your laptop)
    • Sqoop with HBase integration (hands-on installation on your laptop)
  • Flume Installation
    • Pseudo mode (hands-on installation on your laptop)
  • Oozie Installation
    • Pseudo mode (hands-on installation on your laptop)
  • Advanced Technologies Installations
    • Spark
    • Cassandra
    • MongoDB
    • Kafka
    • Mahout
  • Cloudera Hadoop Distribution installation
  • HortonWorks Hadoop Distribution installation

  1. ORIENIT Hadoop POC's Solution Class

  2. Advanced and New technologies architectural discussions

  • Spark / Flink (Real time data processing)
  • Storm / Kafka / Flume (Real time data streaming)
  • Cassandra / MongoDB (NOSQL database)
  • Solr (Search engine)
  • Nutch (Web Crawler)
  • Lucene (Indexing data)
  • Mahout (Machine Learning Algorithms)
  • Ganglia, Nagios (Monitoring tools)
  • Cloudera, Hortonworks, MapR, Amazon EMR (Distributions)
  • How to crack the Cloudera / Hortonworks certification questions

Cloudera Distribution

  • Introduction to Cloudera
  • Cloudera Installation
  • Cloudera Certification details
  • How to use Cloudera Hadoop
  • What are the main differences between Cloudera and Apache Hadoop

Hortonworks Distribution

  • Introduction to Hortonworks
  • Hortonworks Installation
  • Hortonworks Certification details
  • How to use Hortonworks Hadoop
  • What are the main differences between Hortonworks and Apache Hadoop

Amazon EMR

  • Introduction to Amazon EMR and Amazon EC2
  • How to use Amazon EMR and Amazon EC2
  • Why use Amazon EMR, and its importance

Hadoop ecosystem Integrations:

  • Hive and Spark integration
  • Hive and HBase integration
  • Pig and HBase integration
  • Sqoop and RDBMS integration
  • HBase and Phoenix integration
  • Flume and Phoenix integration
  • Kafka and Phoenix integration

Free Big Data Workshops:

  • Spark & Scala
  • Cassandra
  • MongoDB
  • Search engine & E-commerce solutions
  • Big Data Analytics (R, Mahout, Spark ML)

  1. What we are offering to you:

  • Hadoop installation on both Windows & Linux
  • Free Weekly Online Hadoop Certification
  • Real Time Big Data projects will be shared
  • Free Big Data Workshops on new & advanced technologies
  • Hands-on MapReduce programming with 20+ programs, to make you proficient in MapReduce both conceptually and programmatically
  • 5 hands-on POCs will be provided (these POCs will help you become proficient in Hadoop and its ecosystem)
  • Hands-on practical 40+ node Hadoop cluster setup in our lab
  • Well-documented Hadoop material covering all the topics in the course
  • Well-documented Hadoop blog with frequently asked interview questions, their answers, and the latest updates on Big Data technology
  • Daily discussion of Hadoop interview questions & answers
  • Resume preparation with POCs or projects based on your experience

