
Hadoop 2.x with Spark


Hadoop Training in Pune

About the Course

Ethans is a market leader in providing training to working professionals; we are dedicated and committed to delivering value to all our students. Our Hadoop course is expected to take two months, with 18 classes in total, including practical Hadoop training. Each class runs three to four hours; the course can be completed sooner if the number of hours per day is increased.

There are no prerequisites for the classes as such; students are first trained in SQL, UNIX, and Java.

Duration: 54-60 hours of classroom training over 9 weekends
Prerequisites: Basic knowledge of UNIX, SQL, or Java is good to have (we conduct free Java classes on Saturdays and Sundays)
Lab: 40 hours of lab sessions + 60-plus home assignments + 4 POCs (mini projects)
After the classes: Students will be well prepared for Hadoop interviews and will have advanced knowledge of the Big Data ecosystem

Who should take this Hadoop training?

  • Data Analysts and Scientists
  • Big Data Professionals
  • Developers
  • System Administrators
  • Operations Professionals
  • Automation Engineers
  • Robotics Engineers
  • College Students
  • Project Managers

Hadoop Syllabus - Course Content 

  • 1 Day Session on UNIX
  • 1 Day Session on SQL
  • 3 Days Session on Core Java



Module 1: Understanding Big Data and Hadoop
Big Data
Limitations and Solutions of existing Data Analytics Architecture
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x core components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Hadoop Different Distributions

Module 2: Hadoop Architecture and HDFS
Hadoop 2.x Cluster Architecture - Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-node and Multi-node Cluster Setup
Hadoop Administration

Module 3: Hadoop MapReduce Framework
MapReduce Use Cases
Traditional way Vs MapReduce way
Why MapReduce
Hadoop 2.x MapReduce Architecture
Hadoop 2.x MapReduce Components
YARN MR Application Execution Flow
YARN Workflow
Anatomy of MapReduce Program
Demo on MapReduce
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
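The anatomy of a MapReduce program covered above can be sketched in plain Python with the classic word-count example (an illustration of the map, shuffle, and reduce phases only — not Hadoop's actual Java API):

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in each input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group all values by key, as the framework does
# between the mappers and the reducers.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts for each word.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data with Hadoop", "processing big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'with': 1, 'hadoop': 1, 'processing': 1}
```

A combiner would apply the same summing logic on each mapper's local output before the shuffle, reducing the data sent over the network.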

Module 4: Advanced MapReduce
Distributed Cache
Reduce Join
Custom Input Format
Sequence Input Format
Xml file Parsing using MapReduce

Module 5: Pig Scripting
About Pig
MapReduce Vs Pig
Pig Use Cases
Programming Structure in Pig
Pig Running Modes
Pig components
Pig Execution
Pig Latin Program
Data Models in Pig
Pig Data Types
Shell and Utility Commands
Pig Latin : Relational Operators
File Loaders, Group Operator
COGROUP Operator
Joins and COGROUP
Diagnostic Operators
Specialized joins in Pig
Built-in Functions (Eval, Load and Store, Math, String, and Date Functions), Pig UDFs, Piggybank, Parameter Substitution (Pig Macros and Pig Parameter Substitution)
Pig Streaming
Testing Pig Scripts with PigUnit
Aviation use case in PIG
Pig Demo on Healthcare Data set
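Pig's GROUP operator, covered above, collects all tuples that share a key into one bag per key. A rough plain-Python sketch of that idea (with hypothetical sample data — this is not Pig Latin itself):

```python
from collections import defaultdict

# A relation in Pig is a bag of tuples; model it as a list of tuples.
# Hypothetical sample data: (city, temperature) readings.
readings = [("Pune", 31), ("Mumbai", 29), ("Pune", 33), ("Mumbai", 30)]

# Equivalent of: grouped = GROUP readings BY city;
# One output tuple per key: the key plus the bag of matching tuples.
grouped = defaultdict(list)
for row in readings:
    grouped[row[0]].append(row)

# Equivalent of: FOREACH grouped GENERATE group, AVG(readings.temp);
averages = {city: sum(t for _, t in bag) / len(bag)
            for city, bag in grouped.items()}
print(averages)  # {'Pune': 32.0, 'Mumbai': 29.5}
```

COGROUP applies the same grouping across two or more relations at once, producing one bag per relation for each key.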

Module 6: Hive
Hive Background
Hive Use Case
About Hive
Hive Vs Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables(Managed Tables and External Tables)
Importing Data
Querying Data
Managing Output
Hive Script
Hive UDF
Retail use case in Hive
Hive Demo on Healthcare Data set
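The buckets covered above work by hashing: Hive assigns each row to bucket hash(bucketing column) % number_of_buckets, and for integer columns the hash is the value itself. A simplified Python sketch of that assignment (illustrative only — not Hive's actual hashing code):

```python
# Simplified sketch of Hive-style bucketing: a row lands in bucket
# hash(column_value) % NUM_BUCKETS. For integer columns the hash is
# simply the value; other types use Hive's own hash functions.
NUM_BUCKETS = 4

def bucket_for(user_id: int) -> int:
    return user_id % NUM_BUCKETS  # integer hash == the value itself

rows = [101, 102, 103, 104, 105]
buckets = {}
for uid in rows:
    buckets.setdefault(bucket_for(uid), []).append(uid)
print(buckets)  # {1: [101, 105], 2: [102], 3: [103], 0: [104]}
```

Partitions, by contrast, map directly to HDFS directories (one per partition value), which is why queries that filter on the partition column can skip whole directories.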

Module 7: Advanced Hive and HBase
Hive QL: Joining Tables
Dynamic Partitioning
Custom Map/Reduce Scripts
Hive Indexes and Views
Hive Query Optimizers
Hive : Thrift Server, User Defined Functions
HBase: Introduction to NoSQL Databases and HBase
HBase Vs RDBMS
HBase Components
HBase Architecture
Run Modes & Configuration
HBase Cluster Deployment

Module 8: Advanced HBase
HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
Zookeeper Service
Demos on Bulk Loading
Getting and Inserting Data
Filters in HBase
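The HBase data model covered in Modules 7 and 8 is a sparse, sorted map: row key → column family → column qualifier → timestamped versions, with reads returning the newest version by default. A toy plain-Python model of that structure (illustrative only — not the HBase client API):

```python
# Toy model of an HBase table: each cell, addressed by
# (row key, column family, qualifier), keeps timestamped versions.
table = {}

def put(row, family, qualifier, value, ts):
    cell = table.setdefault(row, {}).setdefault((family, qualifier), [])
    cell.append((ts, value))
    cell.sort(reverse=True)  # newest version first, as HBase returns it

def get(row, family, qualifier):
    cell = table.get(row, {}).get((family, qualifier), [])
    return cell[0][1] if cell else None  # latest version by default

put("user1", "info", "city", "Pune", ts=1)
put("user1", "info", "city", "Mumbai", ts=2)
print(get("user1", "info", "city"))  # Mumbai (the latest version wins)
```

The sparseness matters: rows only store the cells they actually have, so two rows in the same table can hold completely different qualifiers.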

Module 9: Processing Distributed Data with Apache Spark
What is Apache Spark
Spark Ecosystem
Spark Components
History of Spark and Spark Versions/Releases
Spark a Polyglot
What is Scala?
Why Scala?
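A core idea in this module is that Spark transformations are lazy: calls like map and filter only record a lineage of steps, and nothing executes until an action (such as collect) asks for a result. A minimal plain-Python sketch of that idea (not the Spark API — the class and method names here are illustrative):

```python
# Minimal sketch of lazy evaluation: transformations only record a
# step; the work happens when an action (collect) is invoked.
class LazyDataset:
    def __init__(self, data, steps=None):
        self.data = data
        self.steps = steps or []

    def map(self, fn):       # transformation: recorded, not executed
        return LazyDataset(self.data, self.steps + [("map", fn)])

    def filter(self, pred):  # transformation: recorded, not executed
        return LazyDataset(self.data, self.steps + [("filter", pred)])

    def collect(self):       # action: replay the recorded lineage
        out = self.data
        for kind, fn in self.steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

ds = LazyDataset([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(ds.collect())  # [20, 30, 40]
```

In Spark the recorded lineage also enables fault tolerance: a lost partition can be recomputed by replaying its steps from the source data.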

Module 10: Oozie and Hadoop Project
Flume and Sqoop Demo
Oozie Components
Oozie Workflow
Scheduling with Oozie
Demo on Oozie Workflow
Oozie Co-ordinator
Oozie Commands
Oozie Web Console
Oozie for MapReduce

Additional Benefits:

• We provide real-time scenario examples showing how to work on real projects
• We guide you through resume preparation and provide sample resumes
• We give you 2 POCs (Proofs of Concept) with data sets so that you can practice before going for interviews
• We provide hands-on practice in the classroom itself so that you can understand the concepts 100%
• We give assignments for weekday practice