Call : (+91) 968636 4243
Mail : info@EncartaLabs.com

Hadoop Ecosystem

( Duration: 4 Days )

This Hadoop Ecosystem training course provides a vendor-agnostic technical overview of the Hadoop landscape. No prior knowledge of databases or programming is assumed. The course is aimed at both technical and non-technical personnel who want to understand the emerging world of Big Data, with a specific focus on Hadoop.

By attending the Hadoop Ecosystem workshop, delegates will:

Learn the core concepts of the Hadoop ecosystem
Deep dive into the critical architecture paths of HDFS, MapReduce and HBase
Learn the basics of how to effectively write Pig and Hive scripts
Explain how to choose the correct use cases for Hadoop

Who should attend: Engineers, Programmers, Networking specialists, Managers, Executives

COURSE AGENDA

Introduction to Hadoop

Parallel Computing vs. Distributed Computing
Brief history of Hadoop
RDBMS/SQL vs. Hadoop
Hadoop vs. SETI
Structured vs. Unstructured data
Scaling with Hadoop
Google white papers: GFS, MapReduce, BigTable, Chubby
Intro to the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, HBase
HDFS overview: NameNode vs DataNode
MapReduce overview: JobTracker vs TaskTracker
Hadoop XML files for configuration: core-site.xml, hdfs-site.xml, mapred-site.xml
Hardware recommendations
Hadoop ecosystem: Hive, Pig, HBase, ZooKeeper, Mahout, Hue, Talend, Sqoop, Flume, Oozie
Book recommendations for Hadoop
Vendor Comparison (Cloudera, Hortonworks, MapR, Intel, Amazon EMR)
Use cases

HDFS Deep Dive

Linux File system options (ext3, ext4, XFS)
NameNode architecture (EditLog, FsImage, location of replicas)
Secondary NameNode architecture
DataNode architecture
Write Pipeline
Read Pipeline
Heartbeats, DataNode commissioning/decommissioning, Rack Awareness, Block Scanner, Balancer, Trash, Health Check
HDFS disk space quotas and number of files quotas
Benchmarking HDFS
Settings in the hdfs-site.xml file
Exploring the HDFS Web UI
Next-gen HDFS: NameNode high availability, snapshots, federation
Quick Intro to the Java API interface
HDFS Benchmarking with DFSIO

Beginning MapReduce

MapReduce Architecture
JobTracker/TaskTracker
Combiner
Shuffle and Sort
Partitioner
Speculative Execution
Exploring the MapReduce Web UI
Walkthrough of a simple MapReduce example: WordCount
Walkthrough of an unstructured file MapReduce example: Facial recognition against video files
Walkthrough of a structured file MapReduce example: web log files
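
To give a feel for the WordCount walkthrough listed above, the following is a minimal sketch in plain Python that imitates the map, shuffle-and-sort, and reduce steps in a single process. It is an illustration only and not the course's own material: it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, used only for this illustration.
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers run as separate tasks on different nodes and the framework performs the shuffle over the network; the sketch only shows how data flows between the phases.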

Advanced MapReduce

Partitioner
Distributed Cache
Job Scheduling: FIFO, Fair Scheduler, Capacity Scheduler
Thinking in the MapReduce way
Serialization and File-Based Data Structures
Mapper and Reducer predefined implementations (IdentityMapper, InverseMapper, SumReducer, etc.)
Default datatypes for k/v pairs: BooleanWritable, ByteWritable, Text, IntWritable, etc.
Input/output formats
Blacklisted TaskTrackers
Counters
MapReduce configuration files: mapred-site.xml
Intro to Monitoring and Debugging on a production cluster
Next-gen MapReduce: YARN architecture details

Pigs Eat Anything

Pig philosophy and architecture
Pig Latin and the Grunt shell
Loading data
Data types and schemas
Pig Latin details: structure, functions, expressions, relational operators
Intro to User Defined Functions and Scripts

Hive for Structured Data

Hive philosophy and architecture
Hive vs. RDBMS
HiveQL and Hive Shell
Managing tables
Data types and schemas
Querying data
Partitions and Buckets
Intro to User Defined Functions

Real-time I/O with HBase

NoSQL architectures overview: Key-value, Key-document, Column Family, Graph, Real Time
HBase architecture
HBase vs. Cassandra
HBase versions and origins
HBase vs. RDBMS
HBase Master and Region Servers
Intro to ZooKeeper
Data Modeling
Column Families and Regions
Bloom Filters and Block Indexes
Block Cache
Write Pipeline / Read Pipeline
Deletes and Tombstones
Compactions: Minor vs. Major
Table Scans and Filters
Increment columns
Hardware trends for HBase, Sizing
HBase Operations and Troubleshooting: HTrace, Hannibal, Ganglia
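
As a small illustration of the Bloom filter topic listed above, here is a minimal Python sketch of the data structure itself. HBase can use Bloom filters to skip store files that cannot contain a requested row; the code below is not HBase's implementation, and the class name, the default sizes, and the choice of SHA-256 for hashing are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        """Derive num_hashes bit positions by hashing the key with a seed prefix."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Set every bit position derived from the key."""
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    bloom = BloomFilter()
    bloom.add("row-0001")  # hypothetical row key
    print(bloom.might_contain("row-0001"))
```

A "definitely absent" answer lets a reader skip a file without touching disk, which is why the structure suits read-heavy lookups by row key.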

Encarta Labs Advantage

One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
All courses are delivered by Industry Veterans
Get jumpstarted from newbie to production-ready in a matter of a few days

Trained more than 50,000 Corporate executives across the Globe
All our trainings are conducted in workshop mode with more focus on hands-on sessions

View our other course offerings by visiting https://www.encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10+ candidates.
