Open Source Hadoop Administration

  • Course Code OSHA
  • Duration 4 days

Additional Payment Options

  • GTC 32 inc. VAT

    GTC (Global Knowledge Training Credit): please contact Global Knowledge for more details.

Public Classroom Price

£2,000.00

excl. VAT

Course Delivery

This course is available in the following formats:

  • Public Classroom

    Traditional Classroom Learning

  • Virtual Learning

    Learning delivered virtually

Request this course in a different delivery format.

Course Overview

This open source course gives participants a comprehensive understanding of the steps needed to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape and then takes a system administrator's working view of running Hadoop.

Target Audience

This course is intended for system administrators, DevOps engineers, and software developers responsible for managing and maintaining Hadoop clusters.

Course Objectives

    Upon successful completion of this course, participants should be able to:

  • Describe the fundamental concepts of using Big Data
  • Identify where Hadoop fits into a Big Data strategy
  • Plan a Hadoop cluster
  • Describe HDFS features
  • Get data into HDFS
  • Work with MapReduce
  • Install and configure Hadoop
  • Perform cluster maintenance

Course Content

The content of this course is designed to support the course objectives.

Hadoop Introduction

  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts

Planning Your Hadoop Cluster

  • General Planning Considerations
  • Choosing Hardware
  • Network Considerations
  • Configuring Nodes
  • Planning for Cluster Management

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Considerations
  • HDFS Security
  • NameNode Web UI
  • Hadoop File Shell

Getting Data into HDFS

  • Pulling data from External Sources with Flume
  • Importing Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices

MapReduce

  • MapReduce Overview
  • Features of MapReduce
  • Architectural Overview
  • YARN (MapReduce Version 2)
  • Failure Recovery
  • The JobTracker Web UI

Hadoop Installation & Initial Configuration

  • Configuration & Deployment Types
  • Installing Hadoop
  • Specifying the Hadoop Configuration
  • Initial HDFS & MapReduce Configuration
  • Log Files

Installing/Configuring Hive, Impala, and Pig

  • Hive
  • Impala
  • Pig

Hadoop Clients

  • What is a Hadoop Client?
  • Installing and Configuring Hadoop Clients
  • Installing and Configuring Hue
  • Hue Authentication and Configuration

Advanced Cluster Configuration

  • Advanced Configuration Parameters
  • Configuring Hadoop Ports
  • Explicitly Including and Excluding Hosts
  • Configuring HDFS for Rack Awareness & HDFS High Availability

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How It Works
  • Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Scheduling Hadoop Jobs
  • Configuring the FairScheduler

Cluster Maintenance

  • Checking HDFS Status
  • Copying Data Between Clusters
  • Adding/Removing Cluster Nodes
  • Rebalancing the Cluster
  • NameNode Metadata Backup
  • Cluster Upgrades

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Managing Hadoop’s Log Files
  • Monitoring the Clusters
  • Common Troubleshooting Issues