Introduction to Apache Hadoop for Administrators


Bascom Bridge’s Hadoop Administration Training: the Introduction to Apache Hadoop for Administrators class teaches attendees to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop.

 

HADOOP ADMINISTRATION TRAINING OBJECTIVES

  • Understand the benefits of distributed computing
  • Understand the Hadoop architecture (including HDFS and MapReduce)
  • Define administrator participation in Big Data projects
  • Plan, implement, and maintain Hadoop clusters
  • Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.)
  • Plan, deploy and maintain HBase on a Hadoop cluster
  • Monitor and maintain hundreds of servers
  • Pinpoint performance bottlenecks and fix them

HADOOP ADMINISTRATION TRAINING PREREQUISITES

All attendees must have prior systems administration experience. No prior knowledge of Hadoop or distributed computing is required; both are introduced and explained in the course.

HADOOP ADMINISTRATION TRAINING MATERIALS

All attendees receive courseware and a related textbook.

SOFTWARE NEEDED FOR EACH PC:

  • A web browser – any recent version of Chrome, Firefox, or Internet Explorer, with a recent version of Flash Player
  • An SSH client
  • We will provide Hadoop clusters in a remote environment.
  • For classes delivered online, each participant needs either dual monitors or a separate device logged into the online session, so that they can work on one screen and watch the instructor on the other. A separate computer connected to a projector or large-screen TV is another way for students to follow the instructor’s screen while working on their own.

HADOOP ADMINISTRATION TRAINING OUTLINE

  • Introduction
    • Hadoop history and concepts
    • Ecosystem
    • Distributions
    • High-level architecture
    • Hadoop myths
    • Hadoop challenges (hardware / software)
  • Planning and installation
    • Selecting software and Hadoop distributions
    • Sizing the cluster and planning for growth
    • Selecting hardware and network
    • Rack topology
    • Installation
    • Multi-tenancy
    • Directory structure and logs
    • Benchmarking
  • HDFS operations
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
    • Health monitoring
    • Command-line and browser-based administration
    • Adding storage and replacing defective drives
  • MapReduce operations
    • Parallel computing before MapReduce: compare HPC versus Hadoop administration
    • MapReduce cluster loads
    • Nodes and daemons (JobTracker, TaskTracker)
    • MapReduce UI walkthrough
    • MapReduce configuration
    • Job configuration
    • Job schedulers
    • Administrator view of MapReduce best practices
    • Optimizing MapReduce
    • Foolproofing MapReduce: what to tell your programmers
    • YARN: architecture and use
  • Advanced topics
    • Hardware monitoring
    • System software monitoring
    • Hadoop cluster monitoring
    • Adding and removing servers and upgrading Hadoop
    • Backup, recovery, and business continuity planning
    • Cluster configuration tweaks
    • Hardware maintenance schedule
    • Oozie scheduling for administrators
    • Securing your cluster with Kerberos
    • The future of Hadoop
  • Conclusion

  • Course No: HDP-102
  • Theory: 50%
  • Lab: 50%
  • Duration: 3 days
