Introduction to Apache Hadoop Development

Bascom Bridge’s Introduction to Apache Hadoop Development training class teaches attendees how to build distributed, data-intensive applications using the Hadoop framework. Students learn the principles of parallel programming and how to use Big Data tools such as Pig, Hive, and HBase.

Objectives

  • Understand the principles of parallel computing
  • Understand Hadoop architecture (HDFS and MapReduce)
  • Use additional Big Data tools (Pig, Hive, HBase, etc.)
  • Learn Big Data patterns and best practices
  • Define Big Data project architecture
  • Understand and use NoSQL, Mahout, and Oozie


Prerequisites

All attendees must be comfortable with the Java programming language (all programming exercises are in Java), familiar with Linux commands, and proficient with an IDE such as Eclipse or a Linux editor (vi/nano) for modifying code.


Materials

All attendees receive courseware and a related textbook.


Software Requirements

  • A web browser – any recent version of Chrome, Firefox, or Internet Explorer, with a recent version of Flash Player
  • An SSH client
  • We will provide Hadoop clusters in a remote environment.
  • For classes delivered online, all participants need either dual monitors or a separate device logged into the online session, so that they can work on one screen and watch the instructor on the other. Alternatively, a separate computer connected to a projector or large-screen TV lets students see the instructor’s screen while working on their own machine.


Course Outline

  • Introduction
    • Hadoop history and concepts
    • Ecosystem
    • Distributions
    • High level architecture
    • Hadoop myths
    • Hadoop challenges (hardware / software)
  • HDFS
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Architecture
    • Namenode (function, storage, file system metadata, and block reports)
    • Secondary namenode
    • HA Standby namenode
    • Data node
    • Communications / heart-beats
    • Block manager / balancer
    • Health check / safemode
    • Read / write path
    • Navigating HDFS UI
    • Command-line interaction with HDFS
    • File system abstractions
    • WebHDFS
    • Reading / writing files using Java API
    • Getting Data into / out of HDFS (Flume, Sqoop)
    • Getting HDFS stats
    • Latest in HDFS
    • Namenode HA and Federation
    • HDFS roadmap
  • MapReduce
    • Parallel computing before MapReduce
    • MapReduce concepts
    • Daemons: jobtracker / tasktracker
    • Phases: driver, mapper, shuffle/sort, and reducer
    • First MapReduce job
    • MapReduce UI walk through
    • Counters
    • Distributed cache
    • Combiners
    • Partitioners
    • MapReduce configuration
    • Job config
    • MR types and formats
    • Sorting
    • Job schedulers
    • MapReduce best practices
    • MRUnit
    • Optimizing MapReduce
    • Fool-proofing MapReduce
    • Thinking in MapReduce
    • YARN: architecture and use
  • Pig
    • Intro: principles and use cases
    • Pig versus MapReduce
  • Hive
    • Intro: principles and use cases
    • Environment and configuration
    • Hive tables and metadata
    • Hive keywords
  • HBase
    • History and concepts
    • Architecture
    • HBase versus RDBMS
    • HBase shell
    • HBase Java API
    • Splits and compaction
    • Read path / write path
    • Schema design
  • Real-world Big Data skills and a hackathon
    • NoSQL design patterns: going from SQL to NoSQL
    • Smart Meter data collection with Flume
    • Sinks into HDFS and HBase
    • Analyzing smart meter data with Pig and Hive
    • Smart meter analytics with Mahout
    • Scheduling complete workflow with Oozie
  • Conclusion
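As a taste of the "MapReduce concepts" and "Thinking in MapReduce" topics above, the three phases a first job walks through — mapper, shuffle/sort, and reducer — can be sketched without a cluster. The following is a minimal, Hadoop-free word count in plain Java; the class and method names are illustrative, not part of the Hadoop API:

```java
import java.util.*;
import java.util.stream.*;

// A Hadoop-free sketch of the MapReduce phases: the mapper emits (word, 1)
// pairs, the shuffle/sort groups pairs by key, and the reducer sums each
// group -- the same word-count flow a first Hadoop job implements.
public class WordCountSketch {

    // Mapper phase: split one input line into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle/sort + reduce phases: group pairs by key (sorted, like
    // Hadoop's sort phase), then sum the values in each group.
    static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                    .flatMap(WordCountSketch::map)
                    .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("hadoop is fun", "hadoop scales"));
        System.out.println(counts);   // {fun=1, hadoop=2, is=1, scales=1}
    }
}
```

In a real Hadoop job the same phases are expressed by subclassing Mapper and Reducer and wiring them together in a driver class, with the framework performing the shuffle across the cluster.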



Contact: +91 9376007676


  • Course No: HDP-100
  • Theory: 50%
  • Lab: 50%
  • Duration: 24 hours