Tuesday, April 11, 2017

Hadoop MapReduce: Basic concepts


  • Design of HDFS:
    • Suitable for Very Large Files
    • Streaming data access with write-once and read-many-times pattern.
    • Requires commodity hardware.
  • HDFS is not good for
    • Low latency data access like OLTP systems
    • Lots of small files because namenode memory is limited
    • Multiple writes and arbitrary file modifications
  • HDFS has two types of nodes
    • Namenode
    • Datanode
  • Namenode manages the filesystem namespace, the file system tree and the metadata for all the files and directories in the tree.
  • Namespace stores information persistently in two files: Namespace image and edit log.
  • Namenode knows the datanodes on which all the blocks are located. But the exact block location is stored by datanode.  
  • Secondary Namenode: 
    • It doesn't act as a namenode
    • Its main role is to periodically merge the namespace image with edit log to prevent edit log from becoming too large.
    • Secondary namenode keeps the copies of the merged namespace image incase primary namenode fails.
    • Secondary namenode lags that of the primary, so in case of primary namenode failure data loss is certain.
    • Secondary namenode runs on a separate physical machine with plenty of CPU and memory.
  • If namenode fails, all files on the filesystem is lost since metadata is lost.
  • HDFS Federation:
    • When metadata exceeds the namenode memory then memory becomes the bottleneck. To avoid this situation, "HDFS Federation" feature introduced in 2.x release allows the cluster to add namenodes.
    • Each namenode in the cluster manages a portion of filesystem namespace. For example, one namenode can manage all the files under /root and second namenode manages all files under /home.
    • Each namenode manages a namespace volume and a block pool. Namespace volume is made up of the metadata for the namespace and block pool contains all the blocks for the files in the namespace.
    • These namenodes under federation do not communicate with each other. The failure of one namenode doesn't affect the availability of another.  
  • High Availability of Namenode in Hadoop 1.x:
    • One method is to replicate persistent state of namenode on multiple filesystems. These writes are synchronous and atomic. The usual choice is to write on local disk and on remote NFS mount.
    • Secondary namenode doesn't act as primary namenode. But in case of primary failure, metadata on NFS can be copied to secondary and then run it as new primary. To perform all these actions, there will be a downtime of up to 30 mins.
    • Since secondary always lags with that of primary, data loss is inevitable.
  • HDFS High Availability:
    • HDFS High  availability in Hadoop 2.x resolves namenode high availability issue in Hadoop 1.x.
    • In this implementation, there are a pair of namenodes in an active-standby configuration. If active namenode fails, standby takes over as new active. In this configuration, data nodes must send block reports to both namenodes, active and standby. So standby always have latest state available in memory.
    • The observed failover time in this case will be around a minute.
  • Failover and fencing:
    • The transition from active namenode to standby is managed by failover controller. The default implementation uses ZOOKEEPER to ensure one namenode is always active.
    • There are two types of failovers:
      • Graceful failover
      • Ungraceful failover
    • Graceful failover is when admin manually initiates failover as part of maintenance. Failover controller arranges an orderly transition.
    • Ungraceful failover can be triggered by
      • Slow network
      • Failed active namenode
    • In case of slow network, standby namenode takes over the active assuming active namenode is down. But in this case previously active namenode is still running. So HA implementation (ZooKeepe make sures previously active namenode is stopped - a method known as fencing.
    • Different Fencing methods are
      • Killing the previously active namenode process
      • Revoking namenodes access to shared storage directory
      • Disabling the network ports
      • STONITH - Shoot The Other Node In The Head
    • Failover is transparent to the user.
  • Distcp:
    • Distcp is an efficient replacement for  "hadoop fs -cp".
    • Distcp copies the files to and from filesystems in parallel.
    • Distcp is implemented as a mapreduce job where the work of copying files is done by mapper in parallel and no reducers.
    • By default, upto 20 mappers are used.
    • Examples:
      • hadoop distcp dir1 dir2  -> Copies dir1 from HDFS file system to dir2 on the same HDFS file system
      • hadoop distcp file:///dir1 dir2 -> Copies dir1 from Local file system to dir2 on HDFS file system
      • hadoop distcp webhdfs:///dir1 webhdfs:///dir2  -> Copies dir1 from HDFS file system to dir2 on another HDFS file system. When two clusters are running incompatible versions of HDFS, we can use webhdfs protocol to distcp between them.
  • Balancer:
    • When multiple files are being copied by distcp, the first replica of each block in each file would reside on the node running the map taking network topology into account. The second and third replicas are spread across the cluster.
    • But the node running the map would be unbalanced.
    • Balancer tool can be used to even out the block distribution across the cluster.

3 comments:

  1. great info about hadoop in this blog At SynergisticIT we offer the best hadoop training in california

    ReplyDelete
  2. Hi,
    Very nice post,thank you for shring this article.
    keep updating...

    big data online training

    Hadoop admin training

    ReplyDelete
  3. Very nice article post,Thank you for sharing this awesome blog.
    keep updating more big data hadoop tutorials.

    Big Data Online Training

    ReplyDelete

Amazon S3: Basic Concepts

Amazon S3 is an reliable, scalable, online object storage that stores files. Bucket: A bucket is a container in Amazon S3 where the fil...