Learning Big Data : Hadoop MapReduce: Basic concepts

The namenode manages the filesystem namespace: the filesystem tree and the metadata for all the files and directories in the tree.
The namenode stores this information persistently on its local disk in two files: the namespace image and the edit log.
The namenode also knows the datanodes on which all the blocks of a given file are located. It does not store block locations persistently, though; they are rebuilt from the datanodes' block reports when the system starts.
Secondary Namenode:

It doesn't act as a namenode.
Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large.
The secondary namenode keeps a copy of the merged namespace image in case the primary namenode fails.
The state of the secondary namenode lags that of the primary, so if the primary namenode fails completely, some data loss is almost certain.
The secondary namenode usually runs on a separate physical machine, because the merge needs plenty of CPU and as much memory as the namenode itself.
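
The merge performed by the secondary namenode can be pictured with a small sketch. This is a toy model, not HDFS code: the dictionary-based image, the tuple format of the edits, and the function names are all assumptions made for illustration.

```python
# Toy model of a checkpoint: replay the edit log onto the namespace image.
# The data layout and names here are hypothetical, not the real HDFS formats.

def apply_edit(namespace, edit):
    """Apply one edit-log record to an in-memory namespace (path -> metadata)."""
    op, path, meta = edit
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new_image, empty_log)."""
    merged = dict(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/data/a.txt": {"replication": 3}}
edit_log = [
    ("create", "/data/b.txt", {"replication": 3}),
    ("delete", "/data/a.txt", None),
]

new_image, new_log = checkpoint(image, edit_log)
print(new_image)  # {'/data/b.txt': {'replication': 3}}
print(new_log)    # []
```

Any edit written after the last checkpoint exists only in the primary's edit log, which is why a copy of the merged image alone always lags behind the primary.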

When the metadata outgrows the namenode's memory, memory becomes the bottleneck for scaling the cluster. To avoid this situation, the "HDFS Federation" feature introduced in the 2.x release allows a cluster to scale by adding namenodes.
Each namenode in the cluster manages a portion of the filesystem namespace. For example, one namenode can manage all the files under /root and a second namenode can manage all the files under /home.
Each namenode manages a namespace volume and a block pool. The namespace volume is made up of the metadata for the namespace, and the block pool contains all the blocks for the files in the namespace.
Namenodes under federation do not communicate with each other, so the failure of one namenode doesn't affect the availability of the namespaces managed by the others.
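
To make the partitioning concrete, here is a minimal sketch of how a path could be routed to the namenode that owns it. The mount table, the namenode names, and the resolve_namenode function are hypothetical; the sketch only illustrates longest-prefix matching over the /root and /home example above.

```python
# Hypothetical mount table: path prefix -> namenode (names invented for illustration).
MOUNT_TABLE = {
    "/root": "namenode1",
    "/home": "namenode2",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the namenode responsible for a path, using longest-prefix match."""
    best = None
    for prefix in mount_table:
        # Match the prefix itself or anything below it, but not "/rootfoo".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namenode manages {path}")
    return mount_table[best]


print(resolve_namenode("/root/logs/app.log"))    # namenode1
print(resolve_namenode("/home/alice/data.csv"))  # namenode2
```

Because each prefix maps to exactly one namenode, losing namenode1 in this sketch would make only the paths under /root unavailable.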

One way to make the namenode resilient to failure is to replicate its persistent state on multiple filesystems. These writes are synchronous and atomic. The usual choice is to write to the local disk and to a remote NFS mount.
The secondary namenode doesn't act as the primary namenode. But if the primary fails, the metadata on NFS can be copied to the secondary, which is then run as the new primary. Performing all these steps means a downtime of up to 30 minutes.
Since the secondary's checkpoint always lags behind the primary, recovering from the checkpoint alone loses the most recent edits.

HDFS high availability (HA) in Hadoop 2.x addresses the namenode availability problem of Hadoop 1.x.
In this implementation, there is a pair of namenodes in an active-standby configuration. If the active namenode fails, the standby takes over as the new active. In this configuration, datanodes must send block reports to both namenodes, active and standby, so the standby always has up-to-date block locations in memory.
The observed failover time in this case is around a minute.

The transition from the active namenode to the standby is managed by a failover controller. The default implementation uses ZooKeeper to ensure that only one namenode is active.
There are two types of failovers:

Graceful failover is when an administrator manually initiates a failover, for example as part of routine maintenance. The failover controller arranges an orderly transition.
Ungraceful failover can be triggered by

On a slow network, the standby namenode may take over on the assumption that the active namenode is down, even though the previously active namenode is still running. So the HA implementation (ZooKeeper-based by default) makes sure that the previously active namenode is stopped, a method known as fencing.
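
The idea behind fencing can be sketched with a toy shared edit log that accepts writes only from the most recently promoted namenode. This is an illustration of the principle, not any specific HDFS fencing method: the SharedEditLog class, the epoch counter, and the FencedError exception are assumptions made for this example.

```python
class FencedError(Exception):
    """Raised when a writer holding a stale epoch tries to write."""


class SharedEditLog:
    """Toy shared log: only the holder of the latest epoch may append."""

    def __init__(self):
        self.epoch = 0
        self.entries = []

    def become_writer(self):
        """Grant write access under a new, higher epoch; older epochs are fenced."""
        self.epoch += 1
        return self.epoch

    def append(self, writer_epoch, entry):
        if writer_epoch != self.epoch:
            raise FencedError(
                f"epoch {writer_epoch} is fenced; current epoch is {self.epoch}"
            )
        self.entries.append(entry)


log = SharedEditLog()
old_active = log.become_writer()
log.append(old_active, "mkdir /a")

new_active = log.become_writer()  # the standby takes over
log.append(new_active, "mkdir /b")

try:
    log.append(old_active, "mkdir /c")  # the old active is still running
    old_writer_blocked = False
except FencedError:
    old_writer_blocked = True

print(log.entries)         # ['mkdir /a', 'mkdir /b']
print(old_writer_blocked)  # True
```

The point of the sketch is that the previously active namenode cannot corrupt shared state after the takeover, even if it never notices that it was replaced.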
Different Fencing methods are

Distcp is an efficient replacement for "hadoop fs -cp" when copying large amounts of data.
Distcp copies files to and from Hadoop filesystems in parallel.
Distcp is implemented as a MapReduce job in which the copying is done by mappers running in parallel, with no reducers.
By default, up to 20 mappers are used.
Examples:

hadoop distcp dir1 dir2 -> Copies dir1 to dir2 on the same HDFS filesystem
hadoop distcp file:///dir1 dir2 -> Copies dir1 from the local filesystem to dir2 on HDFS
hadoop distcp webhdfs:///dir1 webhdfs:///dir2 -> Copies dir1 on one HDFS cluster to dir2 on another; in practice each URI names the namenode of its own cluster. When the two clusters run incompatible versions of HDFS, the webhdfs protocol can be used to distcp between them.
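
How the copy work might be divided among mappers can be sketched as follows. This is not distcp's actual planner; it is a simple greedy heuristic, assuming each file is copied whole by one mapper and that the goal is to give every mapper roughly the same number of bytes. The plan_copy function and the sample file sizes are hypothetical.

```python
import heapq


def plan_copy(file_sizes, max_maps=20):
    """Assign files to at most max_maps mappers, balancing bytes per mapper.

    Greedy heuristic: take files from largest to smallest and give each one
    to the mapper with the least data assigned so far.
    """
    num_maps = min(max_maps, len(file_sizes))
    if num_maps == 0:
        return []
    heap = [(0, i) for i in range(num_maps)]  # (bytes assigned, mapper index)
    buckets = [[] for _ in range(num_maps)]
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return buckets


files = {"a": 400, "b": 300, "c": 200, "d": 100}
plan = plan_copy(files, max_maps=2)
print(plan)  # [['a', 'd'], ['b', 'c']]
```

In this example both mappers end up with 500 bytes each. With fewer files than max_maps, the sketch simply uses one mapper per file.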

When distcp copies files, the first replica of each block is written to the node running the map that copies it, and the second and third replicas are spread across the cluster according to the network topology.
With only a few maps, the nodes running them therefore end up holding a disproportionate share of blocks and become unbalanced.
The balancer tool can be used to even out the block distribution across the cluster.
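
What "balanced" means can be shown with a short sketch: a datanode counts as balanced when its utilization (used space divided by capacity) is within a threshold of the utilization of the whole cluster. The classify_nodes function and the sample numbers are hypothetical, and the 10 percent threshold is used here as an assumed default.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Classify datanodes as 'over', 'under', or 'balanced'.

    nodes maps a datanode name to a (used, capacity) pair in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_cap

    result = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            result[name] = "over"
        elif util < cluster_util - threshold_pct:
            result[name] = "under"
        else:
            result[name] = "balanced"
    return cluster_util, result


nodes = {"dn1": (90, 100), "dn2": (40, 100), "dn3": (50, 100)}
cluster_util, status = classify_nodes(nodes)
print(cluster_util)  # 60.0
print(status)        # {'dn1': 'over', 'dn2': 'under', 'dn3': 'balanced'}
```

A balancer would then move blocks from the over-utilized nodes to the under-utilized ones until every node falls inside the threshold.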
