Limitations of Hadoop 1.x:
- No horizontal scalability of the NameNode
- No NameNode high availability
- Overburdened JobTracker
- Cannot run non-MapReduce big data applications on HDFS
- No multi-tenancy
Problem: Namenode - No Horizontal Scalability
Hadoop 1.x supports a single NameNode and a single namespace, so the size of the filesystem is limited by the NameNode's RAM. Even if the cluster has hundreds of DataNodes, the NameNode keeps all of its metadata in memory, so the entire cluster is limited to a maximum of roughly 50-100 million files.
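Because the whole namespace lives in one NameNode's memory, the file-count ceiling is set by a single machine, not by the number of DataNodes. The sketch below estimates that memory, assuming the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file, directory, and block object; the constant and the function name `namenode_heap_gb` are illustrative, not part of Hadoop.

```python
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file,
# directory, or block) is a commonly quoted rule of thumb, not an exact
# figure; real usage depends on the Hadoop version and JVM settings.
BYTES_PER_OBJECT = 150


def namenode_heap_gb(files: int, blocks_per_file: float = 1.0, directories: int = 0) -> float:
    """Estimate the heap (in GB) one NameNode needs to hold the namespace."""
    objects = files + directories + files * blocks_per_file
    return objects * BYTES_PER_OBJECT / 1e9


if __name__ == "__main__":
    for files in (50_000_000, 100_000_000):
        print(f"{files:,} files -> ~{namenode_heap_gb(files):.0f} GB of heap")
```

Under this assumption, 100 million single-block files already need on the order of 30 GB of heap on one NameNode, however many DataNodes hold the blocks.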
Problem: Namenode - No High Availability
The NameNode is a single point of failure: without it, the filesystem cannot be used. If it fails, it must be recovered manually using the checkpoint kept by the Secondary NameNode. Because the Secondary NameNode is not a hot standby and its checkpoint always lags behind the primary, namespace changes made after the last checkpoint can be lost.
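A toy simulation makes the lag concrete. It assumes the Secondary NameNode merges the primary's edit log into a checkpoint after every `checkpoint_every` operations (real checkpoints are triggered by elapsed time or by the amount of accumulated edits) and that the primary's own metadata cannot be recovered after the failure. The function name and the operation strings are hypothetical.

```python
def ops_lost_on_failure(ops: list[str], checkpoint_every: int) -> list[str]:
    """Return the namespace operations that exist only on the failed NameNode."""
    edit_log: list[str] = []    # everything the primary NameNode has applied
    checkpoint: list[str] = []  # last state merged by the Secondary NameNode
    for i, op in enumerate(ops, start=1):
        edit_log.append(op)
        if i % checkpoint_every == 0:
            checkpoint = list(edit_log)  # periodic snapshot, not a live copy
    # Whatever happened after the last checkpoint is missing from it.
    return edit_log[len(checkpoint):]


if __name__ == "__main__":
    ops = ["mkdir /a", "create /a/f1", "create /a/f2", "delete /a/f1", "create /a/f3"]
    print(ops_lost_on_failure(ops, checkpoint_every=3))
    # ['delete /a/f1', 'create /a/f3']
```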
Problem: Job Tracker is Overburdened
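The JobTracker is responsible both for managing cluster resources and for managing the life cycle of every application (scheduling and monitoring all of its tasks). As a result, it spends a significant portion of its time and effort on application life-cycle management and becomes a bottleneck as the cluster grows.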
Problem: No Multi-Tenancy. Non-MapReduce jobs not supported
Hadoop 1.x dedicates all DataNode resources to map and reduce slots. Other workloads, such as graph processing, cannot use those resources to process the data stored in HDFS, so the cluster cannot be shared among different kinds of applications.
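The slot model can be sketched as a node whose capacity is split up front into map slots and reduce slots. The class `SlotNode` and the slot counts are hypothetical and only illustrate the idea: a task that is neither a map nor a reduce has nowhere to run, even when slots sit idle.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Hadoop 1.x-style worker: capacity is pre-split into typed slots."""
    free_map_slots: int
    free_reduce_slots: int

    def schedule(self, task_type: str) -> bool:
        """Try to start one task; only 'map' and 'reduce' tasks have slots."""
        if task_type == "map" and self.free_map_slots > 0:
            self.free_map_slots -= 1
            return True
        if task_type == "reduce" and self.free_reduce_slots > 0:
            self.free_reduce_slots -= 1
            return True
        return False  # any other workload has nowhere to run


if __name__ == "__main__":
    node = SlotNode(free_map_slots=8, free_reduce_slots=4)
    print(node.schedule("map"))    # True
    print(node.schedule("graph"))  # False, even with 7 map slots idle
```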
Hadoop 2.x features addressing Hadoop 1.x limitations:
- HDFS Federation
- HDFS High Availability
- YARN
HDFS Federation:
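HDFS Federation solves the "Namenode - No Horizontal Scalability" problem by using multiple independent NameNodes, each of which manages a portion of the filesystem namespace. The DataNodes serve as common block storage for all of the NameNodes, so metadata capacity grows as NameNodes are added.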
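One way to picture federation is a client-side mount table that sends each path to the NameNode owning that part of the namespace; in Hadoop this role is played by ViewFs mount tables. The sketch below is only an illustration of the routing idea: the host names, mount points, and the `resolve` function are hypothetical, not the ViewFs API.

```python
# Hypothetical mount table: each top-level directory is a separate
# namespace managed by its own, independent NameNode.
MOUNT_TABLE = {
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/tmp": "nn3.example.com",
}


def resolve(path: str) -> str:
    """Return the NameNode that manages `path` (mount points do not overlap)."""
    for mount_point, namenode in MOUNT_TABLE.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            return namenode
    raise KeyError(f"no namespace is mounted at {path}")


if __name__ == "__main__":
    print(resolve("/user/alice/report.csv"))  # nn1.example.com
    print(resolve("/data/logs/2015"))         # nn2.example.com
```

Each NameNode only keeps the metadata for its own mount point in memory, which is what lets the total file count grow past the single-NameNode limit.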
HDFS High Availability:
HDFS High Availability in Hadoop 2.x resolves the NameNode high availability problem of Hadoop 1.x. In this implementation, a pair of NameNodes runs in an active-standby configuration; if the active NameNode fails, the standby takes over as the new active NameNode. DataNodes send block reports to both NameNodes, the active and the standby, and the standby also applies the namespace edits that the active NameNode writes to shared storage. As a result, the standby always has the latest state available in memory and can take over quickly.
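The block-report half of this design can be modeled as follows. The classes `NameNode` and `HAPair` are hypothetical; the sketch deliberately omits the shared edit log (for example, the Quorum Journal Manager), fencing of the failed node, and ZooKeeper-based automatic failover that a real deployment needs.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    name: str
    state: str  # "active", "standby", or "failed"
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def receive_block_report(self, datanode: str, blocks: list[str]) -> None:
        for block in blocks:
            self.block_locations.setdefault(block, set()).add(datanode)


class HAPair:
    """Hypothetical model of an active-standby NameNode pair."""

    def __init__(self) -> None:
        self.nodes = [NameNode("nn1", "active"), NameNode("nn2", "standby")]

    def active(self) -> NameNode:
        return next(nn for nn in self.nodes if nn.state == "active")

    def block_report(self, datanode: str, blocks: list[str]) -> None:
        # DataNodes report to BOTH NameNodes, so the standby stays current.
        for nn in self.nodes:
            nn.receive_block_report(datanode, blocks)

    def fail_over(self) -> NameNode:
        self.active().state = "failed"
        standby = next(nn for nn in self.nodes if nn.state == "standby")
        standby.state = "active"  # already holds every block location
        return standby


if __name__ == "__main__":
    pair = HAPair()
    pair.block_report("dn1", ["blk_1", "blk_2"])
    pair.block_report("dn2", ["blk_2"])
    new_active = pair.fail_over()
    print(new_active.name, sorted(new_active.block_locations["blk_2"]))
    # nn2 ['dn1', 'dn2']
```

Because the standby never has to wait for fresh block reports after failover, it can serve requests almost immediately, unlike a manual recovery from a Secondary NameNode checkpoint.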
YARN:
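YARN is designed to relieve the JobTracker of its excessive burden in Hadoop 1.x by splitting its responsibilities between a global ResourceManager, which allocates cluster resources, and a per-application ApplicationMaster, which manages that application's life cycle. YARN also supports multi-tenancy, and it provides a more general interface for running non-MapReduce applications within the Hadoop framework.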
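The split of responsibilities can be sketched with two small classes. They borrow YARN's component names, but their methods are hypothetical and not the YARN API; containers are sized by memory only (real containers also carry CPU vcores), and there are no scheduler queues. The point is that the ResourceManager only hands out generic containers, while each ApplicationMaster, whatever its framework, tracks its own work.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Global scheduler: hands out generic containers, not map/reduce slots."""
    free_mb: int

    def allocate(self, count: int, mb_each: int) -> int:
        granted = min(count, self.free_mb // mb_each)
        self.free_mb -= granted * mb_each
        return granted


@dataclass
class ApplicationMaster:
    """One per application: it, not the ResourceManager, tracks the app's tasks."""
    app_name: str
    framework: str  # any framework, e.g. "MapReduce" or "graph"
    containers: int = 0

    def request(self, rm: ResourceManager, count: int, mb_each: int) -> int:
        granted = rm.allocate(count, mb_each)
        self.containers += granted
        return granted


if __name__ == "__main__":
    rm = ResourceManager(free_mb=16384)
    wordcount = ApplicationMaster("wordcount", "MapReduce")
    pagerank = ApplicationMaster("pagerank", "graph")
    print(wordcount.request(rm, count=4, mb_each=2048))  # 4
    print(pagerank.request(rm, count=8, mb_each=2048))   # 4 (cluster now full)
```

Here a MapReduce job and a graph job share the same cluster, which is the multi-tenancy that fixed map and reduce slots could not offer.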