Learning Big Data : Hadoop 1.x Limitations

Tuesday, April 11, 2017

Hadoop 1.x Limitations

Limitations of Hadoop 1.x:

No horizontal scalability of the NameNode
No NameNode high availability
Overburdened JobTracker
Non-MapReduce big data applications cannot run on HDFS
No support for multi-tenancy

Problem: Namenode - No Horizontal Scalability

Hadoop 1.x supports a single NameNode and a single namespace, and its capacity is limited by the NameNode's RAM. The NameNode keeps all filesystem metadata in memory, so even with hundreds of DataNodes in the cluster, the number of files is capped at roughly 50-100 million for the entire cluster.
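To see why NameNode RAM becomes the ceiling, here is a rough back-of-the-envelope sketch. It assumes about 150 bytes of heap per namespace object (file, directory or block), which is a commonly quoted rule of thumb and not a figure from this post. The function name and parameters are made up for illustration.

```python
# Assumption: ~150 bytes of NameNode heap per namespace object
# (file, directory or block). This is a rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Estimate the heap needed to hold metadata for the given namespace."""
    num_blocks = num_files * blocks_per_file
    total_objects = num_files + num_blocks + num_dirs
    return total_objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    heap = estimate_namenode_heap_bytes(100_000_000)
    # 100 million single-block files -> 200 million objects
    print(f"{heap / 2**30:.1f} GiB")  # prints "27.9 GiB"
```

Under these assumptions, 100 million small files already need tens of gigabytes of heap on one machine, and adding DataNodes does nothing to raise that limit.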

Problem: Namenode - No High Availability

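The NameNode is a single point of failure. Without it, the filesystem cannot be used. If it fails, we have to recover manually from the Secondary NameNode. The Secondary NameNode is not a hot standby: it only takes periodic checkpoints of the metadata, so its state always lags behind the primary, and recovering from it usually loses the most recent changes.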

Problem: Job Tracker is Overburdened

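The JobTracker is responsible both for managing cluster resources and for the life cycle of every job: scheduling tasks, tracking their progress and restarting failed ones. It spends a significant portion of its time and effort on managing the life cycle of applications, which makes it a bottleneck on large clusters.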

Problem: No Multi-Tenancy. Non-MapReduce jobs not supported

Hadoop 1.x dedicates all the resources of the worker nodes to Map and Reduce slots. Other workloads, such as graph processing, are not able to use the data on HDFS.

Hadoop 2.x features addressing Hadoop 1.x limitations:

HDFS Federation
HDFS High Availability
YARN

HDFS Federation:

HDFS Federation solves the "Namenode - No Horizontal Scalability" problem by using multiple independent NameNodes, each of which manages a portion of the filesystem namespace.
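One way to picture a partitioned namespace is a mount table that maps path prefixes to NameNodes. The sketch below is only an illustration of that idea: the host names, the table contents and the function resolve_namenode are all hypothetical, and the lookup uses a simple longest-prefix match rather than any real Hadoop client code.

```python
# Hypothetical mount table: each NameNode owns one part of the namespace.
MOUNT_TABLE = {
    "/user": "namenode1.example.com:8020",
    "/data": "namenode2.example.com:8020",
    "/data/logs": "namenode3.example.com:8020",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode whose mount point is the longest prefix of path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" is not under "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]


if __name__ == "__main__":
    print(resolve_namenode("/user/alice/report.csv"))
    print(resolve_namenode("/data/logs/2017/app.log"))
```

Because each NameNode only holds the metadata for its own part of the tree, the total number of files can grow by adding NameNodes instead of adding RAM to a single one.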

HDFS High Availability:

HDFS High Availability in Hadoop 2.x resolves the NameNode high availability issue of Hadoop 1.x. In this implementation there is a pair of NameNodes in an active-standby configuration. If the active NameNode fails, the standby takes over as the new active. In this configuration the DataNodes must send block reports to both NameNodes, active and standby, and the standby also follows the namespace changes of the active through a shared edit log. So the standby always has the latest state available in memory.
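The active-standby idea can be sketched as a small simulation. This is a toy model under stated assumptions, not Hadoop code: the class and method names are invented, and it leaves out the edit log, fencing of the old active and automatic failure detection. It only shows why sending block reports to both NameNodes lets the standby take over without rebuilding its block map.

```python
class NameNode:
    """Toy NameNode that only tracks which DataNodes hold which blocks."""

    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby" or "failed"
        self.block_map = {}  # block id -> set of DataNode ids

    def receive_block_report(self, datanode_id, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode_id)


class HAPair:
    """Hypothetical active-standby pair of NameNodes."""

    def __init__(self):
        self.namenodes = [NameNode("nn1", "active"), NameNode("nn2", "standby")]

    def active(self):
        return next(nn for nn in self.namenodes if nn.state == "active")

    def send_block_report(self, datanode_id, block_ids):
        # DataNodes report to both NameNodes, not only to the active one.
        for nn in self.namenodes:
            if nn.state != "failed":
                nn.receive_block_report(datanode_id, block_ids)

    def failover(self):
        old_active = self.active()
        standby = next(nn for nn in self.namenodes if nn.state == "standby")
        old_active.state = "failed"
        standby.state = "active"
        return standby


if __name__ == "__main__":
    pair = HAPair()
    pair.send_block_report("dn1", ["blk_1", "blk_2"])
    new_active = pair.failover()
    # The new active already knows where the blocks are.
    print(new_active.name, sorted(new_active.block_map))  # prints: nn2 ['blk_1', 'blk_2']
```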

YARN:

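YARN is designed to take the excessive burden off the JobTracker of Hadoop 1.x. It splits the JobTracker's responsibilities: a cluster-wide ResourceManager handles resource allocation, and a per-application ApplicationMaster manages the life cycle of each application. YARN also supports multi-tenancy, and it provides a more general interface so that non-MapReduce jobs can run within the Hadoop framework.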
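The difference between fixed Map and Reduce slots and a general pool of resources can be shown with a small sketch. The two functions below are hypothetical and heavily simplified: they assume every task needs exactly one slot or one unit of capacity, and they ignore memory sizes, queues and data locality.

```python
def schedule_with_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1.x style: map tasks can only use map slots, reduce tasks only reduce slots."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def schedule_with_containers(tasks, total_capacity):
    """YARN style (simplified): any kind of task can use any free capacity."""
    return min(tasks, total_capacity)


if __name__ == "__main__":
    # A node with 4 map slots and 4 reduce slots, and 10 pending map tasks.
    print(schedule_with_slots(10, 0, 4, 4))   # prints 4, the reduce slots sit idle
    print(schedule_with_containers(10, 8))    # prints 8, all capacity is used
```

The same generic capacity is also what lets applications other than MapReduce share the cluster, since nothing in the pool is tied to the Map or Reduce task types.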
