Chapter 15. Hadoop Challenges

Table of Contents

15.1. Hadoop is a cutting-edge technology
15.2. Hadoop in the Enterprise Ecosystem
15.3. Hadoop is still rough around the edges
15.4. Hadoop is NOT cheap
Hardware Cost
IT and Operations Costs
15.5. MapReduce is a different programming paradigm
15.6. Hadoop and High Availability

This chapter explores some of the challenges of adopting Hadoop in an organization.

15.1. Hadoop is a cutting-edge technology

Hadoop is a relatively new technology, and as with adopting any new technology, finding people who know it well is difficult.

15.2. Hadoop in the Enterprise Ecosystem

Hadoop was designed to solve the Big Data problems encountered by Web and social-media companies. In the process, many of the features enterprises need or want were put on the back burner. For example, HDFS in its default configuration does not authenticate users: it simply trusts the user name reported by the client, and Kerberos-based security must be enabled and configured separately.
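
To make this concrete, here is a minimal sketch of why the default matters. Under 'simple' authentication the NameNode accepts whatever identity the client reports, so a client can act as the 'hdfs' superuser through the ordinary FileSystem API. The class name WhoAmIDemo is made up for illustration, and the sketch assumes the cluster's configuration files are on the classpath and Kerberos is not enabled.

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class WhoAmIDemo {
        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            // With 'simple' authentication the NameNode accepts this
            // client-supplied identity without verifying it.
            UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hdfs");
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    FileSystem fs = FileSystem.get(conf);
                    // Everything in this block runs with the permissions
                    // of the claimed 'hdfs' user.
                    for (FileStatus status : fs.listStatus(new Path("/"))) {
                        System.out.println(status.getPath());
                    }
                    return null;
                }
            });
        }
    }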

15.3. Hadoop is still rough around the edges

The development and administration tools for Hadoop are still fairly new. Companies like Cloudera, Hortonworks, MapR, and Karmasphere have been working on this issue. However, the tooling may not be as mature as what enterprises are used to from, say, Oracle's administration tools.

15.4. Hadoop is NOT cheap

Hardware Cost

Hadoop runs on 'commodity' hardware, but these are not bargain-basement machines; they are server-grade hardware. For more details, see Chapter 14, Hardware and Software for Hadoop.

So standing up a reasonably large Hadoop cluster, say 100 nodes, will cost a significant amount of money. For example, if a Hadoop node costs $5,000, a 100-node cluster would cost $500,000 in hardware alone.

IT and Operations Costs

A large Hadoop cluster will require support from various teams, such as network admins, IT, security admins, and system admins.

One also needs to account for operational costs such as data center expenses: cooling, electricity, and so on.

15.5. MapReduce is a different programming paradigm

Solving problems using MapReduce requires a different kind of thinking: a computation has to be expressed as a map phase that processes records independently and a reduce phase that aggregates the intermediate results. Engineering teams generally need additional training to take advantage of Hadoop.
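
As a small taste of the paradigm, below is the classic word-count example sketched against the Hadoop MapReduce API: the mapper emits a count of 1 for every word it sees, and the reducer sums the counts for each word. This is a minimal sketch, with illustrative class names and without the driver class that configures and submits the job.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map phase: for each input line, emit (word, 1) for every word.
    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }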

15.6. Hadoop and High Availability

Hadoop version 1 had a single point of failure because of the NameNode. There was only one NameNode per cluster, and if it went down, the whole Hadoop cluster became inoperable. This has prevented the use of Hadoop for mission-critical, always-up applications.

This problem is more pronounced on paper than in reality, however. Yahoo analyzed the failures of its Hadoop clusters and found that only a tiny fraction were caused by NameNode failure.

However, this problem is being solved: Hadoop 2 introduces NameNode High Availability (an active NameNode with a hot standby), and various Hadoop vendors offer their own solutions. This is covered in more detail in a later chapter.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.