Chapter 14. Hardware and Software for Hadoop

Table of Contents

14.1. Hardware
14.2. Software
Operating System
Java

14.1. Hardware

Hadoop runs on commodity hardware, but that doesn't mean it runs on cheapo hardware. Hadoop nodes are decent server-class machines.

Here are some possible hardware configurations for Hadoop nodes.

Table 14.1. Hardware Specs

          Medium                    High End
CPU       8 physical cores          12 physical cores
Memory    16 GB                     48 GB
Disk      4 disks x 1 TB = 4 TB     12 disks x 3 TB = 36 TB
Network   1 Gb Ethernet             10 Gb Ethernet or InfiniBand

So the high-end machines have more cores and more memory (48 GB versus 16 GB). Plus, newer machines are packed with a lot more disks (e.g. 12 disks totaling 36 TB), which gives each node a high storage capacity.
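The disk arithmetic in Table 14.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the `NodeSpec` class and the `cluster_usable_tb` helper are hypothetical names, the node count is made up, and the replication factor of 3 is HDFS's default, which this chapter does not discuss. The estimate also ignores space reserved for the operating system and temporary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hardware profile of one Hadoop node (values taken from Table 14.1)."""
    name: str
    cores: int
    memory_gb: int
    disks: int
    tb_per_disk: int

    def raw_disk_tb(self) -> int:
        """Raw disk capacity of one node: number of disks times disk size."""
        return self.disks * self.tb_per_disk


MEDIUM = NodeSpec("Medium", cores=8, memory_gb=16, disks=4, tb_per_disk=1)
HIGH_END = NodeSpec("High End", cores=12, memory_gb=48, disks=12, tb_per_disk=3)


def cluster_usable_tb(spec: NodeSpec, nodes: int, replication: int = 3) -> float:
    """Rough usable capacity of a cluster built from identical nodes.

    Assumption (not from the table): every block is stored `replication`
    times, so usable space is raw space divided by the replication factor.
    """
    if nodes < 0 or replication < 1:
        raise ValueError("nodes must be >= 0 and replication must be >= 1")
    return spec.raw_disk_tb() * nodes / replication


if __name__ == "__main__":
    for spec in (MEDIUM, HIGH_END):
        print(spec.name, spec.raw_disk_tb(), "TB raw per node")
    # Hypothetical cluster of 10 high-end nodes.
    print(cluster_usable_tb(HIGH_END, nodes=10), "TB usable")
```

Under these assumptions, ten high-end nodes hold 360 TB of raw disk, which leaves roughly 120 TB for data once each block is stored three times.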

Examples of Hadoop servers

So what does a large Hadoop cluster look like? Here is a picture of Yahoo's Hadoop cluster.

Image credit: http://developer.yahoo.com/blogs/ydn/posts/2007/07/yahoo-hadoop/

14.2. Software

Operating System

Hadoop runs well on Linux. The operating systems of choice are:

Red Hat Enterprise Linux (RHEL)

This is a well-tested Linux distro that is geared for the enterprise. It comes with Red Hat support.

CentOS

A free distro that is source-compatible with RHEL. Very popular for running Hadoop. Use a later version (6.x).

Ubuntu

The Server edition of Ubuntu is a good fit -- not the Desktop edition. Long Term Support (LTS) releases are recommended, because they continue to be updated for at least 2 years.

Java

Hadoop is written in Java. The recommended Java version is the Oracle JDK 1.6 release, and the recommended minimum revision is update 31 (version 1.6.0_31).
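A minimal sketch of checking a node's Java version against this recommendation, assuming version strings of the form `1.6.0_31` (major.minor.patch_update), which is how JDK 1.6 reports itself in `java -version`. The function names here are hypothetical, and the sketch only parses a string you pass in. It does not invoke Java itself.

```python
import re

# Matches the leading "1.6.0_31" part; the "_31" update suffix is optional.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:_(\d+))?")


def parse_java_version(version: str) -> tuple:
    """Turn a string such as '1.6.0_31' into the tuple (1, 6, 0, 31)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized Java version string: %r" % version)
    major, minor, patch, update = match.groups()
    # A missing update suffix (e.g. '1.6.0') is treated as update 0.
    return (int(major), int(minor), int(patch), int(update or 0))


def meets_recommended_minimum(version: str, minimum=(1, 6, 0, 31)) -> bool:
    """True if `version` is at least the recommended minimum (1.6 update 31)."""
    return parse_java_version(version) >= minimum


if __name__ == "__main__":
    for candidate in ("1.6.0_24", "1.6.0_31", "1.7.0_05"):
        print(candidate, meets_recommended_minimum(candidate))
```

Comparing tuples rather than raw strings avoids the usual pitfall where `"1.6.0_9"` sorts after `"1.6.0_31"` alphabetically.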

So what about OpenJDK? At this point the Oracle (formerly Sun) JDK is the 'official' supported JDK. You can still run Hadoop on OpenJDK (it runs reasonably well) but you are on your own for support :-)


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.