Chapter 6. Hadoop for Developers

This section is a quick 'fact sheet' in a Q&A format.

What is Hadoop?

Hadoop is an open source software stack that runs on a cluster of machines. Hadoop provides distributed storage and distributed processing for very large data sets.

Is Hadoop a fad or here to stay?

Sure, Hadoop and Big Data are all the rage now. But Hadoop does solve a real problem and it is a safe bet that it is here to stay.

Below is a graph of Hadoop job trends. As you can see, demand for Hadoop skills has risen steadily since 2009. So Hadoop is a good skill to have!

Figure 6.1. Hadoop Job Trends

What skills do I need to learn Hadoop?

A hands-on developer or admin can learn Hadoop. The following list is a start, in no particular order:

What kind of technical roles are available in Hadoop?

The following should give you an idea of the kind of technical roles in Hadoop.

Table 6.1. Hadoop Roles

Job Type | Job functions | Skills
Hadoop Developer | Develops MapReduce jobs, designs data warehouses | Java, scripting, Linux
Hadoop Admin | Manages Hadoop cluster, designs data pipelines | Linux administration, network management, experience in managing large clusters of machines
Data Scientist | Data mining and figuring out hidden knowledge in data | Math, data mining algorithms
Business Analyst | Analyzes data! | Pig, Hive, SQL superman, familiarity with other BI tools

I am not a programmer, can I still use Hadoop?

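Yes. You don't need to write Java MapReduce code to extract data out of Hadoop; you can use Pig and Hive instead. Both offer a 'high level' layer on top of MapReduce: you write a short script or query, and it is typically translated into MapReduce jobs for you. For example, you can query Hadoop using SQL-like queries in Hive.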
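To get a feel for what the 'high level' tools save you from writing, here is a minimal sketch of the map, shuffle and reduce steps behind a simple word count. It is plain Python that runs on one machine, not the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration. In Hive, roughly the same result could be expressed as a single query over a hypothetical `words` table, shown in the comment.

```python
from collections import defaultdict

# Roughly equivalent Hive query, assuming a hypothetical table `words`
# with one word per row:
#   SELECT word, COUNT(*) FROM words GROUP BY word;


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on many machines and the framework would handle the grouping in between; the sketch only shows the shape of the logic that a Pig script or Hive query hides.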

What kind of development tools are available for Hadoop?

Hadoop development tools are still evolving. Here are a few:

Where can I learn more?

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.