This section is a quick 'fact sheet' in a Q&A format.
Hadoop is an open source software stack that runs on a cluster of machines. Hadoop provides distributed storage and distributed processing for very large data sets.
Sure, Hadoop and Big Data are all the rage now. But Hadoop solves a real problem, and it is a safe bet that it is here to stay.
Below is a graph of Hadoop job trends from Indeed.com. As you can see, demand for Hadoop skills has been rising steadily since 2009. So Hadoop is a good skill to have!
A hands-on developer or admin can learn Hadoop. The following list is a start, in no particular order:
The following should give you an idea of the kinds of technical roles in Hadoop.
Table 6.1. Hadoop Roles
| Job Type | Job functions | Skills |
|---|---|---|
| Hadoop Developer | Develops MapReduce jobs, designs data warehouses | Java, scripting, Linux |
| Hadoop Admin | Manages Hadoop clusters, designs data pipelines | Linux administration, network management, experience managing large clusters of machines |
| Data Scientist | Data mining and uncovering hidden knowledge in data | Math, data mining algorithms |
| Business Analyst | Analyzes data! | Pig, Hive, strong SQL, familiarity with other BI tools |
Yes, you don't need to write Java MapReduce code to extract data out of Hadoop. You can use Pig or Hive instead. Both offer a 'high level' abstraction over MapReduce. For example, you can query Hadoop using SQL-like statements in Hive.
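For instance, a Hive query reads almost exactly like plain SQL, and Hive compiles it into MapReduce jobs behind the scenes. This is only a sketch: the `web_logs` table and its columns are hypothetical, assumed here just for illustration.

```sql
-- Hypothetical table: web_logs(page STRING, user_id STRING, ts BIGINT)
-- Find the ten most-visited pages.
-- Hive turns this GROUP BY / ORDER BY into one or more MapReduce jobs.
SELECT page, COUNT(*) AS hits
FROM web_logs
GROUP BY page
ORDER BY hits DESC
LIMIT 10;
```

The same idea applies to Pig, where the query would be written as a short Pig Latin script instead of SQL; in both cases no Java is required.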
Hadoop development tools are still evolving. Here are a few: