Hadoop for Executives :: Hadoop Illuminated

	Hadoop Illuminated > Hadoop for Executives

Chapter 5. Hadoop for Executives

This section is a quick 'fact sheet' in a Q&A format.

What is Hadoop?

Hadoop is an open source software stack that runs on a cluster of machines. Hadoop provides distributed storage and distributed processing for very large data sets.

What is the license of Hadoop?

Hadoop is open source software. It is an Apache project released under Apache Open Source License v2.0. This license is very commercial friendly.

Who contributes to Hadoop?

Originally Hadoop was developed and open sourced by Yahoo. Now Hadoop is developed as an Apache Software Foundation project and has numerous contributors from Cloudera, Horton Works, Facebook, etc.

Isn't Hadoop used by foo-foo social media companies and not by enterprises

Hadoop had its start in a Web company. It was adopted pretty early by social media companies because the companies had Big Data problems and Hadoop offered a solution.

However, Hadoop is now making inroads into Enterprises.

I am not sure my company has a big data problem

Hadoop is designed to deal with Big Data. So if you don't have a 'Big Data Problem', then Hadoop probably isn't the best fit for your company. But before you stop reading right here, please read on :-)

How much data is considered Big Data, differs from company to company. For some companies, 10 TB of data would be considered Big Data; for others 1 PB would be 'Big Data'. So only you can determine how much is Big Data.

Also, if you don't have a 'Big Data problem' now, is that because you are not capturing some data? In some scenarios, companies chose to forgo capturing data, because there wasn't a feasible way to store and process it. Now that Hadoop can help with Big Data, it may be possible to start capturing data that wasn't captured before.

How much does it cost to adopt Hadoop?

Hadoop is open source. The software is free. However running Hadoop does have other cost components.

Cost of hardware : Hadoop runs on a cluster of machines. The cluster size can be anywhere from 10 nodes to 1000s of nodes. For a large cluster, the hardware costs will be significant.
The cost of IT / OPS for standing up a large Hadoop cluster and supporting it will need to be factored in.
Since Hadoop is a newer technology, finding people to work on this ecosystem is not easy.

See here for complete list of Chapter 15, Hadoop Challenges

What distributions to use?

Please see Chapter 11, Hadoop Distributions

What are the some of the use cases for Hadoop?

Please see Chapter 10, Hadoop Use Cases and Case Studies


Chapter 4. Hadoop and Big Data		Chapter 6. Hadoop for Developers