Hadoop Illuminated

Mark Kerzner

Author 

Sujee Maniyam

Author 

Dedication

To the open source community

This book on GitHub
Companion project on GitHub

Acknowledgements

From Mark
I would like to express my gratitude to my editors, co-authors, colleagues, and bosses who shared the thorny path to working clusters - in the hope of making it less thorny for those who follow. Seriously, folks, Hadoop is hard, and Big Data is tough, and there are many related products and skills that you need to master. Therefore, have fun, send us your feedback, and I hope you will find the book entertaining.

"The author's opinions do not necessarily coincide with his point of view." - Victor Pelevin, "Generation P"

From Sujee
To the kind souls who helped me along the way

Copyright © 2013 Hadoop Illuminated LLC. All Rights Reserved.

Table of Contents

I. About this book
1. Who is this book for?
2. About Authors
II. High level introduction to Hadoop
3. Big Data
4. Soft Introduction to Hadoop
5. Hadoop for Executives
6. Hadoop for Developers
7. Hadoop Distributed File System (HDFS) -- Concept and Design
8. Introduction To MapReduce
9. Hadoop Use Cases and Case Studies
10. Hadoop Distributions
11. Big Data Ecosystem
12. Hardware and Software for Hadoop
13. Hadoop Challenges
14. Publicly Available Big Data Sets
III. Hadoop In Depth
15. HDFS NameNode
16. How MapReduce Works -- The Internals
17. Hadoop Versions
18. How Hadoop version 2 'looks' different from version 1
IV. Hadoop Administration
V. Hadoop Cookbook
19. Handy Classes
20. Handy Classes for HBase

List of Figures

4.1. Will you join the Hadoop dance?
4.2. The Hadoop Zoo
8.1. Dreams
8.2. Angel checks the seal
8.3. Micro-targeting the electorate
16.1. Map Reduce Job Submission Process - Hadoop 1

List of Tables

4.1. Comparison of Big Data
6.1. Hadoop Roles
10.1. Hadoop Distributions
11.1. Tools for Getting Data into HDFS
11.2. Querying Data in HDFS
11.3. Real time access to data
11.4. Databases for Big Data
11.5. Hadoop in the Cloud
11.6. Workflow Tools
11.7. Serialization Frameworks
11.8. Tools for Monitoring Hadoop
11.9. Applications that run on top of Hadoop
11.10. Distributed Coordination
11.11. Data Analytics on Hadoop
11.12. Distributed Message Processing
11.13. Stream Processing Tools
11.14. Miscellaneous Stuff
12.1. Hardware Specs
18.1. Start / Stop scripts for tar package version
18.2. Start / Stop scripts for rpm package version
18.3. Hadoop Command Split