Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Book Preface

If you would like to learn about the big data Hadoop-based toolset, then Big Data Made Easy is for you. It provides a wide overview of Hadoop and the tools you can use with it. I have based the Hadoop examples in this book on CentOS, the popular and easily accessible Linux distribution; each of its practical examples takes a step-by-step approach to installation and execution. Whether you have a pressing need to learn about Hadoop or are just curious, Big Data Made Easy will provide a starting point and offer a gentle learning curve through the functional layers of Hadoop-based big data. Starting with a set of servers with just CentOS installed, I lead you through the steps of downloading, installing, using, and error checking.

The book covers the following topics:
• Hadoop installation (V1 and V2)
• Web-based data collection (Nutch, Solr, Gora, HBase)
• MapReduce programming (Java, Pig, Perl, Hive)
• Scheduling (Fair and Capacity schedulers, Oozie)
• Moving data (Hadoop commands, Sqoop, Flume, Storm)
• Monitoring (Hue, Nagios, Ganglia)
• Hadoop cluster management (Ambari, CDH)
• Analysis with SQL (Impala, Hive, Spark)
• ETL (Pentaho, Talend)
• Reporting (Splunk, Talend)

As you reach the end of each topic, having completed each example installation, you will be increasing your depth of knowledge and building a Hadoop-based big data system. No matter what your role in the IT world, appreciation of the potential in Hadoop-based tools is best gained by working along with these examples. Having worked in development, support, and testing of systems based on data warehousing, I could see that many aspects of the data warehouse system translate well to big data systems. I have tried to keep this book practical and organized according to the topics listed above. It covers more than storage and processing; it also considers such topics as data collection and movement, scheduling and monitoring, analysis and management, and ETL and reporting.

This book is for anyone seeking a practical introduction to the world of Linux-based Hadoop big data tools. It does not assume knowledge of Hadoop, but it does require some knowledge of Linux and SQL. Each command is explained at the point where it is first used.

  • File Type: PDF
  • Upload Date: January 6, 2015
