Hadoop, an Apache open source software framework for storing and crunching big data sets across clusters of machines, has hit the big time. In January 2017, MarketsandMarkets forecast that the Hadoop market could grow from $6.71 billion in 2016 to more than $40 billion by 2021. (For perspective, that same market expanded from $1.5 billion to $4 billion from 2012 through 2014.) In fact, those forecasts may start to look a little pessimistic, as global markets are expected to improve steadily in 2018. But one thing remains certain: growth on such an enormous scale creates an urgent need for many more capable IT pros to develop, manage and administer Hadoop implementations.
Given ample time and a penchant for the topic, many people in the field feel that you can learn what you need to know about Hadoop through self-study. For those folks, browsing the plethora of documentation on the Apache Hadoop website is a good starting point. You can also download the open source Hadoop release and take the opportunity to turn some knobs and explore Hadoop at your own pace. Administrators and developers who prefer a more structured learning experience can take advantage of free online training courses designed to get them up to speed fast.
In no particular order, here are more than a dozen terrific free sources for Hadoop training.
Formerly Big Data University, CognitiveClass.ai offers more than 50 self-paced courses on Hadoop, HBase, Pig, big data analytics, SQL, IBM BLU, DB2 and more. It also offers a broad suite of virtual labs to help students practice what they learn. Most courses are in English, but some are in Japanese, Spanish, Portuguese, Russian and Polish. Big Data University still operates Portuguese (.br) and Chinese (.cn) websites.
Cloudera has a Cloudera Essentials for Apache Hadoop online video course that's distributed chapter by chapter, as well as Hadoop training aimed at administrators, data analysts, data scientists and developers. Your next step could be the three-lesson Introduction to Hadoop and MapReduce course, offered through Udacity. Cloudera also has a SQL analytics workbench named HUE, designed to let business users run their own self-service queries, which also makes it a great learning tool for those getting to know the Hadoop environment.
Dispensing with glitz and glam, coreservlets.com provides a series of tutorials on developing big data applications with Hadoop, delivered through a straight-up text-based interface. Each tutorial section lets you follow along using PDFs and/or SlideShare decks, and some sections also include downloadable virtual machines as well as exercises (with solutions).
Coursera has a large library of courses offered in partnership with many leading universities, such as UC San Diego, Stanford and Duke. The company's policy states that you can access video lectures and certain non-graded assignments for free in all courses. These previews give you the opportunity to decide whether you want to purchase a course (priced between $29 and $99) and perhaps keep going to complete a certificate. At last check, a Coursera search returned 37 hits for courses that mention Hadoop, covering all kinds of big data and data science topics and including a UCSD class titled “Hadoop Platform and Application Framework.”
Similar to Coursera, edX offers courses from well-known universities, as well as high-tech firms and other contributors. On the main web page, enter "hadoop" into the search field to see what's currently available. You can audit an edX course for free and work through all assignments and exams, but only paid participants receive a certificate of completion. At present, edX offers seven courses that cover the Hadoop framework and platform, three of which mention Hadoop in the course title.
DeZyre lets you learn about big data and Hadoop from industry experts, get a mentor and complete projects... for a fee. But the company's free tutorials are available to anyone, anytime. Browse the lengthy list on the DeZyre Tutorials page and click into anything that sparks your interest; no signup is needed. The site also offers more than 20 courses, two of them free, and many of the rest provide direct, meaningful coverage of Hadoop and related subjects.
Hortonworks also has a lot of good for-a-fee courses as well as free Hadoop training and tutorials. For most tutorials, you'll need to download and install the Hortonworks Sandbox, and the company recommends completing certain other tutorials first to ensure you're ready to learn efficiently. As an originator of Hadoop technology, Hortonworks offers one of the most comprehensive and well-respected portfolios of Hadoop training.
IBM developerWorks serves up free tutorials and tools for big data analytics, cloud computing and other high-tech categories, based on IBM technologies. For example, “Choose IBM Open Platform for your Hadoop and Spark projects” explores IBM's Apache Hadoop and Apache Spark distribution. Along the way, it describes the purpose of each component, such as Spark, MapReduce, Sqoop and more. Although it's a little long in the tooth, “Open Source Big Data for the Impatient” is a solid tutorial that walks you through the fundamentals of big data and Hadoop, and has you download a Hadoop image (Cloudera is recommended) to work through examples of Hadoop, Hive, Pig, Oozie and Sqoop.
The Hadoop training and tutorials site managed by Anil Jain provides links to branded (for a fee) training as well as free online tutorials and pointers to recommended books on Hadoop. Several of the free resources Jain mentions are also featured in this blog post, but you'll find others on his site that are definitely worth a look-see.
MapR provides a leading Apache Hadoop distribution. The company's on-demand Hadoop training courses include video lessons, labs, hands-on exercises and more, and can lead to certification as a Hadoop Cluster Administrator, Hadoop Data Analyst or Hadoop Developer. MapR currently offers Apache Hadoop Essentials, five different Cluster Administration courses, Developing Hadoop Applications and many more on-demand courses covering HBase, MapR Streams, Apache Spark, Apache Drill and Apache Hive. Browse the on-demand training page for a complete list of course offerings.
Udacity is well known for its catalog of training courses on data science, web development, software engineering and mobile development, built with Silicon Valley heavy-hitters such as Facebook, Twitter, Cadence and many more. Udacity offers free courses and course materials, but you must enroll in a paid program to earn a Nanodegree credential. To see all free courses at a glance, go to the Courses and Nanodegree Programs page and check the Free Courses checkbox. Currently, a search on “Hadoop” there turns up three classes: two on Hadoop itself and one on real-time analytics with Apache Storm.
Udemy offers more than 40,000 free and for-a-fee courses on just about everything under the sun. On the home page, enter "Hadoop free" in the search box to see what's currently being offered. Right now, that search returns five courses, ranging from 5 to more than 40 lectures each and aimed mainly at beginner to intermediate learners. All of them cover Hadoop specifically and in detail.
Microsoft Virtual Academy offers a big data analytics video training course that focuses on HDInsight, Microsoft's managed Hadoop service on the Azure cloud, and on using Hadoop on Azure. The free video course covers Hive, Tez, Pig, Sqoop, Oozie and Mahout, and offers additional resources and next steps. The Microsoft Professional Program (MPP) offers certificates in big data and data science, among many other topics.
As you would expect, YouTube has a long list of Hadoop training videos. Search for "Hadoop" on the main page, noodle through the hundred-plus results and pick some videos that look right for you.
Members of the Hadoop Users LinkedIn group also exchange great information on Hadoop training resources. In addition, a search for Hadoop on LinkedIn Learning returns no fewer than 297 hits as of this writing. Great stuff!
There's no shortage of material on Hadoop, so you're sure to find something you can chew on to build your skills and knowledge in this area.