Hadoop, an Apache open-source software framework for storing and crunching big data sets across clusters of machines, has hit the big time. Markets and Markets forecast in January 2017 that the Hadoop market could grow to more than $40 billion by 2021. In fact, that forecast is starting to look conservative, as global markets are expected to improve steadily in 2018. A 2018 Forbes report projected that Hadoop and the big data market will grow to more than $99 billion by 2022 (representing a whopping 28.5% projected growth rate).
One thing remains certain: Growth on such an enormous scale creates an urgent need for many more skilled IT pros to develop, manage and administer Hadoop implementations.
Many people in the field feel that, given ample time and a penchant for the topic, you can learn what you need to know about Hadoop through self-study. For those folks, browsing the plethora of documentation on the Apache Hadoop website is a good starting point. You can also download the open-source Hadoop release and take the opportunity to turn some knobs and explore Hadoop at your own pace.
Administrators and developers who prefer a more structured learning experience can take advantage of free online training courses designed to get you up to speed fast.
Hadoop online training
In no particular order, here are more than a dozen terrific free sources for Hadoop training.
1. CognitiveClass.ai
Formerly Big Data University, CognitiveClass.ai offers more than 50 courses on Hadoop, HBase, Pig, big data analytics, SQL, IBM BLU, DB2 and more, all available at your own pace.
You'll also find two learning paths: Hadoop Fundamentals for beginners and Hadoop Programming for more advanced practitioners.
The site also offers a broad suite of virtual labs to help students practice what they learn. Most courses are in English, but some are in Japanese, Spanish, and Russian. BigDataUniversity still operates Portuguese (.br) and Mandarin (.cn) websites.
2. Cloudera Essentials For Apache Hadoop
Cloudera offers Cloudera Essentials for Apache Hadoop, an online video course that's distributed chapter by chapter. At Cloudera University, you'll find Hadoop training aimed at administrators, data analysts, data scientists, developers and security professionals.
Your next step could be taking the three-lesson Introduction to Hadoop and MapReduce course, offered through Udacity. Cloudera also has an SQL analytics workbench named HUE, which is designed to help businesses create their own self-service queries – it's also a great learning tool for those getting to know the Hadoop environment.
3. coreservlets.com
Dispensing with glitz and glam, coreservlets.com provides a series of tutorials on developing big data applications with Hadoop, delivered through a straight-up text-based interface.
Each tutorial section lets you follow along using PDFs and/or slideshares, but you also get downloadable virtual machines in some instances as well as exercises (with solutions).
4. Coursera
Coursera has a large library of courses offered in partnership with leading universities, including UC San Diego, Stanford, Duke and many more.
The company's policy states that you can access video lectures and certain nongraded assignments for free in all courses. These previews give you the opportunity to decide if you want to purchase a course (priced between $29 and $99) and perhaps keep going to complete a certificate.
At last check, a Coursera search pulls up 46 hits for courses that mention Hadoop, including all kinds of big data and data science topics, along with a class from UCSD entitled Hadoop Platform and Application Framework.
5. edX
Similar to Coursera, edX offers courses from well-known universities, as well as high-tech firms and other contributors. On the main web page, enter "hadoop" into the search field to see what's currently available.
You can audit an edX course for free, and work through all assignments and exams, but only paid participants receive a certificate of completion. At present, edX offers seven courses on Hadoop, all of which include coverage of the framework and platform, and three of which actually mention Hadoop in the course title.
6. DeZyre
DeZyre lets you learn about big data and Hadoop from industry experts, get a mentor and complete projects … for a fee. But the company's free tutorials are available to anyone, anytime.
Browse the lengthy list of tutorials on the DeZyre Tutorials page and click on anything that sparks your interest – no signup needed. There are more than 18 courses on the site, of which two are free.
7. Hortonworks
Hortonworks has a lot of good for-a-fee courses, as well as free Hadoop training and tutorials. For most tutorials, you need to download and install the Hortonworks Sandbox, and the company recommends other tutorials as prerequisites to ensure you're ready to learn most efficiently.
As an originator of Hadoop technology, Hortonworks offers one of the most comprehensive and well-respected portfolios of Hadoop training.
8. IBM developerWorks
IBM developerWorks serves up free tutorials and tools for big data analytics, cloud computing and other high-tech categories, based on IBM technologies. For example, "Choose IBM Open Platform for your Hadoop and Spark projects" explores IBM's Apache Hadoop and Apache Spark distribution. Along the way, it describes the purpose or function of each component, including Spark, MapReduce and Sqoop.
Although it's a little long in the tooth, Open Source Big Data for the Impatient is a solid tutorial that walks you through the fundamentals of big data and Hadoop. It has you download a Hadoop image (Cloudera is recommended) to work through examples of Hadoop, Hive, Pig, Oozie and Sqoop.
The Hadoop training and tutorials site managed by Anil Jain provides links to branded (for a fee) training as well as free online tutorials and pointers to recommended books on Hadoop.
Several of the free resources Jain mentions are featured in this blog post, but you'll find others on his site that are definitely worth a look-see.
10. MapR Technologies
MapR is the provider of a leading Apache Hadoop distribution. The company's on-demand Hadoop training courses include video lessons, labs, hands-on exercises and more, and can lead to certification as a MapR Certified Cluster Administrator (MCCA), Data Analyst (MCDA) or Certified Hadoop Developer (MCHD).
MapR currently offers Apache Hadoop Essentials, six different Cluster Administration courses, three Hadoop Developer courses, and many more on-demand courses that cover HBase, MapR Streams, Apache Spark, Apache Drill, and Apache Hive.
Browse the on-demand training page for a complete list of course offerings.
11. Udacity
Udacity is well known for its catalog of training courses on data science, web development, software engineering and mobile operating systems, built by Silicon Valley heavy-hitters such as Facebook, Twitter, Cadence and many more. It offers free courses and course materials, but you must enroll in a paid program to earn a Nanodegree credential.
To see all free courses at a glance, go to the Courses and Nanodegree Programs page and select the Free Courses checkbox in the Filters section. Currently, a search on Hadoop there turns up three classes: two on Hadoop itself and another on real-time analytics with Apache Storm.
12. Udemy
Udemy offers more than 40,000 free and for-a-fee courses on just about everything under the sun. When you get to the home page, enter "Hadoop free" in the search box to see what's currently being offered.
Currently, you'll find more than 35 courses that range from five to more than 60 lectures each, aimed mainly at beginner to intermediate levels. All make specific and detailed mention of Hadoop.
13. Microsoft Virtual Academy
Offered courses include Processing Big Data with Azure HDInsight (which is Microsoft's managed Hadoop distribution that runs on the Azure cloud), Processing Real-Time Data with Azure HDInsight and Implementing Predictive Analytics with Spark in Azure HDInsight. For graded quizzes and a certificate, a fee of $99 is required.
14. YouTube
As you would expect, YouTube has a long list of Hadoop training videos. Search for Hadoop on the main page, noodle through the 100-plus results, and pick some videos that look right for you.
15. Hadoop Users LinkedIn Group
Members of the Hadoop Users LinkedIn Group exchange great information on Hadoop training resources. If you visit LinkedIn Learning and search on Hadoop, no fewer than 297 hits pop up as of this writing. Great stuff!
There's no shortage of material on Hadoop, so you are certain to find something you can chew on to increase your skills and knowledge in this area.