Apache Hadoop 3.0 goes GA, adds hooks for cloud and GPUs

Is this the post-Hadoop era? Not in the eyes of Hadoop 3.0 backers, who see the latest update to the big data framework succeeding in machine learning applications and cloud systems.

It may still have a place among screaming new technologies, but Apache Hadoop is neither quite as new nor quite as screaming as it once was. And the somewhat subdued debut of Apache Hadoop 3.0 reflects that.

Case in point: In 2017, the name Hadoop was removed from the title of more than one event previously known as a “Hadoop conference.” IBM also dropped off the list of Hadoop distro providers. And it was a year in which machine learning applications, along with tools like Spark and TensorFlow, became the focus of many big data efforts.

So the low level of fanfare that accompanied the mid-December release of Hadoop 3.0 wasn’t too surprising. The release does hold notable improvements, however. This update to the 11-year-old distributed data framework reduces storage requirements, lets clusters pool the latest graphics processing unit (GPU) resources, and adds a new federation scheme that enables the crucial Hadoop YARN resource manager and job scheduler to greatly expand the number of Hadoop nodes that can run in a cluster.

That last capability could find use in Hadoop cloud applications, which is where many users appear to be heading.