Hadoop and NoSQL 101
Traditional frameworks of data management, heaving under the weight of the world’s increasingly titanic datasets, have had to give way to new technologies that redefine how we work with information at a genuinely gargantuan scale. And not a moment too soon. In the blink of an eye, contemporary companies were freshly charged with making sense of — and storing — the truly massive quantities of data being generated from tweets, Web-surfing logs and any device that is connected to the Internet.
And whether you’re a product manager, digital strategist, web developer or just a fellow with a fondness for the exciting potential of big data, it behooves you to get at least a working handle on the stuff.
Take NoSQL and Hadoop, for two: a couple of curiously monikered big data tools whose capabilities you can put to work only once you understand where they fit in your big-data plan. Here’s a primer on the pair.
The appearance on the scene of NoSQL (or “Not Only SQL”) ushered in a new generation of databases designed to manage high-volume, unstructured data. This framework homes in on the concept of distributed databases, where unstructured data are disseminated across multiple processing nodes, and often across multiple servers. In this way, growing pots of data can be managed simply by adding the hardware required to accommodate them.
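To make the idea of spreading data across nodes concrete, here is a minimal Python sketch. It is an illustration only, not how any particular NoSQL product works: the `ShardedStore` class, its method names and the sample records are all hypothetical, and the "nodes" are just in-memory dictionaries standing in for separate machines.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads records across in-memory 'nodes'.

    Hypothetical illustration: each dict stands in for a separate server.
    """

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so records are spread roughly evenly across nodes.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, record):
        self.nodes[self._node_for(key)][key] = record

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(node_count=4)
# Records need not share a schema: each one carries whatever fields it has.
store.put("tweet:1", {"user": "ana", "text": "hello"})
store.put("log:42", {"url": "/home", "ms": 120, "agent": "mobile"})
print(store.get("tweet:1"))
```

Note that with this simple modulo scheme, changing the number of nodes moves most keys to a different node. Production systems typically use more careful placement strategies, such as consistent hashing, so that adding hardware disturbs as little existing data as possible.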
NoSQL’s ultimate objective is to provide a powerful means of accessing and using these ever-growing accumulations of poly-structured data across a large number of computers. Thanks in part to its looser structure (as compared to the rigid tables and schemas of relational databases), NoSQL is well adapted to the heavy demands of big data, and allows for high-performance, agile processing of that data at a massive scale.
With the likes of Google, Amazon and the CIA in its camp, the NoSQL distributed database infrastructure oversees some of the world’s biggest data warehouses.
Unlike NoSQL, Hadoop is not a type of database, but a Java-based software framework. And its ability to accommodate the enormous processing needs of big data has profoundly rewritten the big data landscape.
Hadoop helps users handle compute-intensive processes on server clusters with large data volumes by way of massively parallel computing. That means applications can distribute complex computing tasks across thousands of nodes and data can be spread across thousands of commodity servers without suffering a downgrade to either speed or performance.
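The programming model most closely associated with Hadoop is MapReduce, and a word count is its customary first example. The sketch below imitates that pattern in plain Python on a single machine. It does not use Hadoop's actual Java API, and the function names and sample text are assumptions made up for illustration. On a real cluster, each chunk would be mapped on a different node in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of text.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all emitted values by key, as happens between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Collapse each key's list of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk could run on its own node
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


chunks = ["big data big clusters", "data on many nodes", "big nodes"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` looks only at its own chunk, the map step can be farmed out to as many machines as there are chunks, which is the property that lets this style of job scale across a cluster.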
Hadoop, trumpeted a Wall Street Journal piece this summer, is “challenging tech heavyweights like Oracle and Teradata [whose] core database technology is too expensive and ill-suited for typical big data tasks.”