Hadoop and NoSQL 101

Traditional frameworks of data management, heaving under the weight of the world’s increasingly titanic datasets, have had to give way to new technologies that redefine how we work with information at a genuinely gargantuan scale. And not a moment too soon. In the blink of an eye, contemporary companies were freshly charged with making sense of — and storing — the truly massive quantities of data being generated from tweets, Web-surfing logs and any device that is connected to the Internet.

And whether you’re a product manager, digital strategist, web developer or just a fellow with a fondness for the exciting potential of big data, it behooves you to get at least a working handle on the stuff.

Take NoSQL and Hadoop, for two, a couple of curiously monikered big data tools whose capabilities merit a closer look before they earn a place in your big-data plan. Here’s a primer on the pair.


The appearance on the scene of NoSQL — or “Not Only SQL” — ushered in a new generation of databases designed to manage high-volume, unstructured data. This framework homes in on the concept of distributed databases, where unstructured data are disseminated across multiple processing nodes, and often across multiple servers. In this way, growing pots of data can be managed simply by adding the hardware required to accommodate them.

NoSQL’s ultimate objective is to provide a powerful means of accessing and utilizing these ever-growing accumulations of poly-structured data across a high number of computers. Thanks in part to its lack of a rigid schema (as compared to the highly structured nature of relational databases), NoSQL is well adapted to the heavy demands of big data, and allows for high-performance, agile processing of that data at a massive scale.
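NoSQL products vary widely in their details, so purely as an illustration, here is a toy Python sketch of the hash-based sharding idea described above — keys spread deterministically across multiple nodes so that capacity grows by adding hardware. The class name and use of in-memory dicts as stand-in “nodes” are this sketch’s own inventions, not any product’s API.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads keys across N nodes by hash --
    the core idea behind horizontally scaled NoSQL databases."""

    def __init__(self, num_nodes=4):
        # Each "node" is just a dict here; in a real system it would be
        # a separate server (or cluster of servers).
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key so the same key always lands on the same node.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = ShardedStore()
store.put("user:42", {"name": "Ada", "tweets": 1200})
print(store.get("user:42")["name"])  # Ada
```

Because the key-to-node mapping is deterministic, no central index is needed to find a record — any client can compute which node holds a given key, which is what lets these systems scale out by simply adding nodes.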

With the likes of Google, Amazon and the CIA in its camp, the NoSQL distributed database infrastructure oversees some of the world’s biggest data warehouses.


Unlike NoSQL, Hadoop is not a type of database, but a Java-based software framework. And its ability to accommodate the enormous processing needs of big data has profoundly rewritten the big data landscape.

Hadoop helps users handle compute-intensive processes on server clusters with large data volumes by way of massively parallel computing. That means applications can distribute complex computing tasks across thousands of nodes and data can be spread across thousands of commodity servers without suffering a downgrade to either speed or performance.
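Hadoop’s programming model for this kind of distribution is MapReduce. The following plain-Python sketch (no Hadoop involved, and the function names are illustrative only) shows the shape of the idea with a word count: the map step runs independently on each document, so it can be farmed out across thousands of nodes, and the reduce step then combines the partial results.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit (word, 1) pairs independently per document,
    # so each document can be processed on a different node.
    return [(word, 1) for word in document.lower().split()]

def reduce_phase(pairs):
    # Reduce: after the pairs are shuffled together by key,
    # sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

docs = ["big data is big", "hadoop handles big data"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(mapped)
print(counts["big"])  # 3
```

In a real Hadoop job the map and reduce functions run on separate machines against data stored in HDFS, but the division of labor is the same: embarrassingly parallel work in the map step, aggregation in the reduce step.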

Hadoop, trumpeted a Wall Street Journal piece this summer, is “challenging tech heavyweights like Oracle and Teradata [whose] core database technology is too expensive and ill-suited for typical big data tasks.”

Target Debacle a Reminder of the Need for Big Data Privacy Sensitivity

Last June, the Software & Information Industry Association urged policymakers to resist introducing a slew of punishing rules and regulations around data collection and analysis. The software industry trade group was anticipating government’s reaction to a troubling run of security breaches involving sensitive information, and urging restraint lest big data’s power be stifled.

Fast forward to December, when as many as 70 million Target customers had personal details associated with their credit and debit cards exposed thanks to a pre-Christmas security SNAFU at the massive retailer. And so the concerns continue.

As such, it’s no surprise that wary corporate types are in equal parts charmed by the massive capabilities of big data and anxious about finding themselves in a similar public relations mess. But it’s important that those in a position to exploit the stuff keep big data — including its related risks and shortcomings — in perspective. Because at the end of the day, this still-emerging corporate tool is one to be embraced, and not to retreat from.

Reassuring customers should likewise be an ongoing campaign among big data proponents. Some tips:

• Be upfront. Companies that are transparent about their plans for your information enjoy much better optics than those whose surreptitious use of your particulars is only revealed after the fact. It’s fine to own up to plans for limited disclosure of personal data — it’s 2014 after all — so long as you state your data-usage intentions clearly from the get-go.

• Speaking of intentions, make sure there’s something in them for the holders of the information in question. Ideally, big data revelations should benefit both the information collector and the information provider. When consumers feel they’re getting a tangible benefit in exchange for their personal information, experience has demonstrated, their resistance to data collection diminishes.

• Full disclosure notwithstanding, there’s no denying the inherent trickiness of any plan to share customers’ personal details. And you simply cannot answer for the privacy policies of the companies with whom you intend to share that data. The best release valve here might be an opt-out provision, in which customers get the opportunity to restrict their personal information’s dissemination.

So long as big data, in all its opportunity-expanding glory, is part of our path forward, concerns about privacy and information violations will follow. Governments and businesses interested in getting the most from its emerging promise would do well to step carefully and purposefully.

Predictive Analytics Market Poised for Growth

The predictive analytics market is as huge and promising as it is misunderstood. So says new research from ResearchMoz in the States.


This report — “Predictive Analytics Market [(Fraud, Risk, Marketing, Operations), Verticals (BFSI, Healthcare, Environment, Government, Retail, Energy, Manufacturing, Transportation, Travel, Telecom, Sports)]: Worldwide Market Forecasts and Analysis (2013 – 2018)” — forecasts the scene for the global predictive analytics market for the next five years, breaking out sub-segments and offering cross-sectional analysis according to such market parameters as geography, software solutions type, mode of delivery, end-use industry and applications.

It’s a massive document (running 406 pages and including 182 market data tables) that stretches its reach across an ambitious range of subject areas and potential applications, but the short story is that the practice of using predictive analytics — or the extraction of meaningful information from data sets for estimating future probabilities — is on the rise.

The report declares that the predictive analytics market will grow from its current size of US$1.70 billion to $5.24 billion in 2018 at a compound annual growth rate of 25.2 percent. North America is expected to be the biggest market in terms of revenue contribution.

The surge is thanks to professional organizations’ transition from traditional BI techniques to considerably more sophisticated analytical approaches. The development throws open fresh opportunities for big players and smart new startups alike.

But the growth curve could be even more dramatic, say the executives who’ve analyzed the report’s results. Growth is hampered by a serious lack of awareness of predictive analytics among the corporate population. On top of that, the time commitment required to learn this new approach and integrate it into a company’s existing business processes limits its potential.

Still, the widespread conviction on the subject is that predictive analytics is primed for a phenomenal future.