In 2011, McKinsey trumpeted that “the use of big data will underpin new waves of productivity growth and consumer surplus” and called out five areas ripe for a big data bonanza. In personal location data, for example, McKinsey projected a $600 billion increase in economic surplus for consumers. In health care, $300 billion in additional annual value was waiting for that next Hadoop batch process to run.
Five years later, according to a more recent report, we’re still waiting for the hype to be fulfilled. A big part of the problem, the report intones, is, well, us: “Developing the right business processes and building capabilities, including both data infrastructure and talent” is hard and mostly unrealized. All that work with Hadoop, Spark, Hive, Kafka, and so on has produced less benefit than we thought it would.
In part that’s because keeping up with all that open source software and stitching it together is a full-time job in itself. But you can also blame the bugbear that stalks every enterprise: institutional inertia. Not to worry, though: The same developers who made open source the lingua franca of enterprise development are now making big data a reality through the public cloud.
Paltry big data progress
On the surface, the numbers look pretty good. According to one survey, a majority (62 percent) are looking to Hadoop for advanced/predictive analytics, with data discovery and visualization (57 percent) also commanding attention.
In a survey of more than 2,500 data professionals across 1,400 companies and 77 countries, roughly 20 percent of respondents reported clusters of more than 100 nodes, a full 74 percent of which are in production. This represents double-digit year-over-year growth.
Other data suggests that interest in data lakes has mushroomed, along with a propensity to build those lakes in public clouds.
This makes sense. The very nature of data science (asking questions of our data to glean insight) requires a flexible approach, so the infrastructure powering our big data workloads needs to enable that flexibility. One interview makes it clear that because “your resource mix is continually evolving, if you buy infrastructure it’s almost immediately irrelevant to your business because it’s frozen in time.”
Infrastructure elasticity is imperative to successful big data projects. Apparently more and more enterprises got this memo and are building accordingly. Perhaps not surprisingly, this shift in culture isn’t happening top-down; rather, it’s bottom-up.
What should enterprises do? Ironically, it’s more a matter of what they shouldn’t do: obstruct developers. In short, the best way to ensure an enterprise gets the most from its data is to get out of the way of its developers. They’re already taking advantage of the latest and greatest big data technologies in the cloud.