What is big data? Everything you need to know


Every day human beings eat, sleep, work, play, and produce data: lots and lots of data. According to IBM, the human race generates a staggering volume of data every day. That’s the equivalent of a stack of DVDs reaching to the moon and back, and it encompasses everything from the texts we send and photos we upload to industrial sensor metrics and machine-to-machine communications.

That’s a big reason why “big data” has become such a common catch phrase. Simply put, when people talk about big data, they mean the ability to take large portions of this data, analyze it, and turn it into something useful.

Exactly what is big data?

But big data is much more than that. It’s about:

  • taking vast quantities of data, often from multiple sources
  • and not just lots of data but different kinds of data (often multiple kinds at the same time, as well as data that changes over time) that doesn’t need to be first transformed into a specific format or made consistent
  • and analyzing the data in a way that allows for ongoing analysis of the same data pools for different purposes
  • and doing all of that quickly, even in real time.

In the early days, the industry came up with an acronym to describe three of these four facets: VVV, for volume (the vast quantities), variety (the different kinds of data and the fact that data changes over time), and velocity (speed).

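Apache Spark has emerged as an alternative to MapReduce, Hadoop’s original processing engine. Because Spark performs calculations in parallel using in-memory storage, it can be up to 100 times faster than MapReduce. Spark can work as a standalone framework or inside Hadoop.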
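To make the processing model concrete, here is a minimal pure-Python sketch of the map, shuffle, and reduce steps applied to a word count. It uses no Hadoop or Spark APIs; the function names map_phase, shuffle_phase, and reduce_phase and the sample lines are hypothetical, chosen only for illustration. Every intermediate result here stays in memory, which is roughly the idea behind Spark’s speed advantage, whereas classic MapReduce writes intermediate results to disk between stages.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # Chain the three phases; nothing is written to disk in between.
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, standing in for a much larger data set.
sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
```

In a real cluster the map and reduce phases would run in parallel on many machines; this single-process version only shows the shape of the computation.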

Even with Hadoop, you still need a way to store and access the data. That’s typically done via a NoSQL database such as MongoDB, CouchDB, or Cassandra, which specialize in handling unstructured or semi-structured data distributed across multiple machines. Unlike in data warehousing, where massive amounts and types of data are converged into a unified format and stored in a single data store, these tools don’t change the underlying nature or location of the data (emails are still emails, sensor data is still sensor data), and the data can be stored virtually anywhere.
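The idea of keeping each record in its own shape can be illustrated with a small Python sketch. This is not the API of MongoDB, CouchDB, or Cassandra; the documents list, its field names, and the find helper are all hypothetical, and a plain in-memory list stands in for a distributed store. The point is only that emails and sensor readings sit side by side without being forced into one schema, and can still be queried by whatever fields they happen to have.

```python
# Hypothetical records: each keeps its own shape, with no shared schema.
documents = [
    {"type": "email", "sender": "ana@example.com", "subject": "Q3 report"},
    {"type": "sensor", "device_id": "pump-7", "temperature_c": 81.5},
    {"type": "sensor", "device_id": "pump-9", "temperature_c": 64.0},
]


def find(collection, **criteria):
    """Return documents that match every criterion.

    A document that lacks one of the requested fields simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Query by a field that every record happens to share.
sensor_readings = find(documents, type="sensor")
emails = find(documents, type="email")

# Query on a field that only some records have.
hot_devices = [d["device_id"] for d in sensor_readings if d["temperature_c"] > 80]
```

A relational warehouse would typically require both kinds of record to be mapped onto predefined tables before loading; here that transformation step is skipped entirely.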

Still, having massive amounts of data stored in a NoSQL database across clusters of machines isn’t much good until you do something with it. That’s where big data analytics comes in. Tools such as Jasper BI let you parse that data to identify patterns, extract meaning, and reveal new insights. What you do from there will vary depending on your needs.

InfoWorld Executive Editor Galen Gruman, InfoWorld Contributing Editor Steve Nunez, and freelance writers Frank Ohlhorst and Dan Tynan contributed to this story.