Demand for data by today’s business users is growing exponentially in two ways. First, business users have exhausted the opportunities in the data they hold. They want more sources of data to find new value, and they want the data to be accurate to deliver analytic outcomes. Second, the number of data-savvy business analysts is larger than ever and growing fast. To satisfy the increasing demand, IT departments must field a continuous stream of data requests—big and small.
While business execs see tremendous potential in putting more data at more users’ fingertips, organizations struggle to deliver that data. Traditional data integration tools, developed in the 1990s for populating data warehouses, are too brittle to scale to many data sources. They’re also prohibitively time-consuming and costly.
At the same time, new big data platforms only solve the data collection part of the problem. The data loaded into Hadoop is never enterprise-ready; it requires preparation before reaching a business user. The reality is that most enterprises are dealing with hundreds and thousands of sources. The only path to delivering clean, unified data across all of these sources is automation.
The critical importance of good data
Let’s take a simple example from the insurance industry. For instance, business analysts want to understand claim risk of an upcoming flood. They pull customers for the affected geographic region and filter for those with active home insurance. Immediately they need to drill into high property values and sparsely populated areas. This requires individual policies to be classified into more granular categories. Then they want a broad, accurate view so that they can correlate customers with both home and car insurance (in a different system, of course).