In the rush to big data, we forgot about search


I read David Linthicum’s post with great interest. A huge reason that I decided my next job would be for a search company was because of this very problem. (That’s why I now work for LucidWorks, which produces – and -based search tools.) While working with clients, I realized that with big data and the cloud a tough problem, finding things was becoming worse. I had seen the upcoming meltdown as the use of Hadoop formed yet another data silo and as a result produced few actual insights.

Part of the problem is that the technology industry is trend-driven rather than problem-solving. A few years ago, it was all about client/server under the guise of distributed computing à la Enterprise JavaBeans, followed by web services and then . Now it is all about . Many of these steps were important, and machine learning is an important tool for solving problems.

We lost indexing and search as big data emerged

But sadly, the most important problem-solving trend got lost in the shuffle: indexing and search.

The modern web began with search. The web would be a lot smaller if Yahoo and the search portals of the late 1990s had triumphed. The dot-com bomb happened and yet Google was born from its ashes. Search also birthed big data and arguably the modern machine learning trend. Google, Facebook, and other companies needed more ways to handle their indexing jobs and their large amounts of data distributed to internet scale. Meanwhile, they needed better ways to find and organize data after they ran upon the limits of crowdsourcing and human intelligence.

, we need to do is redefine “integration” for the distributed and cloud computing era.

Data integration used to mean just that: grabbing all the data and dumping it into a big, fat, single area. First this was with databases, then data warehouses, and then Hadoop. Ironically, we moved further away from indexed technology when doing this.