Data is eating the software that is eating the world


No one doubts that software shapes every last facet of our 21st-century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with “programming.” Just as significant, he equated “eating” with industry takeovers by “Silicon Valley-style entrepreneurial technology companies” and then rattled through the usual honor roll of Amazon, Netflix, Apple, Google, and the like. What they, and others cited by Andreessen, have in common is that they built global-scale business models on the backs of programmers who bang out the code that drives web, mobile, social, cloud, and other 24/7 online channels.

Since that piece appeared in the Wall Street Journal in 2011, we’ve had more than a half-decade to see whether Andreessen’s epic statement of Silicon Valley triumphalism proved prescient or, perhaps, merely self-serving and misguided. I’d say it comes down more on the prescient end of the spectrum, because most (but not all) of the success stories he cited have continued their momentum in growth, profitability, acquisitions, and innovation. People from programming backgrounds, such as Mark Zuckerberg, are indeed the multibillionaire rock stars of this new business era. In this way, Andreessen has so far been spared the fate of Tom Peters, who saw many of the exemplars he cited in his 1982 bestseller “In Search of Excellence” go on to be deconstructed by business rivals or blindsided by trends they didn’t see coming.

Rise of the learning machines

However, it has become clear to everyone, especially the old-school disruptors cited by Andreessen, that “software,” as it’s normally understood, is not the secret to future success. Going forward, the agent of disruption will be the data-driven ML (machine learning) algorithms that power AI. In this new era, more of the logic that powers intelligent applications won’t be explicitly programmed. The days of predominantly declarative, deterministic, and rules-based application development are fast drawing to a close. Instead, the probabilistic logic at the heart of chatbots, recommendation engines, self-driving vehicles, and other AI-powered applications is being harvested directly from source data.

, , , models, and other ML algorithms upon which AI-centric apps depend.

To compound the marginalization of programmers in this new era, we’re likely to see more ML-driven code generation along the lines I have discussed elsewhere. Amazon, Google, Facebook, Microsoft, and other software-based powerhouses have made huge investments in data science, hoping to buoy their fortunes in the post-programming era. They have all amassed growing sets of training data from their ongoing operations. For these reasons, the “Silicon Valley-style” monoliths are confident that they have the resources needed to build, tune, and optimize increasingly innovative AI/ML-based algorithms for every conceivable application.

However, any strategic advantages that these giants gain from these AI/ML assets may be short-lived. Just as data-driven approaches are eroding the foundations of traditional programming, they’re also beginning to nibble at the edges of what highly skilled data scientists do for a living. These trends are even starting to chip away at the economies of scale available to large software companies with deep pockets.

AI and the Goliaths

We’re moving into an era in which anyone can tap into cloud-based resources to cheaply automate the development, deployment, and optimization of innovative AI/ML apps. In a “snake eating its own tail” phenomenon, ML-driven approaches will increasingly automate the creation and optimization of ML models. And, from what we’re seeing in current research initiatives, ML will also play a growing role in automating the acquisition and labeling of ML training data. What that means is that, in addition to abundant open-source algorithms, models, code, and data, the next-generation developer will also be able to generate ersatz but good-enough labeled training data on the fly to tune new apps for their intended purposes.