2015 was the year machine learning broke into the mainstream. No longer was it an esoteric discipline commanded by the few, the proud, the data scientists. Now it was, in theory, everyone’s business.
2016 was the year theory became practice. Machine learning’s power and promise, and all that surrounded and supported it, moved more firmly into the enterprise development mainstream.
That movement revolved around three trends: new and improved tool kits for machine learning, better hardware (and easier access to it), and more cloud-hosted, as-a-service variants of machine learning that provided both open source and proprietary tools.
1. New, revamped tool kits and frameworks helped lighten the load
Once upon a time, if you wanted to implement machine learning in an app, you had to roll the algorithms yourself. Eventually, third-party libraries came onto the field that saved you the trouble of reinventing the wheel, but still required a lot of heavy lifting to be productive. Now the state of the art involves frameworks designed to make machine learning an assembly-line process: Data in one end, trained models and useful results out the other.
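The "assembly line" idea can be sketched in a few lines. This is a toy illustration, not any particular framework's API: a pipeline chains preprocessing "stations" and a model so that raw data goes in one end and a trained, queryable model comes out the other.

```python
class Scaler:
    """One station on the line: shift features to zero mean."""
    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self
    def transform(self, xs):
        return [x - self.mean for x in xs]

class NearestMeanModel:
    """A trivial classifier: predict the label whose training mean is closest."""
    def fit(self, xs, ys):
        groups = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        self.centers = {y: sum(v) / len(v) for y, v in groups.items()}
        return self
    def predict(self, xs):
        return [min(self.centers, key=lambda y: abs(x - self.centers[y]))
                for x in xs]

class Pipeline:
    """Data in one end, trained model out the other."""
    def __init__(self, steps, model):
        self.steps, self.model = steps, model
    def fit(self, xs, ys):
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        self.model.fit(xs, ys)
        return self
    def predict(self, xs):
        for step in self.steps:
            xs = step.transform(xs)
        return self.model.predict(xs)

pipe = Pipeline([Scaler()], NearestMeanModel())
pipe.fit([1.0, 2.0, 10.0, 11.0], [0, 0, 1, 1])
print(pipe.predict([1.5, 10.5]))  # -> [0, 1]
```

Real frameworks add far more (feature engineering, cross-validation, model export), but the shape is the same: declare the stations once, then feed data through.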
Apache Spark, for example, continued shifting its machine learning library toward a DataFrame-based API that better complements Spark’s new internal architecture.
Another trend in the same vein: Products that handled data, but previously didn’t have a direct connection to machine learning, started offering machine learning acceleration as a new feature. Redis, the in-memory data caching system that pulls double duty as a database, cited machine learning as one application for its new module system.
A third trend in the field is the rise of new support tools for developing machine learning software. Sometimes it’s an entirely new language, created for writing high-speed, parallel algorithms that run well on CPUs, GPUs, and other hardware. Other times it’s a tool kit for an existing language; to wit, one that enhances C/C++ applications using the OpenMP tool set, speeding up access to big data sets.
2. GPUs and custom hardware arrived in force in the cloud—and everywhere else
GPUs didn’t single-handedly make machine learning possible, but the blisteringly fast computation they deliver provides a performance boost that current-generation CPUs can’t begin to approach.
GPU speedups also started getting more notice in databases, especially those marketed as ways to feed data-hungry machine learning systems.
The other big GPU-related change was that every major cloud vendor could now boast of GPU-accelerated instances in its product lineup. With cloud-hosted GPUs, customers can buy just the processing power they need for a machine learning training job, at a scale that would be prohibitively difficult to match with an in-house GPU-powered rig.
Amazon already had GPU-powered instances to its name, but its new Elastic GPUs changed the formula: You could add or remove GPU processing from instances, rather than having to buy an instance with GPU processing as part of the package. Google, however, provided attach/detach functionality right out of the gate when it announced its first GPU-powered instances.
Microsoft Azure already offered GPUs as part of its cloud product lineup, but hinted at bigger plans for user-programmable hardware in its datacenters. FPGAs, a class of high-speed programmable hardware, are currently in use to speed up networking in Azure, but Microsoft’s long-term plan is to offer access to similar devices to juice computation-intensive apps like—you guessed it—machine learning. (Amazon is also moving toward FPGA-powered instances.)
One possible drawback to GPUs in the cloud: You don’t always get cutting-edge hardware at your disposal. When Amazon added new GPU instance types in September, it opted for an older generation of GPU, most likely to offer a well-understood product rather than a newer but less familiar one.
3. Cloud-hosted algorithms democratized machine learning, but at a cost
“Democratizing AI”—those were the words Microsoft used to describe its mission to bring machine learning resources to everyone through the cloud. It’s not a bad way to sum up what the other cloud giants aimed for as well: Provide tools for creating intelligent software that are as painless to leverage as any other API.
“Machine learning as a service” is another way to put it. As with other as-a-service offerings, the cloud does the heavy lifting—not only the provisioning of the systems, but the training of the models and the hosting of the data used for the training. If you don’t already have the data in the cloud, new solutions abound for getting it there, like Amazon’s 100-petabytes-in-a-truck “Snowmobile.”
In many cases, you can skip the training and go right to a prebuilt API for everything you need. Such APIs emphasize convenience over transparency of function: requests in, intelligence out. For many people, that’s ideal, since it minimizes the amount of work involved. But it also means the mechanisms used to produce the answers are opaque.
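The "requests in, intelligence out" pattern looks roughly like the sketch below. The endpoint, field names, and feature list here are placeholders for illustration, not any real provider's API; the point is that the client only assembles a request and reads back scores, with no view into the model behind the endpoint.

```python
import json

# Hypothetical endpoint -- a real service would publish its own URL and schema.
API_URL = "https://example.com/v1/analyze"

def build_request(text, features=("sentiment", "entities")):
    """Assemble the kind of JSON body a hosted ML API typically expects."""
    return json.dumps({"document": text, "features": list(features)})

# In real use you would POST this body (with an API key) using your HTTP
# client of choice, then read the returned scores -- the model itself stays
# a black box on the provider's side.
body = build_request("Cloud-hosted machine learning went mainstream in 2016.")
print(json.loads(body)["features"])  # -> ['sentiment', 'entities']
```

That opacity is the trade-off: the payload above is the entire surface area you control.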
To get around this, you can depend on algorithms and mechanisms that are cloud-hosted versions of existing, familiar tools. Spark is one example, hosted both by its creators (Databricks) and by third parties like IBM and Microsoft in their respective clouds.
The plus side of this arrangement: You get to pick the route best suited to your needs. The black-box API route should provide useful enough results for those who don’t require more than the commodified version of machine learning. But those in the business of pushing the envelope will likely want to roll their own machine-learning-powered solutions—and the bar is set to be raised further for both paths in the coming year.