Yahoo makes TensorFlow and Spark better together


Want Google TensorFlow’s deep learning chocolate in your Spark peanut butter? Good news: Yahoo has unveiled a project to satisfy that craving.

Last year Yahoo brought together the worlds of big data and machine learning, integrating the in-memory data processing framework Spark with the deep learning framework Caffe. Applications written in Spark could have Caffe’s training functionality built into them, or use trained models to make predictions that weren’t possible with Spark’s native machine learning.

The latest Yahoo project, TensorFlowOnSpark (TFoS), does exactly what it says: It adds support for the TensorFlow deep learning library to Spark.

In a blog post, Yahoo’s Big ML engineering team described how the project arose from the need to make TensorFlow easier to deploy on existing clusters, like those running Spark. Several projects already aimed to do that, among them Databricks’ TensorFrames, which uses GPU acceleration, and a project created at the same Berkeley lab that gave rise to Spark.

TensorFlow does not yet offer RDMA (remote direct memory access) support as a core feature, but it’s in the works. Rather than wait, Yahoo elected to create its own RDMA support and add it to TensorFlow’s C++ layer; the company is offering it as alpha-quality code.

Even without Yahoo’s contributions, TensorFlow has been progressing by leaps and bounds. A recent release of the framework introduced optimizations that make it possible to deploy it on smartphone-grade hardware, and the framework has been chosen as the deep learning system for one vendor’s custom machine learning hardware.

TensorFlow’s competition includes MXNet, the deep learning system that Amazon has thrown its weight behind. Amazon claims MXNet scales better than the competition across multiple nodes, so it’s faster to train models if you have the hardware to devote to the problem. It’ll be worth seeing how TensorFlowOnSpark compares, both in running on big clusters and in how convenient it is to work with.