For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers — not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It’s available only in IBM’s PowerAI 4.0 software package, which runs exclusively on IBM’s own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn’t require developers to learn an entirely new deep learning framework. It repackages several common frameworks for machine learning: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projecs that use those frameworks can then run in parallel across multiple hardware nodes.

IBM claims the speedup gained by scaling across nodes is nearly linear. One benchmark, using the and data sets, needed 16 days to complete on one IBM . Spread across 64 such systems, the same benchmark concluded in seven hours, or 58 times faster.

 or . TensorFlow , but again any integration with other frameworks is something you’ll have to add by hand.

IBM’s claimed advantage is that it works with multiple frameworks and without as much heavy lifting needed to set things up. But those come at the cost of running on IBM’s own iron.