LLVM-powered Pocl puts parallel processing on multiple hardware platforms


LLVM, the open source compiler framework that powers everything from Mozilla’s Rust language to Apple’s Swift, emerges in yet another significant role: an enabler of code deployment systems that target multiple classes of hardware for speeding up jobs like machine learning.

To write code that can run on many types of hardware—hugely useful for machine learning apps—it’s best to use a standard like OpenCL, which allows a program to be written once, then automatically deployed across different types of hardware.
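That write-once portability is visible in the kernel source itself. The snippet below is a hypothetical vector-add kernel in OpenCL C (the file name and kernel name are made up for illustration); the same source can be compiled by any conforming OpenCL runtime for whatever device is present.

```c
/* vadd.cl -- a minimal, hypothetical OpenCL C kernel.
 * The identical source runs on CPUs, GPUs, or other accelerators;
 * the OpenCL runtime compiles it for the device at hand. */
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c)
{
    size_t i = get_global_id(0); /* this work-item's index in the grid */
    c[i] = a[i] + b[i];
}
```

The kernel itself never names a device; choosing and targeting the hardware is entirely the runtime's job, which is the property Pocl exploits.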

Pocl, an implementation of OpenCL recently revamped to version 0.14, uses the LLVM compiler framework to do the targeting. With Pocl, OpenCL code can be automatically deployed to any hardware platform with LLVM back-end support.

Pocl employs LLVM’s Clang front end to take in C code written to the OpenCL standard. Version 0.14 works with both LLVM 3.9 and the recently released LLVM 4.0. It also offers a new binary format for OpenCL executables, so they can run on hosts that don’t have a compiler available.
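The standard OpenCL host API already distinguishes these two paths, which is what makes a precompiled binary format useful. This C sketch (error handling omitted; the function names `from_source` and `from_binary` are invented for illustration) contrasts building from OpenCL C source with loading a binary built ahead of time:

```c
#include <CL/cl.h>  /* standard OpenCL host API */

/* When a compiler is available on the host, build from OpenCL C source. */
cl_program from_source(cl_context ctx, cl_device_id dev, const char *src)
{
    cl_program p = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(p, 1, &dev, NULL, NULL, NULL); /* invokes the runtime compiler */
    return p;
}

/* When no compiler is present, load a binary produced offline
 * (e.g., in Pocl 0.14's new executable format). */
cl_program from_binary(cl_context ctx, cl_device_id dev,
                       const unsigned char *bin, size_t len)
{
    /* No compilation happens at run time here. */
    return clCreateProgramWithBinary(ctx, 1, &dev, &len, &bin, NULL, NULL);
}
```

The second path is what lets an OpenCL executable run on a stripped-down host: compilation happens once, elsewhere, and the deployment target only needs the runtime.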


There are other projects that automatically generate OpenCL code tailored to multiple hardware targets—for example, the Lift project. Lift generates a specially tailored intermediate language (IL) that allows OpenCL abstractions to be readily mapped to the behavior of the target hardware. In fact, this is how LLVM itself works: it generates an intermediate representation from source code, which is then compiled for a specific hardware platform. Another, similar project generates GPU-specific code.

LLVM is also in use as a code-generating system for other aspects of machine learning. One such project generates LLVM-compiled code designed to speed up the various phases of a data analysis framework, so code spends less time shuttling data back and forth between the framework’s components and more time doing actual data processing.

The development of new kinds of hardware targets is likely to keep driving the need for code generation systems that can target multiple hardware types. Google’s Tensor Processing Unit, for instance, is a custom ASIC devoted to accelerating a particular phase of a machine learning job. If hardware types continue to proliferate and become more specialized, having code for them generated automatically will save time and labor.