Google is nothing if not ambitious about its machine learning plans. Around this time last year it unveiled the Tensor Processing Unit (TPU), a custom hardware accelerator designed to run its TensorFlow machine learning framework at world-beating speeds.

Now, the company is sharing details about how those TPUs perform for machine learning, courtesy of a newly published research paper. The details show how Google's approach will influence the future development of machine learning powered by custom silicon.

1. Google’s TPUs, like GPUs, address division of machine learning labor

Machine learning generally happens in a few phases. First you gather data, then you train a model with that data, and finally you use that model to make predictions. The first phase doesn't typically require specialized hardware. The second phase is where GPUs come into play; in theory, you can use a GPU for the third phase as well.
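To make the division of labor concrete, here's a minimal sketch in plain Python/NumPy (a toy stand-in, not Google's pipeline) that walks a tiny linear model through all three phases:

```python
import numpy as np

# Phase 1: gather data (synthetic here -- a stand-in for real collection)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

# Phase 2: train a model (floating-point heavy -- the work GPUs excel at)
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad

# Phase 3: make predictions (inference -- the phase Google's TPU targets)
X_new = rng.normal(size=(5, 3))
print(X_new @ w)
```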

With Google's TPU, phase three is handled by an ASIC, a custom piece of silicon designed to do one specific job. ASICs are good at the low-precision integer math that suffices for making predictions from a trained model, while GPUs are better at the floating-point math that's vital when training models. The idea is to have specialized silicon for each aspect of the machine learning process, so each step can run as fast as possible.
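The contrast shows up clearly in code. The sketch below uses symmetric, per-tensor 8-bit quantization -- one common scheme, assumed here for illustration rather than taken from the TPU's documented recipe. Training produces floating-point weights, but the prediction step collapses into integer multiplies and adds:

```python
import numpy as np

# Trained weights arrive in floating point (the product of phase two)
w_float = np.array([[0.42, -1.37], [0.91, 0.08]], dtype=np.float32)

# Quantize to 8-bit integers: map the float range onto [-127, 127]
scale = np.abs(w_float).max() / 127.0
w_int8 = np.round(w_float / scale).astype(np.int8)

# Inference input, quantized the same way
x_float = np.array([1.5, -0.25], dtype=np.float32)
x_scale = np.abs(x_float).max() / 127.0
x_int8 = np.round(x_float / x_scale).astype(np.int8)

# The matrix multiply runs entirely in integer arithmetic,
# accumulating in int32 -- exactly the kind of work an ASIC is built for
acc = w_int8.astype(np.int32) @ x_int8.astype(np.int32)

# Rescale once at the end to recover an approximate float result
print(acc * scale * x_scale)  # quantized result
print(w_float @ x_float)      # float reference, for comparison
```

The single rescaling at the end is the only floating-point work left, which is why prediction hardware can get away with cheap, dense integer units.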

FPGAs, chips that can be reprogrammed after they ship, are already finding their way into the data centers of the cloud providers that employ them. Machine learning acceleration is one of the many duties that hardware could take on.

That said, FPGAs aren’t a one-to-one solution for ASICs, and they can’t be dropped into a machine learning pipeline as-is. Also, there aren’t as many programming tools for FPGAs in a machine learning context as there are for GPUs.

It's likely that the best steps in this direction won't be toolkits that enable FPGA programming for machine learning specifically, but general frameworks that can target FPGAs, GPUs, CPUs, or custom silicon alike. Such frameworks would have more to work with if Google offers its TPUs as a cloud resource, but there are already plenty of targets they can start addressing right away.
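TensorFlow itself already gestures in this direction with device placement: the same graph can be pinned to different processors by changing a string. Here's a minimal sketch using TensorFlow 1.x-era APIs (the '/cpu:0' and '/gpu:0' device names are the standard identifiers; a TPU or FPGA backend would appear as just another device type):

```python
import tensorflow as tf

def build_matmul(device):
    """Build the same computation, pinned to a given processor."""
    with tf.device(device):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
        return tf.matmul(a, b)

# The graph definition is identical; only the placement string changes.
cpu_result = build_matmul('/cpu:0')
gpu_result = build_matmul('/gpu:0')  # falls back to CPU if no GPU is present

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(cpu_result))
    print(sess.run(gpu_result))
```

The appeal of this design is that the model definition never changes; retargeting hardware becomes a deployment decision rather than a rewrite.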

4. We’ve barely scratched the surface with custom machine learning silicon

Google claims in its paper that the speedups possible with its ASIC could be bolstered further by using GPU-grade memory systems, with results anywhere from 30 to 200 times faster than a conventional CPU/GPU mix. And that's without addressing what could be achieved by any of the other tricks being hatched outside of Google.

It ought to be clear by now that custom silicon for machine learning will drive the development of both the hardware and software sides of the equation. It’s also clear Google and others have barely begun exploring what’s possible.