Linux never sleeps. Linus Torvalds is already hard at work pulling together changes for the next version of the kernel (4.11). But with Linux 4.10 now out, three groups of changes are worth paying close attention to because they improve performance and enable feature sets that weren’t possible before on Linux.
Here’s a rundown of those changes to 4.10 and what they likely will mean for you, your cloud providers, and your Linux applications.
1. Virtualized GPUs
One class of hardware that’s always been difficult to emulate in virtual machines is GPUs. Typically, VMs provide their own custom video driver (slow), and graphics calls have to be translated (slow) back and forth between guest and host. The ideal solution would be to run the same graphics driver in a guest that you use on the host itself and have all the needed calls simply relayed back to the GPU.
There’s more here than being able to play, say, Battlefield 1 in a VM. Every resource provided by the GPU, including GPU-accelerated processing provided through libraries like CUDA, would be available to the VM as if it were running on regular, unvirtualized iron.
Direct support for Intel's GVT-g graphics virtualization inside the kernel means third-party products can leverage it without anything more than the “vanilla” kernel. It’s akin to how Docker turned a collection of native Linux features into a hugely successful devops solution; a big part of that success was that those features were available in most modern incarnations of Linux.
2. Better cache control technology
CPUs today are staggeringly fast. What’s slow is pulling data from main memory, so crucial data is cached close to the CPU. That strategy has paid off, and cache sizes have grown steadily to match. But at least some of the burden of cache management falls to the OS as well, and Linux 4.10 introduces some new techniques and tooling.
First is support for Intel Cache Allocation Technology (CAT), a feature available on Haswell-generation chips or later. With CAT, space in the L3 (and later L2) cache can be apportioned and reserved for specific tasks, so a given application’s cache isn’t flushed out by other applications. CAT also apparently provides some protection against cache-based attacks—no small consideration given that every nook and cranny of modern computing is being scrutinized as a possible attack vector.
Second is a new profiling tool, perf c2c (“cache to cache”). In systems with multiple sockets and nonuniform memory access (NUMA), threads running on different CPUs can make the cache less efficient if they try to modify the same memory segments. The perf c2c tool helps tease out such performance issues, although like CAT it’s based on features provided specifically by Intel processors.
3. Writeback management
“Since the dawn of time, the way Linux synchronizes to disk the data written to memory by processes (aka. background writeback) has sucked,” writes KernelNewbies.org in its description of how Linux writeback management has been improved in 4.10. From now on, the I/O requests queue is monitored for the latency of its requests, and operations that cause longer latencies—heavy write operations in particular—can be throttled to allow other threads to have a chance.
In roughly the same vein, an experimental feature, off by default, provides a RAID5 writeback cache, so writes across multiple disks in a RAID5 array can be batched together. Another experimental feature, hybrid block polling (also off by default), provides a new way to poll high-throughput devices. Such polling helps improve performance, but if done too often it makes things worse; the new polling mechanism is designed to ensure polling improves performance without driving up CPU usage.
This is all likely to have a major payoff with cloud computing instances that are optimized for heavy I/O. Amazon has instances in this class, and a kernel-level improvement that provides more balance between reads and writes, essentially for free, will be welcomed by them—and their industry rivals. This rising tide can’t help but lift all boats.