Spectre, Meltdown and Hana
At the end of 1967, the foundations were laid for what we now call "out-of-order execution": parallelization at the level of the main processor, and with it better utilization of the available capacity and ultimately higher throughput of the overall system.
The security vulnerabilities do not call these achievements into question. However, the necessary safeguards limit their practical use under certain conditions, at least initially, and, most importantly, they can reduce the overall throughput of the system.
I would like to approach the topic in four steps. First: What are Spectre and Meltdown, and who is affected? Second: What can we do? Third: What has Suse done so far, and what do we plan to do in the coming months? Fourth: Why do some manufacturers not provide concrete figures on the potential loss of system performance?
1. Who do Spectre and Meltdown affect?
Over time, the "out-of-order execution" discussed above was extended to the point where today's CPUs execute alternative program paths before it is even clear which of them is actually needed.
During this "speculative" execution, windows open in which the memory protection mechanisms that normally separate the operating system kernel from processes, or processes from one another, are not effective, so unauthorized accesses can occur (see Table 1).
Since the flaw is in the hardware, the software level (i.e. the operating system and the application programs) does not even notice this access. Every current high-performance CPU family is affected by one or more of the possible flaws (see Table 2).
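The effect described in this section can be illustrated with a deliberately simplified toy model. It does not attack any real hardware; the class `ToyCPU`, the function `leak_byte` and the memory layout are hypothetical names invented for this sketch. It assumes only what is said above: a branch is executed speculatively, the architectural result is discarded when the speculation turns out to be wrong, but a trace remains in the cache that can be observed afterwards.

```python
# Toy model only: no real hardware is attacked and all names are hypothetical.
SECRET = 42                      # byte the victim must never return
PUBLIC = [1, 2, 3, 4]            # data the caller may legitimately read
MEMORY = PUBLIC + [SECRET]       # the secret sits right behind the public array


class ToyCPU:
    def __init__(self):
        self.cache = set()       # values whose probe line is currently cached

    def flush(self):
        self.cache.clear()

    def victim_read(self, index, predicted_in_bounds):
        """Bounds-checked read; the branch may be executed speculatively."""
        really_in_bounds = index < len(PUBLIC)
        if predicted_in_bounds or really_in_bounds:
            # Runs before the check is resolved if the predictor says "in bounds".
            value = MEMORY[index]
            self.cache.add(value)        # side effect that survives a rollback
        # Architectural result: a wrong speculation is discarded here.
        return MEMORY[index] if really_in_bounds else None

    def is_cached(self, value):
        """Stands in for a timing measurement (cached would mean fast)."""
        return value in self.cache


def leak_byte(cpu, index):
    cpu.flush()
    result = cpu.victim_read(index, predicted_in_bounds=True)  # mistrained predictor
    assert result is None                # software never sees the secret directly
    hits = [v for v in range(256) if cpu.is_cached(v)]
    return hits[0] if hits else None


if __name__ == "__main__":
    print(leak_byte(ToyCPU(), index=len(PUBLIC)))
```

In this model the out-of-bounds read returns nothing to the caller, yet the cache state afterwards reveals the value that was touched. Real attacks are considerably more involved; the sketch only shows why the software level does not notice the access.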
2. What can we do?
The most obvious solution, replacing all CPUs, is neither economically sensible nor practically possible: all current server CPUs (including architectures and manufacturers other than those mentioned above) are affected.
A complete solution to the security problems is not possible in software alone (i.e., at the level of the operating system or the application layer). The only option is therefore to "plug" the gap as well as possible, which is why the term "mitigation" is generally used rather than "fix". This is done through a combination of changes to the CPU itself (where it can be changed by so-called microcode) and changes to the operating system.
These changes aim either to prevent CPU-level "speculation" altogether or at least to prevent another process, application or virtual machine from accessing data in main memory that does not belong to it (see Table 3).
In Table 3, the somewhat vague wording "and/or" stands out. It points to one of the fundamental challenges that software manufacturers, and operating system manufacturers like Suse in particular, face in their communication:
For Spectre variant 2 in particular, there is no simple and unambiguous solution, because a different approach must be taken depending on the CPU manufacturer, CPU type and CPU generation. We will come back to this in section 4.
3. What has Suse done and what is Suse planning?
At the beginning of January, Suse released a first patch set for all currently supported Suse Linux Enterprise versions, including of course Suse Linux Enterprise Server for SAP Applications. This patch set limits the risk of Spectre (variant 1), prepares the operating system kernel for the approaches to Spectre (variant 2) described above, and fixes Meltdown (variant 3), at least for the x86-64 architecture.
Based on feedback from our hardware partners and customers, a second wave of kernel updates has been available since mid-February. It offers an improved approach to Spectre (variant 2) for the x86-64 architecture, the so-called "retpolines", and improves performance, in particular by monitoring memory accesses at a finer granularity.
This work of constantly improving security as well as fine-tuning access and restrictions will take months; some security experts even expect it to take years.
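To see which mitigations a running system reports, Linux kernels that provide the sysfs directory `/sys/devices/system/cpu/vulnerabilities` can be queried directly. The following is a minimal sketch under that assumption; whether the directory exists, and which entries it contains, depends on the installed kernel version. The function name `read_mitigation_status` is chosen for this example only.

```python
from pathlib import Path

# Directory used by Linux kernels that report mitigation status via sysfs.
# Older or unpatched kernels may not provide it at all.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base=VULN_DIR):
    """Return {vulnerability name: kernel status line}, or {} if unavailable."""
    base = Path(base)
    status = {}
    if not base.is_dir():
        return status
    for entry in sorted(base.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("Kernel does not expose mitigation status in sysfs")
    for name, line in report.items():
        print(f"{name}: {line}")
```

Running the script before and after a kernel or microcode update is a simple way to check whether the reported status has changed. The exact wording of each status line is defined by the kernel and is not interpreted here.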
4. No concrete figures?
Every application, and even the specific way an application is used and the data it processes, influences whether and how strongly the security patches have an impact (see Table 4).
From what has been said so far, it almost goes without saying that no general statement can be made about the effects of security updates on the overall throughput of a system, because each statement is only valid for a specific combination of the following four factors:
- Type of application
- Data on which the application operates
- CPU/hardware architecture
- CPU type and CPU generation, including the availability of microcode updates, especially regarding Spectre variant 2 (see above, section 2).
The most that can be said is that applications that perform very long calculations in the CPU without accessing memory are the least affected. I hope these explanations make clear why, from Suse's point of view, it is more responsible not to publish figures on the influence of the current security updates on performance, even if such an influence seems unavoidable depending on the circumstances.
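Anyone who wants figures for their own combination of factors has to measure them on their own system. The following minimal sketch contrasts a loop that only computes with a loop that enters the kernel on every iteration. It assumes that the extra cost of the updates is paid mainly on transitions into the kernel, as is the case for the Meltdown mitigation; the function names and the iteration count are chosen for this example only, and interpreter overhead means the absolute numbers say little.

```python
import os
import time


def time_compute_bound(iterations):
    """Pure arithmetic in user space: no system calls inside the loop."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


def time_syscall_heavy(iterations):
    """One kernel entry per iteration (os.getppid is a cheap system call)."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    print(f"compute-bound : {time_compute_bound(n):.4f} s")
    print(f"syscall-heavy : {time_syscall_heavy(n):.4f} s")
```

Only a comparison on the same machine, before and after applying the updates, is meaningful. Even then the result holds for this synthetic load alone and cannot be transferred to a real application.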
Nevertheless, we advise all customers to apply operating system and microcode updates as promptly and regularly as possible as part of their change management processes to minimize the risks of this newly found - yet surprisingly old - family of vulnerabilities.