This blog post is co-authored with Andy Polyakov from the OpenSSL core team.
Advanced Encryption Standard (AES) is the mostly widely used symmetric block cipher today. Its use is mandatory in several US government and industry applications. Among the commercial standards AES is a part of SSL/TLS, IPSec, 802.11i, SSH and numerous other security products used throughout the world.
Ever since the inclusion of AES as a federal standard via FIPS PUB 197 and even before that when it was known as Rijndael, there has been several attempts to cryptanalyze it. However most of these attacks have not gone beyond the academic papers they were written in. One of them worth mentioning at this point is the key recovery attacks in AES-192/AES-256. A second angle to this is attacks on the AES implementations via side-channels. A side-channel attack exploits information which is leaked through physical channels such power-consumption, noise or timing behaviour. In order to observe such a behaviour the attacker usually needs to have some kind of direct or semi-direct control over the implementation.
There has been some interest about side-channel attacks in the way OpenSSL implements AES. I suppose OpenSSL is chosen mainly because its the most popular cross-platform cryptographic library used on the internet. Most Linux/Unix web servers use it, along with tons of closed source products on all platforms. The earliest one dates back to 2005, and the recent ones being about cross-VM cache-timing attacks on OpenSSL AES implementation described here and here. These ones are more alarming, mainly because with applications/data moving into the cloud, recovering AES keys from a cloud-based virtual machine via a side-channel attack could mean complete failure for the code.
After doing some research on how AES is implemented in OpenSSL there are several interesting facts which have emerged, so stay tuned.
What are cache-timing attacks?
Cache memory is random access memory (RAM) that microprocessor can access more quickly than it can access regular RAM. As the microprocessor processes data, it looks first in the cache memory and if it finds the data there (from a previous reading of data), it does not have to do the more time-consuming reading of data from larger memory. Just like all other resources, cache is shared among running processes for the efficiency and economy. This may be dangerous from a cryptographic point of view, as it opens up a covert channel, which allows malicious process to monitor the use of these caches and possibly indirectly recover information about the input data, by carefully noting some timing information about own cache access.
A particular kind of attack called the flush+reload attack works by  forcing data in the victim process out of the cache, waiting a bit, then measuring the time it takes to access the data. If the victim process accesses the data while the spy process is waiting, it will get put back into the cache, and the spy process’s access to the data will be fast. If the victim process doesn’t access the data, it will stay out of the cache, and the spy process’s access will be slow. So, by measuring the access time, the spy can tell whether or not the victim accessed the data during the wait interval. All this under premise that data is shared between victim and adversary.
Note that we are not talking about secret key being shared, but effectively public data, specifically lookup tables discussed in next paragraph.
Is AES implementation in OpenSSL vulnerable to cache-timing attacks?
Any cipher relying heavily on S-boxes may be vulnerable to cache-timing attacks. The processor optimizes execution by loading these S-boxes into the cache so that concurrent accesses/lookups, will not need loading them from the main memory. Textbook implementations of these ciphers do not use constant-time lookups when accessing the data from the S-boxes and worse each lookup depends on portion of the secret encryption key. AES-128, as per the standard, requires 10 rounds, each round involves 16 S-box lookups.
The Rijndael designers proposed a method which results in fast software implementations. The core idea is to merge S-box lookup with another AES operation by switching to larger pre-computed tables. There still are 16 table lookups per round. This 16 are customarily segmented to 4 split tables, so that there are 4 lookups per table and round. Each table consists of 256 32-bit entries. These are referred to as T-tables, and in the case of the current research, the way these are loaded into the cache leads to timing-leakages. The leakage as described in the paper  is quantified by probability of a cache line not being accessed as result of block operation. As each lookup table, be it S-box or pre-computed T-table, consists of 256 entries, probability is (1-n/256)^m, where n is number of table elements accommodated in single cache line, and m is number of references to given table per block operation. Smaller probability is, harder to mount the attack.
Arenât cache-timing attacks local, how is virtualized environment affected?
Enter KSM (Kernel SamePage Merging). KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy on write. If the contents of the page is modified by a guest virtual machine virtual machine, a new page is created for that guest virtual machine. This means that cross-VM cache-timing attacks would now be possible. You can stop KSM or modifiy its behaviour. Some details are available here.
You did not answer my original question, is AES in OpenSSL affected?
In short, no. But not to settle for easy answers, letâs have a close look at how AES in OpenSSL operates. In fact there are several implementations of AES in OpenSSL codebase and each one of them may or may not be chosen based on specific run-time conditions. Note: All of the above discussions are in about OpenSSL version 1.0.1.
- Intel Advanced Encryption Standard New Instructions or AES-NI, is an extension to the x86 instruction set for intel and AMD machines used since 2008. Intel processors from Westmere onwards and AMD processors from Bulldozer onwards have support for this. The purpose of AES-NI is to allow AES to be performed by dedicated circuitry, no cache is involved here, and hence itâs immune to cache-timing attacks. OpenSSL uses AES-NI by default, unless itâs disabled on purpose. Some hypervisors mask the AES-NI capability bit, which is customary done to make sure that the guests can be freely migrated within heterogeneous cluster/farm. In those cases OpenSSL will resort to other implementations in its codebase.
- If AES-NI is not available, OpenSSL will either use Vector Permutation AES (VPAES) or  Bit-sliced AES (BSAES), provided the SSSE3 instruction set extension is available. SSSE3 was first introduced in 2006, so there is a fair chance that this will be available in most computers used. Both of these techniques avoid data- and key-dependent branches and memory references, and therefore are immune to known timing attacks. VPAES is used for CBC encrypt, ECB and “obscure” modes like OFB, CFB, while BSAES is used for CBC decrypt, CTR and XTS.
- In the end, if your processor does not support AES-NI or SSSE3, OpenSSL falls back to integer-only assembly code. Unlike widely used T-table implementations, this code path uses a single 256-bytes S-box. This means that probability of a cache line not being accessed as result of block operation would be (1-64/256)^160=1e-20. âWould beâ means that actual probability is even less, in fact zero, because S-box is fully prefetched, and even in every round.
For completeness sake it should be noted that OpenSSL does include reference C implementation which has no mitigations to cache-timing attacks. This is a platform-independent fall-back code that is used on platforms with no assembly modules, as well as in cases when assembler fails for some reason. On side note, OpenSSL maintains really minimal assembler requirement for AES-NI and SSSE3, in fact the code can be assembled on Fedora 1, even though support for these instructions was added later.
Bottom line is that if you are using a Linux distribution which comes with OpenSSL binaries, there is a very good chance that the packagers have taken pain to ensure that the reference C implementation is not compiled in. (Same thing would happen if you download OpenSSL source code and compile it)
It’s not clear from the research paper how the researchers were able to conduct the side channel attack. All evidence suggests that they ended up using the standard reference C implementation of AES instead of assembly modules which have mitigations in place. Â The researchers were contacted but did not respond to this point. Â Anyone using an OpenSSL binary they built themselves using the defaults, or precompiled as part of an Linux distribution should not be vulnerable to these attacks.