Why AES-GCM Sucks

If you’re reading this wondering if you should stop using AES-GCM in some standard protocol (TLS 1.3), the short answer is “No, you’re fine”.

I specialize in secure implementations of cryptography, and my years of experience in this field have led me to dislike AES-GCM.

This post is about why I dislike AES-GCM’s design, not “why AES-GCM is insecure and should be avoided”. AES-GCM is still miles above what most developers reach for when they want to encrypt (e.g. ECB mode or CBC mode). If you want a detailed comparison, read this.

To be clear: This is solely my opinion and not representative of any company or academic institution.

What is AES-GCM?

AES-GCM is an authenticated encryption mode that uses the AES block cipher in counter mode with a polynomial MAC based on Galois field multiplication.

In order to explain why AES-GCM sucks, I have to first explain what I dislike about the AES block cipher. Then, I can describe why I’m filled with sadness every time I see the AES-GCM construction used.

What is AES?

The Advanced Encryption Standard (AES) is a specific subset of a block cipher called Rijndael.

Rijndael’s design is based on a substitution-permutation network, which broke tradition from many block ciphers of its era (including its predecessor, DES) in not using a Feistel network.

AES only includes three flavors of Rijndael: AES-128, AES-192, and AES-256. The difference between these flavors is the size of the key and the number of rounds used, but–and this is often overlooked–not the block size.

As a block cipher, AES always operates on 128-bit (16 byte) blocks of plaintext, regardless of the key size.

This is generally considered acceptable because AES is a secure pseudorandom permutation (PRP), which means that every possible plaintext block maps directly to one ciphertext block, and thus birthday collisions are not possible. (A pseudorandom function (PRF), conversely, does have birthday bound problems.)
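To make the PRP property concrete, here is a tiny sketch (Python, assuming the pyca/cryptography package; the key and plaintext block are placeholders): under a fixed key, AES deterministically maps each 16-byte block to exactly one 16-byte block, and decryption inverts it.

```python
# A minimal sketch of AES as a pseudorandom permutation, assuming the
# pyca/cryptography package. The key and plaintext block are placeholders.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)            # an AES-256 key; the block size is still 16 bytes
block = b"\x00" * 16            # one 128-bit plaintext block

enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ct1 = enc.update(block) + enc.finalize()

enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ct2 = enc.update(block) + enc.finalize()

assert ct1 == ct2 and len(ct1) == 16   # deterministic: one block in, one block out

dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
assert dec.update(ct1) + dec.finalize() == block   # a permutation is invertible
```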

Why AES Sucks

Facepaw
Art by Khia.

Side-Channels

The biggest reason why AES sucks is that its design uses a lookup table (called an S-Box) indexed by secret data, which is inherently vulnerable to cache-timing attacks (PDF).

There are workarounds for this AES vulnerability, but they either require hardware acceleration (AES-NI) or a technique called bitslicing.

The short of it is: With AES, you’re either using hardware acceleration, or you have to choose between performance and security. You cannot get fast, constant-time AES without hardware support.

Block Size

AES-128 is considered by experts to have a security level of 128 bits.

Similarly, AES-192 gets certified at 192-bit security, and AES-256 gets 256-bit security.

However, the AES block size is only 128 bits!

That might not sound like a big deal, but it severely limits the constructions you can create out of AES.

Consider the case of AES-CBC, where each block of plaintext is XORed with the previous ciphertext block before being encrypted. For the first block, a random 128-bit block (called the initialization vector, or IV) stands in for the previous ciphertext.

This means you expect a collision (at 50% probability) after encrypting 2^{64} blocks under a single key.
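To see where 2^{64} comes from, apply the standard birthday approximation over the 2^{128} possible ciphertext blocks: p \approx q^{2} / 2^{129}, and setting p = 1/2 gives q \approx 2^{64} blocks. At 16 bytes per block, that's roughly 2^{68} bytes (on the order of 295 exabytes) encrypted under one key before collisions become likely.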

When you start getting collisions, you can break CBC mode, as this video demonstrates:

The video’s attack on CBC mode completely hand-waves away the block-size detail that the demo depends on, but so long as you keep that in mind, the attack is valid.

This is significantly smaller than the 2^{128} you expect from AES.

Post-Quantum Security?

Cryptographers estimate that AES-128 will have a post-quantum security level of 64 bits, AES-192 will have a post-quantum security level of 96 bits, and AES-256 will have a post-quantum security level of 128 bits.

This is because Grover’s quantum search algorithm can search n unsorted items in \sqrt{n} time. Applied to a keyspace of 2^{n} candidate keys, this reduces the attacker’s work from 2^{n} to \sqrt{2^{n}} = 2^{n/2} operations, which cuts the security level, expressed in bits, in half.
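For AES-256, for example, Grover turns a 2^{256} key search into roughly \sqrt{2^{256}} = 2^{128} operations, which is where the 128-bit post-quantum estimate above comes from.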

But remember, even AES-256 operates on 128-bit blocks.

Consequently, for AES-256, there should be approximately 2^{128} (plaintext, key) pairs that produce any given ciphertext block.

Furthermore, for a fixed plaintext block, there will be many entirely different keys that produce the same ciphertext block. (n.b. This doesn’t hold for every plaintext/ciphertext block pairing, just some arbitrary pairing.)

Concrete example: Encrypting a plaintext block consisting of sixteen NUL bytes yields one specific 128-bit ciphertext block per key. For AES-128, there are roughly as many keys as there are possible ciphertext blocks; for AES-256, there are 2^{128} times as many keys as there are possible ciphertext blocks. Keep this in mind for AES-GCM.
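Put another way: there are 2^{256} AES-256 keys but only 2^{128} possible ciphertext blocks, so on average 2^{256} / 2^{128} = 2^{128} distinct AES-256 keys map the all-NUL plaintext block to any given ciphertext block.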

This means it’s conceivable to accidentally construct a protocol that, despite using AES-256 safely, has a post-quantum security level on par with AES-128, which is only 64 bits.

This would not be nearly as much of a problem if AES’s block size was 256 bits.

Real-World Example: Signal

The Signal messaging app is the state-of-the-art for private communications. If you were previously using PGP and email, you should use Signal instead.

Signal aims to provide private communications (text messaging, voice calls) between two mobile devices, piggybacking on your pre-existing contacts list.

Part of their operational requirements is that they must be user-friendly and secure on a wide range of Android devices, stretching all the way back to Android 4.4.

The Signal Protocol uses AES-CBC + HMAC-SHA256 for message encryption. Each message is encrypted with a different AES key (due to the Double Ratchet), which limits the practical blast radius of a cache-timing attack and makes practical exploitation difficult (since you can’t effectively replay decryption in order to leak bits about the key).

Thus, Signal’s message encryption is still secure even in the presence of vulnerable AES implementations.
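For illustration, here is a generic encrypt-then-MAC sketch in the same spirit (Python, assuming the pyca/cryptography package). It is not Signal’s actual message format: the key derivation, framing, and associated data are all simplified.

```python
# A generic AES-CBC + HMAC-SHA256 encrypt-then-MAC sketch. This is NOT Signal's
# wire format; keys are assumed to be independent, pre-derived 32-byte secrets.
import os, hmac, hashlib
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = enc.update(padded) + enc.finalize()
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    iv, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # verify the MAC BEFORE decrypting
        raise ValueError("invalid authentication tag")
    dec = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```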

Soatok is HYPED!!!
Hooray for well-engineered protocols managing to actually protect users.
Art by Swizz.

However, the storage service in the Signal app uses AES-GCM, and its key has to be reused for the encrypted storage to work.

This means, for older phones without dedicated hardware support for AES (i.e. low-priced phones from 2013, which Signal aims to support), the risk of cache-timing attacks is still present.

Soatok angrily grasping computer monitor
This is unacceptable!

What this means is, a malicious app that can flush the CPU cache and measure timing with sufficient precision can siphon the AES-GCM key used by Signal to encrypt your storage without ever violating the security boundaries enforced by the Android operating system.

As a result of the security boundaries never being crossed, these kinds of side-channel attacks would likely evade forensic analysis, and would therefore be of interest to malware developers working for nation states.

Of course, if you’re on newer hardware (i.e. Qualcomm Snapdragon 835), you have hardware-accelerated AES available, so it’s probably a moot point.

Why AES-GCM Sucks Even More

AES-GCM is an authenticated encryption mode that also supports additional authenticated data. Cryptographers call these modes AEAD.

AEAD modes are more flexible than simple block ciphers. Generally, your encryption API accepts the following:

  1. The plaintext message.
  2. The encryption key.
  3. A nonce (a Number used ONCE: a value that must only be used once under a given key).
  4. Optional additional data which will be authenticated but not encrypted.

The output of an AEAD function is both the ciphertext and an authentication tag, which is necessary (along with the key and nonce, and optional additional data) to decrypt the plaintext.
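As a concrete illustration of that shape, here is a minimal AES-256-GCM sketch (Python, assuming the pyca/cryptography package); the key, nonce, and messages are placeholders.

```python
# Minimal AEAD usage: encrypt takes (plaintext, key, nonce, additional data) and
# returns ciphertext plus a 16-byte tag; decrypt needs the same key, nonce, and AAD.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

nonce = os.urandom(12)          # 96-bit nonce: must never repeat under this key
aad = b"message header"         # authenticated, but not encrypted
plaintext = b"hello world"

ciphertext = aead.encrypt(nonce, plaintext, aad)   # ciphertext || tag
recovered = aead.decrypt(nonce, ciphertext, aad)   # raises InvalidTag if tampered
assert recovered == plaintext
```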

Cryptographers almost universally recommend using AEAD modes for symmetric-key data encryption.

That being said, AES-GCM is possibly my least favorite AEAD, and I’ve got good reasons to dislike it beyond simply, “It uses AES”.

Grrrrrr
The deeper you look into AES-GCM’s design, the harder you will feel this sticker.

GHASH Brittleness

The way AES-GCM is initialized is stupid: You encrypt an all-zero block with your AES key (in ECB mode) and store it in a variable called H. This value is then used for authenticating all messages under that AES key, rather than being derived fresh for each (key, nonce) pair.

Diagram describing Galois/Counter Mode, taken from Wikipedia.
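A minimal sketch of that derivation (Python, assuming the pyca/cryptography package for the raw block cipher):

```python
# GCM's GHASH key H is the encryption of the all-zero block under the raw AES key.
# Note that H depends only on the key, not on the nonce.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def ghash_key(aes_key: bytes) -> bytes:
    enc = Cipher(algorithms.AES(aes_key), modes.ECB()).encryptor()
    return enc.update(b"\x00" * 16) + enc.finalize()   # H = AES_k(0^128)
```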

This is often sold as an advantage: Reusing H allows for better performance. However, it makes GCM brittle: Reusing a nonce allows an attacker to recover H and then forge messages forever. This is called the “forbidden attack”, and it has led to real-world practical breaks.

Let’s contrast AES-GCM with the other AEAD mode supported by TLS: ChaCha20-Poly1305, or ChaPoly for short.

ChaPoly uses one-time message authentication keys (derived from each key/nonce pair). If you manage to leak a Poly1305 key, the impact is limited to the messages encrypted under that (ChaCha20 key, nonce) pair.

While that’s still bad, it isn’t “decrypt all messages under that key forever” bad like with AES-GCM.
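For contrast, here is a sketch of how RFC 8439 derives that one-time Poly1305 key: take the first 32 bytes of the ChaCha20 keystream at block counter 0 for the given (key, nonce) pair. (Python, assuming the pyca/cryptography ChaCha20 primitive, whose 16-byte nonce parameter is the 4-byte little-endian block counter followed by the 12-byte nonce.)

```python
# One-time Poly1305 key generation per RFC 8439: the first 32 keystream bytes of
# ChaCha20 at counter 0. A different (key, nonce) pair yields a different MAC key.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

def poly1305_one_time_key(chacha_key: bytes, nonce12: bytes) -> bytes:
    counter_block = (0).to_bytes(4, "little") + nonce12   # counter 0 || 96-bit nonce
    enc = Cipher(algorithms.ChaCha20(chacha_key, counter_block), mode=None).encryptor()
    return enc.update(b"\x00" * 32)   # r || s, used for exactly one message
```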

Short Nonces

Although the AES block size is 16 bytes, AES-GCM nonces are only 12 bytes. The remaining 4 bytes of the 16-byte counter block are dedicated to an internal block counter, which is used with AES in Counter Mode to actually encrypt/decrypt messages.

(Yes, you can technically use arbitrary-length nonces with AES-GCM, but any nonce that isn’t 12 bytes just gets hashed down by GHASH anyway, so it’s not a detail most people should concern themselves with.)

If you ask a cryptographer, “How much can I encrypt safely with AES-GCM?” you’ll get two different answers.

  1. Message Length Limit: AES-GCM can be used to encrypt messages up to 2^{36} - 32 bytes long, under a given (key, nonce) pair.
  2. Number of Messages Limit: If you generate your nonces randomly, you have a 50% chance of a nonce collision after 2^{48} messages.

    However, 50% isn’t conservative enough for most systems, so the practical limit is set much lower. Cryptographers generally set the key wear-out of AES-GCM at 2^{32} random nonces, which keeps the collision probability below roughly one in 4 billion (see the quick calculation below).
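A quick back-of-the-envelope check of those numbers (Python, using the standard birthday approximation; a sketch, not a substitute for the formal bounds):

```python
# Birthday approximation p ~ q*(q-1) / (2*N) for q random 96-bit nonces.
N = 2 ** 96                                 # possible 12-byte nonces

def collision_probability(q: int) -> float:
    return q * (q - 1) / (2 * N)

print(collision_probability(2 ** 32))       # ~1.2e-10, below the one-in-4-billion threshold
print(collision_probability(2 ** 48))       # ~0.5, the 50% figure from item 2
```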

These limits are acceptable for session keys for encryption-in-transit, but they impose serious operational limits on application-layer encryption with long-term keys.

Random Key Robustness

Before the advent of AEAD modes, cryptographers used to combine block cipher modes of operation (e.g. AES-CBC, AES-CTR) with a separate message authentication code algorithm (e.g. HMAC, CBC-MAC).

You had to be careful in how you composed your protocol, lest you invite Cryptographic Doom into your life. A lot of developers screwed this up. Standardized AEAD modes promised to make life easier.

Many developers gained their intuition for authenticated encryption modes from protocols like Signal’s (which combines AES-CBC with HMAC-SHA256), and would expect AES-GCM to be a drop-in replacement.

Unfortunately, GMAC doesn’t offer the same security benefits as HMAC: with HMAC, finding a different (ciphertext, HMAC key) pair that produces the same authentication tag is a hard problem, thanks to HMAC’s reliance on cryptographic hash functions. This makes HMAC-based constructions “message committing”, which instills Random Key Robustness.

Critically, AES-GCM doesn’t have this property. You can calculate a random (ciphertext, key) pair that collides with a given authentication tag very easily.

This fact prohibits AES-GCM from being considered for use with OPAQUE (which requires RKR), one of the upcoming password-authenticated key exchange algorithms. (Read more about them here.)

Better-Designed Algorithms

You might be thinking, “Okay random furry, if you hate AES-GCM so much, what would you propose we use instead?”

Soatok Explains it All
I’m glad you asked!

XChaCha20-Poly1305

For encrypting messages under a long-term key, you can’t really beat XChaCha20-Poly1305.

  • ChaCha is a stream cipher based on a 512-bit ARX hash function in counter mode. ChaCha doesn’t use S-Boxes. It’s fast and constant-time without hardware acceleration.
  • ChaCha20 is ChaCha with 20 rounds.
  • XChaCha nonces are 24 bytes, which allows you to generate them randomly and not worry about a birthday collision until about 2^{80} messages (for the same collision probability as AES-GCM).
  • Poly1305 uses a different 256-bit key for each (nonce, key) pair and is easier to implement in constant time than AES-GCM.
  • XChaCha20-Poly1305 uses the first 16 bytes of the nonce and the 256-bit key to generate a distinct subkey, and then employs the standard ChaCha20-Poly1305 construction used in TLS today.

For application-layer cryptography, XChaCha20-Poly1305 contains most of the properties you’d want from an authenticated mode.
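Here is a usage sketch, assuming PyNaCl’s libsodium bindings (nacl.bindings exposes the IETF XChaCha20-Poly1305 AEAD); the key, nonce, and messages are placeholders.

```python
# XChaCha20-Poly1305 with a randomly generated 24-byte nonce per message.
from nacl import bindings, utils

key = utils.random(bindings.crypto_aead_xchacha20poly1305_ietf_KEYBYTES)     # 32 bytes
nonce = utils.random(bindings.crypto_aead_xchacha20poly1305_ietf_NPUBBYTES)  # 24 bytes
aad = b"header"

ct = bindings.crypto_aead_xchacha20poly1305_ietf_encrypt(b"hello", aad, nonce, key)
pt = bindings.crypto_aead_xchacha20poly1305_ietf_decrypt(ct, aad, nonce, key)
assert pt == b"hello"
```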

However, like AES-GCM (and all other Polynomial MACs I’ve heard of), it is not message committing.

The Gimli Permutation

For lightweight cryptography (n.b. important for IoT), the Gimli permutation (e.g. employed in libhydrogen) is an attractive option.

Gimli is a Round 2 candidate in NIST’s Lightweight Cryptography project. The Gimli permutation offers a lot of applications: a hash function, message authentication, encryption, etc.
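To give a sense of how small the core is, here is a sketch of the Gimli permutation itself (a 384-bit state of twelve 32-bit words, 24 rounds), written in Python from the published specification; treat it as illustrative rather than a vetted implementation.

```python
# The Gimli permutation: column-wise SP-box, plus swaps and a round constant on a
# fixed schedule. State is a list of twelve 32-bit words, indexed s[4*row + col].
MASK = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def gimli(state):
    s = list(state)
    for rnd in range(24, 0, -1):
        for col in range(4):                       # SP-box on each column
            x = rotl32(s[col], 24)
            y = rotl32(s[4 + col], 9)
            z = s[8 + col]
            s[8 + col] = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK
            s[4 + col] = (y ^ x ^ ((x | z) << 1)) & MASK
            s[col]     = (z ^ y ^ ((x & y) << 3)) & MASK
        if rnd % 4 == 0:                           # small swap + round constant
            s[0], s[1], s[2], s[3] = s[1], s[0], s[3], s[2]
            s[0] ^= 0x9E377900 ^ rnd
        elif rnd % 4 == 2:                         # big swap
            s[0], s[1], s[2], s[3] = s[2], s[3], s[0], s[1]
    return s
```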

Critically, it’s possible to construct a message-committing protocol out of Gimli that will hit a lot of the performance goals important to embedded systems.

Closing Remarks

Despite my personal disdain for AES-GCM, if you’re using it as intended by cryptographers, it’s good enough.

Don’t throw AES-GCM out just because of my opinions. It’s very likely the best option you have.

Although I personally dislike AES and GCM, I’m still deeply appreciative of the brilliance and ingenuity that went into both designs.

My desire is for the industry to improve upon AES and GCM in future cipher designs so we can protect more people, from a wider range of threats, in more diverse protocols, at a cheaper CPU/memory/time cost.

We wouldn’t have a secure modern Internet without the work of Vincent Rijmen, Joan Daemen, John Viega, David A. McGrew, and the countless other cryptographers and security researchers who made AES-GCM possible.

Soatok heart sticker

By Soatok

Security engineer with a fursona. Ask me about dholes or Diffie-Hellman!

17 replies on “Why AES-GCM Sucks”

Thank you for this very interesting post.

What does “it is not message committing.” mean? And what about the post quantum resistance regarding XChaCha20Poly?

I’m going to agree with you about your basic point, and most of the detail. But you said `You cannot get fast, constant-time AES without hardware support.’ This is… complicated.

If you’re counting the traditional leaky table-lookup implementation as `fast' (which, by modern standards, it isn’t), then I don’t think this is actually true — mostly. By which I mean that a bitslice AES implementation which does eight blocks at a time using SIMD instructions (ARM NEON, say) has slightly better throughput than the traditional table-lookup code. On the other hand, the latency is terrible, and CBC-mode encryption will perform very badly because in CBC encryption (but not decryption) you can’t properly start encrypting one block before you’ve finished encrypting the previous one. (Oh, Signal. That was a poor choice.) Key setup will be slower, too. And if you don’t have SIMD instructions then it won’t go fast; but even 2013-era phones should have NEON.

> but even 2013-era phones should have NEON.

I wasn’t talking about high-end smartphones from 2013 on. I’m talking about the lower-end devices more accessible to the majority of (especially poor) people outside the USA and Europe.

Check out the Adiantum paper. https://eprint.iacr.org/2018/720.pdf

“On the ARM architecture, the ARMv8 Cryptography Extensions include instructions that make AES and GF(2^128) multiplications much more efficient. However, smartphones designed for developing markets often use lower-end processors which don’t support these extensions, and as a result there is no existing SPRP construction which performs acceptably on them.”

NEON is not the same as the ARMv8 crypto extensions. For one thing, it’s significantly older. It’s part of the ARMv7 specification, introduced in 2009, so four years old by the time our putative 2013 devices were made. And NEON is useful for lots of stuff — image processing, audio and video codecs — that phones do all the time, so even though it was optional in v7 (it’s mandatory in v8), it’s very unlikely that a 2013-era Android device would lack NEON.

The Adiantum paper you cite supports my argument. Indeed, it’s comparing NEON-accelerated ChaCha with (trad, leaky) AES, and (unsurprisingly) deciding that ChaCha is faster. I did say that table-lookup AES isn’t fast by modern standards. ChaCha20 /is/. Of course it’s a better choice than AES on devices which lack hardware AES. On x86 processors, ChaCha20 is now faster than AES, even though the latter has dedicated machine instructions, just because the SIMD registers are so wide, and ChaCha20 is so good at taking advantage of instruction-level parallelism. But on the very specific question of the relative performance of traditional table-lookup and bitslice implementations of AES on a 2013 Android smartphone, I’m saying that bitslice has better(*) throughput — it has much(!) worse latency; it’s still much slower than ChaCha20, but it’s a little faster than table-lookup.

Let’s have some numbers. It turns out that my Samsung Galaxy S3 (2012, ARM Cortex A9) still actually boots. Its kernel is too old to have perf_event_open(2) so I can’t give cycle counts, but: trad table lookup runs at 29.0MB/s (128-bit keys) or 21.4 MB/s (256-bit keys); NEON bitslice runs at 29.2MB/s (128-bit) or 21.04 MB/s (256-bit). By contrast, a halfheartedly optimized ChaCha20 on the same device runs at 73 MB/s.

On my slightly newer Fairphone 2 (2015, Qualcomm Snapdragon 801), trad table-lookup runs at 30.7 cy/byte (68.7 MB/s, 128-bit keys) or 42.6 cy/byte (50.7 MB/s, 256-bit keys); NEON bitslice is 32.3 cy/byte (66.9 MB/s, 128-bit keys) or 45.2 cy/byte (47.9 MB/s, 256-bit keys). ChaCha20 is 14.5 cy/byte (146 MB/s) on this device.

Hmm. So that was mostly a little slower. So I’ll concede that I was wrong in detail; but the speeds are very similar, so I wasn’t totally wide of the mark (and I did say it was close). If the speed of traditional table-lookup AES is acceptable for you, and you’re not doing CBC encryption, then NEON bitslice will be nearly as fast, and won’t leak your keys. I will grant that that code wasn’t at all easy to write.

As an aside, you mention the AES 128-bit key size as being inadequate in a post-quantum world. That’s true; but it’s more significant that it’s inadequate in the classical-computing world we live in if we consider attacks which target a single session out of a widely-deployed multi-user system (like, err, TLS, say). See Bernstein’s 2005 paper `Understanding Brute Force' for a bit more on this, or his 2015 blog post https://blog.cr.yp.to/20151120-batchattacks.html for a later perspective.

Since you mentioned libhydrogen, do you know of any formal analysis or proof (or at least explanation of how it works and why it’s secure) of the custom NORX/Gimli construction it uses? I wasn’t able to find any! Thank you!

I have a question regarding GHASH Brittleness: You say that the H value is used “for authenticating all messages authenticated under that AES key, rather than for a given (key, nonce) pair.”
This makes it sound as if the MAC is independent of the IV (nonce). But the last step of calculating the tag is adding Enc(k,IV||0…01) (adding in GF(128) or XOR if you consider it a bit string).
Am I overlooking something here?

H is used for the GHASH. But in the last step of calculating the Auth Tag, the HF (as you called it) is added (XORed) to the tag. I genuinely don’t understand how you are able to do this when you use the same key (so the same H) but a different nonce.
