Categories: Cryptography, Software Security

Canonicalization Attacks Against MACs and Signatures

Canonicalization Attacks occur when a protocol that feeds data into a hash function used in a Message Authentication Code (MAC) or Digital Signature calculation fails to ensure some property that’s expected of the overall protocol.

The textbook example of a canonicalization attack is the length-extension attack against Merkle–Damgård hash functions such as MD5, which famously broke the security of Flickr’s API signatures.

But there’s a more interesting attack to think about, which affects the design of security token/envelope formats (PASETO, DSSE, etc.) and comes up often when folks try to extend basic notions of authenticated encryption (AE) to include additional authenticated (but unencrypted) data (thus yielding an AEAD mode).

Let’s start with a basic AE definition, then extend it to AEAD poorly, then break our extension. Afterwards, we can think about strategies for doing it better.

Turning CTR+HMAC into AEAD

Signal uses AES-CBC then HMAC-SHA2 to encrypt messages between mobile devices.

We often refer to this shape as “CBC+HMAC” (although this is a few typos away from being confused with a very different idea called CBC-MAC).

When CBC+HMAC is used with the AES block cipher with 256-bit keys and HMAC-SHA2, it becomes AES-256-CBC+HMAC-SHA256. What a mouthful!

Yuck! Who let a cryptography nerd name anything?
(Art by Lynx vs Jackalope)

In modern designs, AES-CTR is often preferable to AES-CBC, since you don’t have to deal with padding (or padding oracles).

Most systems these days prefer GCM over CBC+HMAC or CTR+HMAC. I don’t like AES-GCM (especially if your use-case is “support platforms without hardware acceleration”), but it’s hardly the worst choice for most applications. AES-GCM is a dedicated AEAD mode, while CBC+HMAC and CTR+HMAC merely provide AE.

Why Does Additional Data Matter?

The main purpose of Additional Data (the AD in AEAD) is to bind an encrypted payload (ciphertext + authentication tag) to a given context. This is extremely helpful in mitigating Confused Deputy attacks.

Critically, this additional data is not encrypted. (At least, on this level; it’s probably wise to communicate over HTTPS!)

Naive CTR+HMAC to AEAD Extension

In a normal CTR+HMAC definition, your algorithm looks something like this:

  1. Generate a random nonce equal to the block size of your block cipher. (16 bytes for AES.)
  2. Encrypt your message with AES-CTR, using the given key and nonce.
  3. Calculate the HMAC-SHA2 output of the nonce followed by the ciphertext from step 2.
  4. Return nonce, ciphertext, MAC.

Decryption runs steps 3 and 2 in reverse: recalculate the MAC and compare it against the received one (in constant time!), then decrypt the ciphertext and return the plaintext.

The most obvious way to extend this design to support additional authenticated data is to append it to the ciphertext.

This yields the following updated protocol:

  1. Generate a random nonce equal to the block size of your block cipher. (16 bytes for AES.)
  2. Encrypt your message with AES-CTR, using the given key and nonce.
  3. Calculate the HMAC-SHA2 output of the nonce, followed by the ciphertext from step 2, followed by the additional authenticated data.
  4. Return nonce, ciphertext, MAC.

Suffice it to say, this is not a secure construction.

The Canonicalization Attack

Let’s say you built this extended protocol to encrypt a payload that looks like a URI string, but wanted to bind the token to a given browser session, so you use the HTTP User-Agent header as the AAD.

When you generate a token, you might produce the following:

const crypto = require('crypto');

function splitKey(key) {
    let hmac;
    hmac = crypto.createHmac('sha256', key);
    hmac.update('encrypt-key');
    let Ek = hmac.digest();

    hmac = crypto.createHmac('sha256', key);
    hmac.update('hmac-key');
    let Ak = hmac.digest();
    return [Ek, Ak];
}

function encryptWithContext(plaintext, aad, key) {
    let [encKey, authKey] = splitKey(key);
    let nonce = crypto.randomBytes(16);
    const aes = crypto.createCipheriv('aes-256-ctr', encKey, nonce);
    const ciphertext = Buffer.concat([aes.update(plaintext), aes.final()]);
    // Pay attention to this part:
    const hmac = crypto.createHmac('sha256', authKey);
    hmac.update(nonce);
    hmac.update(ciphertext);
    hmac.update(aad);
    return [nonce, ciphertext, hmac.digest()];
}

let plaintext = [
    'expires=1630223780',
    'access_group=1234',
    'subject=auth-v2.example.com',
    'restrictions=on'
].join('&');

// expires=1630223780&access_group=1234&subject=auth-v2.example.com&restrictions=on

const key = crypto.randomBytes(32);
// Illustrative value; in the real protocol this comes from the HTTP request:
const userAgent = 'Mozilla/5.0 (X11; Linux x86_64)';
let [nonce, ciphertext, tag] = encryptWithContext(plaintext, userAgent, key);

So here’s the clever attack: If you can shift bytes from the payload into the prefix of your User-Agent string, they’ll produce the same HMAC tag.

Attackers can truncate as much of the payload as they want by prepending it to the User-Agent included in their HTTP request.

You can even turn this:

 expires=1630223780
&access_group=1234
&subject=auth-v2.example.com
&restrictions=on

…into this:

 expires=1630223780
&access_group=1234
&subject=auth-v2.example.com

…without invalidating the existing authentication tag, just by ensuring that the last 16 bytes of ciphertext are prepended to your User-Agent and removed from the payload.

More broadly, any time a multi-part message is fed into a hash function, carelessness about how the parts are delimited lets an attacker induce trivial collisions.

See also: Iota’s Kerl hash function.

This is obviously true, because hash functions are deterministic: The same input will always produce the same output. If you can control one or more parts of a multi-part message, you can collide the input, thereby creating a collision in the output.

This can affect any protocol that depends on hash functions, but most obviously, HMAC and Digital Signature algorithms are in scope.

But what does “being careful” look like? Let’s look at a safe example.

Pre-Authentication Encoding (PAE)

Earlier I had mentioned PASETO and DSSE. They both have this notion of a “PAE” algorithm which aims to prevent canonicalization attacks.

PASETO’s definition of PAE is as follows:

function LE64(n) {
    var str = '';
    for (var i = 0; i < 8; ++i) {
        if (i === 7) {
            // Clear the MSB for interoperability
            n &= 127;
        }
        str += String.fromCharCode(n & 255);
        n = n >>> 8;
    }
    return str;
}

function PAE(pieces) {
    if (!Array.isArray(pieces)) {
        throw TypeError('Expected an array.');
    }
    var count = pieces.length;
    var output = LE64(count);
    for (var i = 0; i < count; i++) {
        output += LE64(pieces[i].length);
        /*** Soatok Note: 
         This JS pseudocode incorrectly assumes strings, rather than buffers.
         It's only meant to illustrate the idea.
         In real implementations, don't join Buffers with +.
         ***/
        output += pieces[i];
    }
    return output;
}
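For reference, here is one way the same algorithm might look over Node.js Buffers instead of strings (my sketch, not PASETO’s official code):

```javascript
function LE64(n) {
    // Encode n as an unsigned 64-bit little-endian integer, MSB cleared
    // for interoperability with signed-integer languages.
    const buf = Buffer.alloc(8);
    buf.writeBigUInt64LE(BigInt(n) & 0x7fffffffffffffffn);
    return buf;
}

function PAE(pieces) {
    if (!Array.isArray(pieces)) {
        throw TypeError('Expected an array of Buffers.');
    }
    // Count prefix first, then length-prefixed pieces:
    const output = [LE64(pieces.length)];
    for (const piece of pieces) {
        output.push(LE64(piece.length), piece);
    }
    return Buffer.concat(output);
}
```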

What this does (with all lengths as 64-bit unsigned integers, serialized as 8 bytes):

  1. Prepend the number of parts being hashed.
  2. For each part, first prefix its length, then its value.

This is an obvious mitigation for canonicalization attacks:

  • If you feed in a different number of pieces, the count (the first 8 bytes) will differ.
  • If you try to move data from one piece to another, you’ll produce different lengths for both pieces, which will not yield an input collision to the hash function.

However, it’s important that both mechanisms are in play to guarantee security:

  • Without the length prefixes, we’re no different than the CTR+HMAC extension we defined above.
  • Without the count prefix, it’s possible to drop pieces and then include a dummy “length” in the payload of others to create an input collision.

DSSE Leaves Me Dizzy

It should come as no surprise that I find DSSE’s definition of PAE to be somewhat bizarre.

PAE(type, body) = "DSSEv1" + SP + LEN(type) + SP + type + SP + LEN(body) + SP + body
+               = concatenation
SP              = ASCII space [0x20]
"DSSEv1"        = ASCII [0x44, 0x53, 0x53, 0x45, 0x76, 0x31]
LEN(s)          = ASCII decimal encoding of the byte length of s, with no leading zeros

The only thing that saves them from canonicalization attacks is that the number of pieces is constant.

If the number of pieces was variable (e.g. if the KEYID was optionally included in the signature, but they forgot to always include a hard-coded 0 length if it was absent), you could defeat their flavor of PAE by constructing two different messages that produce the same hash in the digital signature algorithm.

This is because the number of pieces isn’t included in the DSSE definition. (If they ever support a variable number of components, and fail to include the count in the signature, they’ll be vulnerable.)

Amusingly, the rationale page for DSSE using PAE states:

  • Why use PAE?
    • Because we need an unambiguous way of serializing two fields, payloadType and payload. PAE is already documented and good enough. No need to reinvent the wheel.

…Yet, they didn’t actually use the “already documented and good enough” definition of PAE from PASETO.

Let’s not use DSSE’s definition.
(Art by Lynx vs Jackalope)

Fixing AES-CTR+HMAC with PAE

This is a pretty straightforward patch:

  function encryptWithContext(plaintext, aad, key) {
      let [encKey, authKey] = splitKey(key);
      let nonce = crypto.randomBytes(16);
      const aes = crypto.createCipheriv('aes-256-ctr', encKey, nonce);
      const ciphertext = aes.update(plaintext);
      aes.final();
      // Pay attention to this part:
      const hmac = crypto.createHmac('sha256', authKey);
-     hmac.update(nonce);
-     hmac.update(ciphertext);
-     hmac.update(aad);
+     hmac.update(PAE([nonce, ciphertext, aad]));
      return [nonce, ciphertext, hmac.digest()];
  }

The only conceivable way to attack this design is to aim for an integer overflow, which would require sending at least 2^63 bytes, at which point you’re more likely to DDoS the target than to succeed.

Wrapping Up

Canonicalization attacks broadly aren’t a well-understood or widely appreciated risk in cryptographic protocol design outside of specialist circles (although many application security folks are at least aware of specific instances, e.g. length extension).

Part of the reason for this lack of knowledge transfer is that all of the AEAD modes defend against it by design, and most artisanal authenticated encryption constructions don’t bother with additional authenticated data, and most home-made cryptography protocols don’t even authenticate their ciphertexts correctly, and …

You get the point, I hope. There are unaddressed concerns all the way down. Expecting people who aren’t specialized experts in this specific field to get all of them right is frankly unreasonable. In practice, outside of cryptography, canonicalization either doesn’t matter or there’s something else wrong that’s far more urgent.

By Soatok

Security engineer with a fursona. Ask me about dholes or Diffie-Hellman!

5 replies on “Canonicalization Attacks Against MACs and Signatures”

Could you explain how an attack would work when the number of pieces is not included in the hash?

I would think that the only property you need for a PAE function is that it is injective, which in practice here means that, given the output of the function, you can reconstruct its input. Which you obviously can with just length prefixes per piece.

Adding the number of pieces at the front would mitigate length extension attacks, but that shouldn’t be relevant with HMAC.


The mitigation strategy (PAE) treats what it’s being fed into as a black box. The only assumption it makes about the target is that its output is collision resistant. It could be HMAC, it could be a naked hash fed into another signature algorithm, etc.

What if they aren’t using HMAC, but SHA256(key || message)? If so, then you’ve reopened the door. DSSE makes no assertions or restrictions, at all, about what algorithms are being used with it, so you can’t actually rule that possibility out.


I’m not sure I follow.

First of all, I think that we should distinguish use for MACs and use for signatures.

In the case of MACs, sure, if you add the count in the front, then you could use a plain hash function and be safe from length extension attacks, essentially using PAE + Hash as a sort-of cheap HMAC replacement. But really, you probably should be using HMAC there. Or a keyed hash that doesn’t allow length extension in the first place.

In the case of signatures, though, I don’t see how you could exploit this!? All you could do is “extend” the hash. But you can do that anyway, because there are no secrets involved, all you do is hash a bunch of public data using a well-known hash function, so, of course, you can take that same public data, append some more data, and hash that. But that will change the hash result, thus any signature over the original hash result won’t apply. To attack signatures, you’d have to be able to find a second preimage, and length extension attacks won’t get you there!?
