Most of the data encrypted on the Internet these days uses an AEAD mode, which consists of a stream cipher (which may be built out of a hash function or a block cipher) and a polynomial authenticator (in this order: Encrypt, then Authenticate).

This very loose description of an AEAD mode can describe both AES-GCM and ChaCha20-Poly1305 (although they are very different under-the-hood).

The AEAD modes used to secure most of the Internet has one glaring weakness: If you reuse a nonce, your messages can be decrypted with simple algebra. (In the case of AES-GCM, but not ChaCha20-Poly1305, this kind of misuse allows attackers to also henceforth forge packets at your whim. I don’t like AES-GCM.)

This is a known problem to cryptographers, and there are a couple of solutions for it.

##### Nonce Misuse-Resistant Authenticated Encryption

The first solution is to use a **nonce misuse-resistant** mode, such as AES-GCM-SIV, which prevents cryptanalytic attacks from a limited number of nonce reuses for different messages. These constructions are *relatively* new and less-studied than their contemporary counterparts, but as time goes on we’re slowly accumulating confidence in their security. However, their performance characteristics are a little disappointing.

AES-GCM-SIV is about 70% of the speed in the encrypt path than AES-GCM, but successful decryption is approximately the same speed (since this additional overhead is used to calculate the authentication tag, which is already available to decryptors).

However, the rejection of forged messages in AES-GCM-SIV requires first decrypting the entire message and then recalculating the tag from the message and the authentication key, then comparing it with the provided key.

In AES-GCM, you reject invalid messages before decryption. AES-GCM-SIV is Authenticate-then-Encrypt, instead of Encrypt-then-Authenticate. This two-pass rejection is an additional performance penalty that isn’t often considered by users of nonce misuse-resistant modes.

##### Extended-Nonce Constructions

The other solution is to use a variant of an existing AEAD mode that accepts an **extended nonce**–one large enough that users can generate it randomly and never risk collisions in a practical setting.

For example, ChaCha20-Poly1305 accepts a 256-bit key and 96-bit nonce (with an internal 32-bit block counter). If you generate the nonce randomly, you should expect a collision with probability 2^-32 after 2^32 messages.

An extended-nonce variant of ChaCha20-Poly1305, called XChaCha20-Poly1305, accepts a 256-bit key and 192-bit nonce (with an internal 32-bit block counter). Consequently, if you generate a 192-bit nonce randomly, your 2^-32 collision probability occurs after 2^80 messages rather than 2^32.

We’ve talked about these modes before (as evidenced by the links to previous blog posts), but we’ve never really dived into **how they work** or **why they were designed the way they are**. So let’s explore this topic and the answers to those questions.

## Extended-Nonce Stream Ciphers

For the purposes of this discussion, I’m mostly going to be describing XChaCha (although congruent arguments, and a more readily available security proof, exist for XSalsa20). You could likely adopt these arguments to support an extended-nonce AEAD that works with NIST algorithms exclusively (e.g. SHA-512 and AES-256-GCM, as purely a toy example).

To understand what’s really happening, we need to first take a look at the underlying cipher (n.b. ChaCha), with special attention to its assumptions and the arguments for its security. This might sound intimidating, but I promise it will make sense without needing a Ph.D.

Once we have a feel for the underlying cipher, we need to explore some of the more obvious ways to extend a nonce, and then explain why they’re a bad idea.

Once we have an intuition for all that, we can look at how XSalsa20 and XChaCha were designed, and argue for the security of their designs.

### ChaCha Now, Y’all

ChaCha is a variant of the Salsa20 cipher, which is an eSTREAM finalist created by Daniel J Bernstein.

The original ChaCha cipher accepted 128-bit or 256-bit keys and 64-bit nonces, and had a 64-bit internal counter. However, the variant used by the IETF uses 96-bit nonces and 32-bit internal counters to be consistent with other AEAD modes used in TLS.

ChaCha, like Salsa20 before it, is a stream cipher whose core is an ARX-based pseudorandom function (although it’s often called a hash function) whose output is mixed with the plaintext, using XOR. This paper explains its deviation from the original Salsa20 design.

```
// Encryption
const keystream = chacha20.stream(nonce, key, counter);
const ciphertext = utils.xor(keystream, plaintext);
// Decryption
const keystream = chacha20.stream(nonce, key, counter);
const plaintext = utils.xor(keystream, ciphertext);
```

The internal state of ChaCha is 512 bits: 128 bits of constants, 256 bits are allocated for the key, and the remaining 128 bits is divided up between the nonce and counter.

The ChaCha internal state looks like this, visually:

```
0 1 2 3
cccccccc cccccccc cccccccc cccccccc
4 5 6 7
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
8 9 10 11
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
12 13 14 15
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn
```

Once the state is initialized, it copies this state into temporary variables, runs through several rounds (ChaCha20 is 20 rounds, ChaCha12 is 12 rounds, etc.), and then *adds the initial values to the final values*. We’re not super interested in what these rounds do or why they’re designed the way they are (TL;DR it aims to efficiently maximize bit diffusion), but we *are* interested in that final addition.

ChaCha rounds are broken up into what are called quarter rounds, which it applies to rows and then diagonals. These quarter rounds consist entirely of 32-bit additions, bitwise rotations, and XOR operations.

The final addition of the initial state (constants, secret key, nonce/counter) to the terminal state of the last round prevents cryptanalysts from inverting the computation (see section 4.2 of the Salsa20 paper). Although the constants and nonce/counter are public inputs, the key is not, and the key takes up half of the 512-bit state.

**Main takeaway:** The security of both Salsa20 and ChaCha depend on attackers not being able to discover half of the internal state; otherwise they could invert the computation.

One other thing to notice: Those constants at the top of the internal state? That’s the string `expand 32-byte k`

converted into four 32-bit words. A lot of non-symmetric values would have worked here; this was simply chosen to be a Nothing Up My Sleeve number.

That’s all we need to take away from ChaCha’s design. I highly recommend reading the papers for each cipher discussed in this section; they’re incredibly accessible.

### The Obvious But Wrong Way to Extend a Nonce

Earlier, we established that the internal state of ChaCha looks like this:

```
0 1 2 3
cccccccc cccccccc cccccccc cccccccc
4 5 6 7
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
8 9 10 11
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
12 13 14 15
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn
```

It might seem tempting to just overwrite the constants with the additional nonce bytes (or XOR them with the existing constants).

```
0 1 2 3
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn
4 5 6 7
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
8 9 10 11
kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk
12 13 14 15
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn
```

This is a bad idea. If those constants were somehow initialized to all `0`

bits, and your key was all `0`

bits, and your nonce/counter was all `0`

bits, ChaCha would output all `0`

bits.

Here’s a fun thought experiment: What kind of fun things would result if attackers can change these values?

In the standard AEAD_ChaCha20Poly1305 construction, the Poly1305 key is derived from the block with counter = 0, and encryption starts with counter=1. If you could coerce ChaCha to have an all-zero state, you could force a trivially guessable Poly1305 key and thereby allow chosen-ciphertext attacks.

Of course, this is setting aside the fact that an all-zero key is itself trivial to guess! Don’t use that for your secret key.

The purpose of these constants (and the reason their actual value isn’t *super important* to the security of ChaCha beyond being an asymmetric pattern of ASCII bytes) is simply to ensure the operations are always diffusing bits, even with very dumb inputs.

With all of that in mind, while it’s tempting to try to extend the nonce by updating the internal ChaCha state with the extra nonce bytes, it’s not a great idea to clobber the ChaCha constants.

There’s something else we can try, which is precisely what XSalsa20 and XChaCha both do.

### What XSalsa20 and XChaCha Do

The extended-nonce constructions specified by Daniel J Bernstein in the XSalsa20 paper and this IETF RFC draft both do (effectively) the same thing:

- Take your “extended nonce” of 192 bits, and split it into a 128-bit segment and a 64-bit segment.
- Run the first 128 bits of the nonce (as the full nonce/counter input) through the underlying PRF with the key
*without*the final addition. (This PRF variant is called HSalsa20 and HChaCha respectively.) - Use the output of the HSalsa20/HChaCha step as the key to encrypt data with the remaining 64 bits of the nonce.

This is a pretty trivial construction, and it costs slightly less than encrypting one additional block of plaintext (which is very fast for both algorithms). Whether or not the X-variant is secure depends entirely on two things: If the normal stream cipher is secure, and if the H-variant of the underlying PRF is secure.

Let’s focus on the difference between HChaCha and ChaCha.

#### HChaCha

ChaCha outputs blocks of 512 bits which are combined with 512 bits of plaintext (using XOR) to encrypt. The final output is the initial added to the result of N (typically 20) rounds of the ChaCha round function.

HChaCha skips the final addition, but only outputs 256 bits. The specific bytes used to produce the final HChaCha output correspond to the public inputs to HChaCha:

```
0 1 2 3
cccccccc cccccccc cccccccc cccccccc
12 13 14 15
nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn
```

Because only half the internal state is returned, and it’s the half that corresponds to the public values (constants, nonce), you can get the same half-block security guarantee without needing that final addition. (The final addition is useful for ensuring ChaCha can encrypt large amounts of plaintext efficiently, but here we’re just hashing a subkey out of the extension to the nonce.)

The security proof sketched out in the XSalsa20 paper describes all of this, but in more mathematical notation than plain English.

#### XChaCha

When you assemble all of the pieces, the security argument for XChaCha looks like this:

- An attacker cannot recover half of the internal state from ChaCha outputs (due to the final addition which prevents the computation from being inverted).
- An attacker cannot recover half of the internal state from HChaCha outputs (due to only returning a public computation of these inputs, and half of the internal state not being revealed).
- The underlying transformations used in ChaCha are an efficient way to ensure high bit diffusion in reduced rounds, and the number of rounds is very conservative for the security target of the cipher.
- The internal state of the PRF is 512 bits, which targets a 256-bit security level against birthday collisions.
- The best known attacks only reduce the cost of ChaCha7 to about 2^230, which is computationally infeasible. ChaCha20 remains unaffected by this cryptanalysis.

Note: I won’t call what I wrote here a security proof. My blog is nowhere near formal enough for this sort of analysis, after all!

### Closing Thoughts

If you’ve read this far, I hope that you understand:

- (Loosely) How ChaCha works
- Why the extended-nonce variant (XChaCha) doesn’t just clobber the constant part of the ChaCha internal state but instead uses HChaCha
- Why HChaCha is designed the way it is
- Why ChaCha is secure (and the assumptions it makes)
- Why HChaCha is secure (and the assumptions it makes)
- Why the composition XChaCha is secure

If anything isn’t immediately clear, I consider that a bug that needs to be fixed.

Cryptography can be a very difficult subject, but that doesn’t mean it has to be arcane and unapproachable.

Additionally, the next time someone links you to Latacora’s Cryptographic Right Answers page, I hope you can appreciate how much complexity and nuance was glossed over by the authors to arrive at a generally right answer for 99% of users and situations.

(It turns out their recommendation isn’t random-key robust, which matters if you’re designing the next OPAQUE or shadowsocks, if you’re curious about that last 1%.)

Header art by Scruff Kerfluff.

## 19 replies on “Understanding Extended-Nonce Constructions”

In the obvious extension example you have 192 nonce bits, and 256 key bits, that should make the probability that all of them are zero 1 in 2^448. That is much less likely than for instance an attacker guessing the entire key in one shot. Why should we care about an event that, given correctly set up key and nonce generation, will almost definitely never happen?

448 bits is the total probability space, but it’s not evenly distributed. 256 bits are a key, which will be reused much more often than the nonce (which MUST be unique per message). This is before we even get to the birthday collision problem.

For all practical purposes, these 256 key bits can be considered “fixed” for a given application, and you only have 192 bits of variance. The probability for a collision of these 192 bits colliding (assuming they’re the output of a secure random function) is as follows:

* 1 in 2^32 chance: After 2^80 nonces for this given key.

* 1 in 2 chance: After 2^96 nonces for this given key.

For applications with more than one key, the above analysis is a satisfactory lower bound, but be very careful how you use XChaCha20. If you combine it with a polynomial MAC (GMAC, Poly1305), you might be introducing the risk of partitioning oracle attacks, etc. https://eprint.iacr.org/2020/1491

The nonce collision chance is the same in both schemes. Your argument against the simple scheme is solely that it may produce a zero block, and however you count it, that is improbable.

“The nonce collision chance is the same in both schemes.” What do you mean by “both schemes”?

The nonce collision chance is a function of the discrete probability space of the random nonces and the number of previously generated values. Additionally, a nonce collision is only relevant under the same key.

Okay, let’s take a step back. We have “The Obvious But Wrong Way to Extend a Nonce” (TOBWWtEaN) juxtaposed against XChaCha. The whole argument for the necessity of the more complicated XChaCha construction hinges on TOBWWtEaN being flawed in some manner. You only point out one flaw in TOBWWtEaN, namely that given a zero key, and a zero nonce it produces a zero output. I don’t believe that has any practical relevance, and I’d go as far as arguing that it is not a flaw at all, as the probability of this happening is too low.

The problem is less “you can give it all 0s and get all 0s” and more “the attacker can control the entire ChaCha internal state which invalidates the assumptions of the ChaCha security proof”.

You mean a MITM changing the nonce to see if something is accepted with the wrong nonce and gain information that way. Seems like mainly a problem with the authentication if that is possible.

In any case, there is no security proof for ChaCha, there is a proof that the X variants retain the security of normal ChaCha and Salsa, but no proof for the actual security of ChaCha and Salsa. TOBWWtEaN does not come with this extension proof, but the result is the same, we don’t have a proof for the security of any of the algorithms.

The purpose of this blog post was to explain something very abstract in terms that non-cryptographers can grok. Emphasizing comprehension often comes at a loss of precision. If you want a precise academic discussion of these algorithms, I recommend talking to the author of the algorithms.

Sorry for taking this out on you, the reasoning behind the construction has just been annoying me for a while, seeing you post about it got my hopes up for a better explanation.

If you want to understand the rationale behind the construction, you really have to dig into the Salsa20 design.

The constants used in Salsa20 and ChaCha are NUMS values. One is used in every Quarter Round. If you let an attacker control these values, the most obvious thing to do is set them to all zero values and then you can weaken the diffusion of the Quarter Rounds. The extreme example of this is (all-zero constants, all-zero key, all-zero nonce) -> all-zero keystream. But even a less-extreme setup would call the security of this construction into question.

1/4 of the internal state is constant, 1/4 is attacker-controllable, and 1/2 is secret. The fact that half the initial state is secret is what makes Salsa20 and ChaCha not invertible. The fact that 1/4 of the state is a constant, NUMS, non-invertible value guarantees a minimum diffusion of bits. Letting an attacker control 1/2 the state (instead of 1/4) is risky; allowing them to further zero out these constants allows the attacker to defeat the minimum diffusion built into the designs and assumptions of these stream ciphers.

Now, it might turn out that, outside of the all-zero use case, this increased attacker capability isn’t sufficient to break either cipher (provided the key is a randomly generated 256-bit value)! This isn’t well-studied.

However, the HSalsa20/HChaCha construction side-steps these concerns entirely, at the overhead cost slightly less than encrypting and additional 64 bytes of plaintext (because HSalsa20/HChaCha is another round without the final addition). And there *is* a security proof for this (as you pointed out).

In short: The constructions as designed are easy to reason about, and the “obvious” alternative technique is less boring from a cryptanalysis perspective. Boring crypto is better than interesting crypto.

Thanks for this good article and other articles at your Site.

At the moment I learn a bit more details about crypto based on a textbook[1] and youtube videos[2]

I am coding a bit crypto code for get a better understanding about the whole stuff.

I am just an OPS guy and not a DEV nor CRYPTO guy -> So this is only about gaining more in deep knowlege in what I am using on day to day basis.

1. After I read your Post I see that XChaCha can be used with always new random nonces with low risk of reuse. But I ask my self if it won’t use to much entropy for my CryptSecurePRNG (CSPRNG) from my TrueRNG (TRNG) that I would be in trouble latter (Especially in Servers or Network Devices I believe there is less entropy available). But maybe I got the concept wrong, and it ks not about every 512bit key stream block but for each message. Since XChaCha will use ChaCha with 32bit Counter in the Backend.

2. I read in “#2.3.1 XChaCha Draf”[3], XChaCha is using (IETF) so the first 4 Bytes have to be Null Bytes. I ask myself why they only use 192 bits of nonce then and not 224 bits. So 224-128 would leave 96 bits and not only 64 bits. Maybe XChaCha is more to the original DJB approach aligned.

Would be great if you could clarify this Points this in your article <- "Hey you ask for such comments, so don't complain":-D

PS

I will implement my own extended nonce sheme for AES-GCM, just for learning, no real worl usage ðŸ™‚

[1]: https://www.crypto-textbook.com

[2]: https://www.youtube.com/channel/UC1usFRN4LCMcfIV7UjHNuQg

[3]: https://tools.ietf.org/html/draft-irtf-cfrg-xchacha-03

The XChaCha20 spec based on the IETF draft is based on the XSalsa20 construction based on Salsa20, which uses a 64/64 split. The IETF ChaCha20 uses a 96/32 split, so it adds the 4 NUL bytes so it still keeps the 192-bit nonce.

[…] previously written about extended-nonce constructions with a focus on […]

Mostly regarding ChaCha20, why wouldn’t someone just use the entire nonce/counter space as a single counter?

i.e. instead of

cccccccc cccccccc cccccccc cccccccc

kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk

kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk

bbbbbbbb nnnnnnnn nnnnnnnn nnnnnnnn

where b is the counter, why not do

cccccccc cccccccc cccccccc cccccccc

kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk

kkkkkkkk kkkkkkkk kkkkkkkk kkkkkkkk

bbbbbbbb bbbbbbbb bbbbbbbb bbbbbbbb

?

You certainly can. Cryptographers generally design ciphers to be used for many messages, not one very long message.

In that example, you can still use the 128-bit counter as a nonce? Sorry if this is question is a little dumb

Nonce, counter, who cares? It occupies the same space of the block state and is a public input. You can carve those 128 bits however you’d like.

96 bits of nonce and 32 bits of counter is the IETF variant.

64 bits of nonce and 64 bits of counter is what DJB proposed initially.

You merely suggested 0 bits of nonce and 128 bits of counter.

The security of the construction is unaffected, but by making 100% of those 128 bits a counter, you must only ever use each key once. But your message can be 2^128 blocks in length (roughly 2^134 bytes).

If that’s not what you meant to communicate, then you didn’t convey your intent clearly to me.

What do you think about the AEGIS family of AEAD ciphers? They have 256-bit nonces, with the downside of being based on AES.

I’ve heard some cryptographers love it. I’ve heard some don’t trust it. I’m waiting until both camps have a chance to publish their research before I decide if my initial impression (“wow! such fast, very efficiency” in doge meme format) is valid.