Filename: 355-revisiting-pq.md
Title: Options for postquantum circuit extension handshakes
Author: Nick Mathewson
Created: 6 March 2025
Status: Informational

Introduction

We ought to have a Tor network that can resist a cryptographically relevant quantum computer (CRQC).

There are two environments that we may want to support.

In a transitional environment, we would like to resist a future CRQC: we want the property that if/when somebody builds a CRQC in the future, they can't decrypt traffic intercepted today. This means that we need to defend against an attacker who can break traditional public key algorithms in an offline attack, but we don't need to worry about MITM attacks against the handshake. We can still rely on protocols that use traditional public key signatures, as long as we don't care about their being forged in the future.

In a next-generation environment, we want to resist a present CRQC: we need to resist an attacker who can forge signatures for RSA and EdDSA, and attempt MITM attacks.

Previously, proposal 269 (by Schanck, Whyte, Zhang, Mathewson, Lovecruft, and Schwabe) and proposal 270 (by Lovecruft and Schwabe) defined handshakes for circuit extension in a transitional environment.

Since those proposals, several important developments have occurred:

NIST has standardized a PQ key-encapsulation mechanism (ML-KEM), and two PQ signature mechanisms (ML-DSA and SLH-DSA).¹
Some TLS implementations have begun to support hybrid ML-KEM for transitional environments, in coordination with an in-progress IETF draft.
Tor's circuit extension handshake has migrated to ntor v3, which allows the client to send non-forward-secure "extra data" in its CREATE2 and EXTEND2 messages.
We've specified walking onions, a possible future direction for Tor with improved scalability, but different handshake requirements.

In this document, we'll look at various alternatives for PQ circuit extension. We'll compare them with one another for security and cost.

We won't develop any of these handshakes fully here; this proposal is about exploring our alternatives. All of these handshakes assume a hybrid protocol based on ntor, even if they don't say so explicitly.

Preliminaries: algorithms and performance

All sizes below are in bytes.

Alg.	Variant	"Level"	Pubkey len	Ciphertext/signature len
x25519			32	32
ML-KEM	512	1	800	768
	768	3	1184	1088
	1024	5	1568	1568

ed25519			32	64
ML-DSA	44	2	1312	2420
	65	3	1952	3309
	87	5	2592	4627
SLH-DSA	128s	1	32	7856
	192s	3	48	16224
	256s	5	64	29792

We'll refer to some of these values symbolically:

KEM_PK_LEN: The length of our KEM public key.
KEM_MSG_LEN: The length of a KEM ciphertext.
DSA_PK_LEN: The length of our public signing key.
DSA_SIG_LEN: The length of our signatures.

Most of the benchmarks below are from openssl speed (openssl commit 1eb5ffcdc8a270b6d49b6b6f5097ebe61f66f648), run on my 12th Gen Intel i9-12900K. They aren't necessarily super-accurate.

For ML-DSA and SLH-DSA benchmarks, I've used boringssl (commit 2f81008a7dded7053f1adf164a57671efbcbf0e5), since openssl speed doesn't yet support them. Boringssl only supports the variants below.

All times are in microseconds.

Alg.	Variant	keygen	encrypt/sign	decrypt/validate
x25519		19	42	21
ML-KEM	512	15	10	15
	768	22	12	20
	1024	35	17	26
ed25519			19	63
ML-DSA	65	116	567	113
SLH-DSA	128s	22936	175438	174

(This is the last point in this proposal where we'll see SLH-DSA, for reasons that should be pretty obvious from its size and speed characteristics.)

Transitional handshakes (PQ-TR)

Proposals 269 and 270 (which we'll collectively call "PQ-TR") build on the ntor handshake by having the client send the relay a single-use KEM public key as part of its handshake; the relay replies with an encapsulated secret key. (The secret key, and all new messages, are included in the inputs to the digest and KDF functions.)

The total handshake size here increases by KEM_PK_LEN + KEM_MSG_LEN.
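As a rough illustration of how a PQ-TR-style handshake might feed both the classical secret and the KEM secret into its key derivation, here is a minimal Python sketch. The use of SHAKE-256, the 8-byte length prefix, the domain-separation string, and the function names are all assumptions made for illustration; proposals 269 and 270 define their own constructions.

```python
import hashlib


def encap(data: bytes) -> bytes:
    """Length-prefix a field so concatenated inputs stay unambiguous."""
    return len(data).to_bytes(8, "big") + data


def derive_circuit_keys(ntor_secret: bytes, kem_secret: bytes,
                        transcript: list[bytes], n_bytes: int = 64) -> bytes:
    """Derive key material from both secrets and every handshake message.

    Hypothetical layout for illustration only; not the real ntor KDF.
    """
    xof = hashlib.shake_256()
    xof.update(b"pq-tr-sketch:kdf")   # illustrative domain separator
    xof.update(encap(ntor_secret))    # classical (x25519-based) secret
    xof.update(encap(kem_secret))     # secret encapsulated by the relay
    for msg in transcript:            # all old and new handshake messages
        xof.update(encap(msg))
    return xof.digest(n_bytes)
```

Because both secrets go into the same derivation, an attacker has to recover both of them to compute the circuit keys, which is the point of the hybrid construction.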

To extend this approach to work with "extra data" in the handshake, we can continue to encrypt the extensions as we do now. But if we do so, then the CREATE extensions will be decryptable by a CRQC in the future. That vulnerability may be acceptable: Current extensions are only used for negotiating protocol extensions and their parameters.

This handshake works unmodified with walking onions (though the "extra data" issue presents a problem, as we'll discuss below).

Next-generation handshakes, KEM-only

For security in the next-generation model, there are some possible handshakes that only need a KEM. We'll describe two here.

With these handshakes, the relay has to publish a KEM public key (KP_circ_kem) in its router descriptor, and the key needs to be re-published in the relay's microdescriptor.

Thus, both of these handshakes will increase microdescriptor size by KEM_PK_LEN.

(Additionally, security in this model will require relays to have PQ identity keys and PQ descriptor-signing keys. This change is out of scope for this document.)

These KEM-only handshakes require the client to know the relay's KP_circ_kem before the handshake begins: as such, they can't be used with walking onions without adding an extra round trip (but see the appendix).

Single KEM, no forward secrecy (PQ-KEM-1)

This handshake (call it "PQ-KEM-1") builds on ntor again: Here the client additionally sends an encapsulated key, encrypted with the relay's KEM public key KP_circ_kem. The relay and the client use the encapsulated key as an additional input to derive the circuit's symmetric keys.

The total handshake size here increases by only KEM_MSG_LEN.

With this handshake, the client can also use the encapsulated key to derive the encryption key for the "extra data" in the handshake: this will actually protect that data from decryption. (As in ntor-v3, the "extra data" has no forward secrecy.)

Note however that with this handshake, we lose forward secrecy against an adversary who compromises the private key corresponding to KP_circ_kem. This represents a loss of security as compared to our current model.

Two KEMs, achieving forward secrecy (PQ-KEM-2)

This handshake (call it "PQ-KEM-2") continues to build on ntor. This time the client sends a key encapsulated with KP_circ_kem, and a single-use KEM public key of its own. The relay's response includes a key encapsulated with the client's single-use KEM public key. All encapsulated keys are used when deriving the circuit's symmetric keys.

Here, assuming that the client discards its single-use KEM key immediately after the handshake, the symmetric keys get forward secrecy against a compromise of the private key corresponding to KP_circ_kem. (As in ntor-v3, the "extra data" still has no forward secrecy.)

The total handshake size here increases by KEM_MSG_LEN * 2 + KEM_PK_LEN.
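The PQ-KEM-2 message flow can be sketched in Python as follows. ToyKem is an insecure placeholder that only mimics ML-KEM-768 message sizes; it, the kdf helper, and the name ks_circ_kem are hypothetical and exist only to make the flow runnable. The ntor part of the handshake is reduced to a fixed dummy secret.

```python
import hashlib
import os

KEM_PK_LEN = 1184   # ML-KEM-768 public key length
KEM_MSG_LEN = 1088  # ML-KEM-768 ciphertext length


class ToyKem:
    """INSECURE placeholder with ML-KEM-768 message sizes; not a real KEM.

    The "secret key" is the public key itself, so anyone can decapsulate.
    """

    @staticmethod
    def keygen():
        pk = os.urandom(KEM_PK_LEN)
        return pk, pk  # (public key, secret key)

    @staticmethod
    def encapsulate(pk):
        ct = os.urandom(KEM_MSG_LEN)
        return ct, hashlib.sha3_256(pk + ct).digest()

    @staticmethod
    def decapsulate(sk, ct):
        return hashlib.sha3_256(sk + ct).digest()


def kdf(*secrets):
    """Hash all length-prefixed secrets together (illustrative only)."""
    h = hashlib.sha3_256()
    for s in secrets:
        h.update(len(s).to_bytes(8, "big") + s)
    return h.digest()


def pq_kem_2_handshake(ntor_secret=bytes(32)):
    # Relay's medium-term key pair; the public half is KP_circ_kem.
    kp_circ_kem, ks_circ_kem = ToyKem.keygen()

    # Client: encapsulate to KP_circ_kem and add a single-use KEM public key.
    ct_to_relay, ss1_client = ToyKem.encapsulate(kp_circ_kem)
    client_pk, client_sk = ToyKem.keygen()
    client_added = ct_to_relay + client_pk

    # Relay: decapsulate, then encapsulate to the client's single-use key.
    ss1_relay = ToyKem.decapsulate(ks_circ_kem, ct_to_relay)
    ct_to_client, ss2_relay = ToyKem.encapsulate(client_pk)
    relay_added = ct_to_client

    # Client: decapsulate, then drop its reference to the single-use key.
    ss2_client = ToyKem.decapsulate(client_sk, ct_to_client)
    del client_sk

    client_keys = kdf(ntor_secret, ss1_client, ss2_client)
    relay_keys = kdf(ntor_secret, ss1_relay, ss2_relay)
    return client_keys, relay_keys, len(client_added) + len(relay_added)
```

Both sides end up with the same keys, and the bytes added to the handshake come to KEM_MSG_LEN * 2 + KEM_PK_LEN. A real implementation would replace ToyKem with an actual ML-KEM library and would combine these secrets with the full ntor transcript.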

Next-generation handshakes, KEM and signature (PQ-KEM-DSA-1,2)

In these handshakes, instead of giving the relay a medium-term KEM key, we give it a medium-term handshake signing key KP_circ_sign. Now the client sends only a single-use KEM public key in its handshake. The relay's response includes an encapsulated key, along with a signature of the handshake's authentication digest.

We can either put this signing key in the microdescriptor (call this variant "PQ-KEM-DSA-1") or put a hash of the signing key into the microdescriptor (call this variant "PQ-KEM-DSA-2"). If we put only a hash of the signing key into the microdescriptor, the relay's response needs to also include KP_circ_sign.

The PQ-KEM-DSA-1 variant adds DSA_PK_LEN to microdescriptors, and KEM_PK_LEN + KEM_MSG_LEN + DSA_SIG_LEN to handshakes.

The PQ-KEM-DSA-2 variant adds 32 bytes to microdescriptors, and KEM_PK_LEN + KEM_MSG_LEN + DSA_SIG_LEN + DSA_PK_LEN to handshakes.

These variants don't have a good way to support PQ-secure encryption for "extra data" as part of the handshake. Other than that, they do work with walking onions "out of the box".
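In the PQ-KEM-DSA-2 variant, the client would have to check the KP_circ_sign it receives against the hash from the microdescriptor before trusting any signature made with it. A minimal sketch of that check follows; the choice of SHA3-256 as the 32-byte hash and the function name are assumptions, since this document does not pick a hash function.

```python
import hashlib
import hmac


def signing_key_matches_md(kp_circ_sign: bytes, md_hash: bytes) -> bool:
    """Return True if the key sent by the relay hashes to the value
    published in the microdescriptor (SHA3-256 is an assumption)."""
    digest = hashlib.sha3_256(kp_circ_sign).digest()
    return hmac.compare_digest(digest, md_hash)
```

Only after this check succeeds would the client go on to verify the signature over the handshake's authentication digest.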

Comparison and overview

Here we'll compare the variants above. Values are approximate and don't include "extra data", or any symmetric operations. Sizes are in bytes; time is in microseconds. "MD size" is total size used for the handshake in the microdescriptors; "HS size" is total size for the handshake.² I'll assume we're using ML-KEM-768 and ML-DSA-65 for a security level of 3.

(We could conceivably get away with ML-DSA-44 if we believe that breaking such a key would take longer than the key's lifespan.)

When listing vulnerabilities, we say:

Conv if they can be done by a conventional adversary,
PQ if they can be done by an adversary with a CRQC.

The listed attacks are:

Decrypt (can recover the circuit symmetric keys and decrypt all the traffic)
MITM (can impersonate a relay)
Decrypt-XD (can decrypt "extra data")
NoFS (no forward secrecy: can recover the circuit symmetric keys and decrypt all the traffic if they steal a medium-term key before it is deleted.)
NoFS-XD (no forward secrecy on "extra data": can decrypt the extra data if they steal a medium-term key before it is deleted.)

Variant	MD size	HS size	Time	Vulnerable
ntor-v3	32	180	137	Conv:NoFS-XD, PQ:everything!
PQ-TR	32	2452	191	Conv:NoFS-XD, PQ:MITM,Decrypt-XD
PQ-KEM-1	1216	1268	169	Conv:NoFS-XD, PQ:NoFS ³
PQ-KEM-2	1216	3540	223	Conv:NoFS-XD, PQ:NoFS-XD ³
PQ-KEM-DSA-1	1984	5761	734	none ⁴
PQ-KEM-DSA-2	64	7713	734	none ⁴
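The size columns can be recomputed from the per-variant formulas given in the earlier sections. The sketch below assumes ML-KEM-768 and ML-DSA-65, and takes the ntor-v3 row (32 bytes in the microdescriptor, 180 bytes of handshake) as the baseline that every variant adds to; the dictionary layout and names are illustrative.

```python
KEM_PK_LEN, KEM_MSG_LEN = 1184, 1088   # ML-KEM-768
DSA_PK_LEN, DSA_SIG_LEN = 1952, 3309   # ML-DSA-65
NTOR_MD, NTOR_HS = 32, 180             # ntor-v3 baseline from the table
HASH_LEN = 32                          # hash of KP_circ_sign (PQ-KEM-DSA-2)

# (bytes added to the microdescriptor, bytes added to the handshake)
ADDED = {
    "ntor-v3": (0, 0),
    "PQ-TR": (0, KEM_PK_LEN + KEM_MSG_LEN),
    "PQ-KEM-1": (KEM_PK_LEN, KEM_MSG_LEN),
    "PQ-KEM-2": (KEM_PK_LEN, 2 * KEM_MSG_LEN + KEM_PK_LEN),
    "PQ-KEM-DSA-1": (DSA_PK_LEN, KEM_PK_LEN + KEM_MSG_LEN + DSA_SIG_LEN),
    "PQ-KEM-DSA-2": (
        HASH_LEN,
        KEM_PK_LEN + KEM_MSG_LEN + DSA_SIG_LEN + DSA_PK_LEN,
    ),
}


def totals():
    """Return {variant: (MD size, HS size)} including the ntor-v3 baseline."""
    return {
        name: (NTOR_MD + md, NTOR_HS + hs)
        for name, (md, hs) in ADDED.items()
    }


for name, (md, hs) in totals().items():
    print(f"{name:13} MD={md:5} HS={hs:5}")
```

Swapping in the constants for a different parameter set (for example ML-DSA-44) shows how much each variant would shrink.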

Conclusions

For our near-term transitional needs, PQ-TR approaches as advocated previously may still be fine so long as we don't mind a postquantum adversary being able to decrypt all our extra data fields in the future.
For our next-generation needs, something like PQ-KEM-2 seems best, if we can afford it.
Any handshake variant that adds a full key to the microdescriptors will erase some or all of our gains from removing TAP keys and implementing happy families. We can reduce ongoing microdesc costs here a little by increasing key rotation time, but that wouldn't help the initial bootstrap costs. The performance costs for PQ-KEM-DSA-2 are still likely too high to make it worthwhile, however, even though it has shorter microdescs.
Some of these designs have implications for what we can safely do with our "extra data" extensions. We should document these constraints as possibly desirable, and make sure that our current extensions obey them.
For walking onions, more analysis is needed, but it appears quite likely that some variation on these approaches is possible, though at best there is an extra round-trip for every circuit built.
To analyze these variants further, it would help to know:
- How frequently we need to build circuits from scratch, vs using established circuits or circuit stubs.
- How many messages we send over the average circuit after it is built. (Or equivalently: what fraction of bandwidth is used for circuit handshakes.)
- What fraction of relays' CPU is currently used for circuit handshakes.

Appendix: round trips and walking-onions support

There are two reasons we might run into a need for extra round trips.

First, the current walking onions design depends on being able to learn the next relay's medium-term public keys as part of the handshake's reply message: As such, it already has compatibility issues with "extra data" as used in ntor-v3. Specifically, if that "extra data" needs to be encrypted, we will need to exchange the extra data after the handshake is done, or we will need to fetch the next relay's medium-term public key before we start the handshake.

Both of these operations entail an extra round trip for the handshake.

Second, if we chose to use the PQ-KEM-DSA-* handshakes, we would need an extra round trip to support them as well (walking onions or not) since they don't support "extra data".

We would naively think, then, that if our current circuit construction requires 3 round trips, with these changes we would need 6. But in the case of exchanging "extra data" after the handshake, there is an optimization we can use if we are willing to allow the client to send messages before receiving a reply to the extra data: for each hop except the last, we send our "extra data" along with the EXTEND2 message for the next hop. Now, instead of doubling our round trips, we only add one.

For example, in building a 3 hop circuit:

Client->R1: "CREATE2"
R1->Client: "CREATED2"
Client->R1: "Extra data", "EXTEND2 (for R2)"
R1->Client: "Extra data reply", "EXTENDED2 (from R2)"
Client->R2: "Extra data", "EXTEND2 (for R3)"
R2->Client: "Extra data reply", "EXTENDED2 (from R3)"
Client->R3: "Extra data"
R3->Client: "Extra data reply"

This optimization doesn't work for fetching the next relay's key before the handshake, since the client can't send the handshake message until the key is received: as such it applies to the PQ-KEM-DSA-* variants but not to PQ-KEM-*.
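The round-trip counts for the "extra data after the handshake" case can be sketched as below. The function names and the message-list representation are illustrative only; the schedule follows the 3-hop example above, generalized to any number of hops.

```python
def round_trips_naive(hops: int) -> int:
    """One handshake round trip plus one extra-data round trip per hop."""
    return 2 * hops


def pipelined_schedule(hops: int):
    """Return a list of (client messages, relay messages) per round trip.

    Extra data for each hop except the last rides along with the EXTEND2
    message for the next hop, so only the last hop's extra data needs a
    round trip of its own.
    """
    rounds = [(["CREATE2"], ["CREATED2"])]
    for nxt in range(2, hops + 1):
        rounds.append((
            ["Extra data", f"EXTEND2 (for R{nxt})"],
            ["Extra data reply", f"EXTENDED2 (from R{nxt})"],
        ))
    rounds.append((["Extra data"], ["Extra data reply"]))
    return rounds


def round_trips_pipelined(hops: int) -> int:
    return len(pipelined_schedule(hops))
```

For any number of hops, the pipelined schedule has one more round trip than there are hops, rather than twice as many.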

¹ The NIST standards are based on, but not the same as, CRYSTALS-Kyber, CRYSTALS-Dilithium, and SPHINCS+. Each standards document explains the differences.
² For all PQ handshake variants, the handshake messages are too large to fit into a single RELAY cell. We will have to use a mechanism for fragmented cells.
³ The PQ-KEM-* handshakes are incompatible with walking onions, unless we add an extra round-trip.
⁴ The PQ-KEM-DSA-* handshakes are not vulnerable to NoFS-XD, but only because they don't support extra data at all.
