```
Filename: 308-counter-galois-onion.txt
Title: Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography
Authors: Jean Paul Degabriele, Alessandro Melloni, Martijn Stam
Created: 13 Sep 2019
Last-Modified: 13 Sep 2019
Status: Superseded
NOTE: This proposal is superseded by an improved version of the
Counter Galois Onion design based on the authors' forthcoming
paper, "Counter Galois Onion: Fast, Forward-Secure, and
Non-Malleable Onion Encryption for Tor". The improved proposal
will be publicly available once the paper is closer to being
ready for publication.
-nickm
1. Background and Motivation
In Proposal 202, Mathewson expressed the need to update Tor's Relay
cryptography and protect against tagging attacks. Towards this goal he
outlined two possible approaches for constructing an onion encryption
scheme that should be able to withstand tagging attacks. Later, in
Proposal 261, Mathewson proposed a concrete scheme based on the
tweakable wide-block cipher AEZ. The security of Proposal 261 was
analysed in [DS18]. An alternative scheme was suggested in Proposal 295
which combines an instantiation of the PIV construction from [ST14] and
a variant of the GCM-RUP construction from [ADL17]. In this document we
propose yet another scheme, Counter Galois Onion (CGO)
which improves over proposals 261 and 295 in a number of ways. CGO has
a minimalistic design requiring only a block cipher in counter-mode and
a universal hash function. To take advantage of Intel's AES-NI and
PCLMULQDQ instructions we recommend using AES and POLYVAL [GLL18]. In
terms of security, it protects against tagging attacks while
simultaneously providing forward security with respect to end-to-end
authenticity and confidentiality. Furthermore CGO performs better than
proposal 295 in terms of efficiency and its support of "leaky pipes".
1.2 Design Overview
CGO makes due with a universal hash function while simultaneously
satisfying forward security. It employs two distinct types of
encryption, a dynamic encryption scheme DEnc and a static encryption
scheme SEnc. DEnc is used for end-to-end encryption (layer n) and SEnc
is used for the intermediate layers (n-1 to 1). DEnc is a Forward-
Secure Authenticated Encryption scheme for securing end-to-end
communication and SEnc provides the non-malleability for protecting
against tagging attacks. In order to provide forward security, the key
material in DEnc is updated with every encryption whereas in SEnc the
key material is static. To support leaky pipes, in the forward
direction each OR first attempts a partial decryption using DEnc and
if it fails it reverts to decrypting using SEnc. The rest of the
document describes the scheme's operation in terms of the low-level
primitives and we make no further mention of DEnc and SEnc. However,
on an intuitive level it can be helpful to think of:
a) the combinations of E(KSf_I, *) and PH(HSf_I, *) as well as
E(KDf_I, *) and PH(HDf_I, *) as two instances of a tweakable block
cipher,
b) the operation E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ... as a
PRG with seed Sf_I,
c) and E(JSf_I, <IV>) | E(JSf_I, <IV+1>) | ... | E(JSf_I, <IV+31>) as
counter-mode encryption with <IV> as the initial vector.
2. Preliminaries
2.1. Notation
Symbol Meaning
------ -------
M Plaintext
Sf_I PRG Seed, forward direction, layer I
Sb_I PRG Seed, backward direction, layer I
Cf_I Ciphertext, forward direction, layer I
Cb_I Ciphertext, backward direction, layer I
Tf_I Tag, forward direction, layer I
LTf_I Last Tag, forward direction, layer I
Tb_I Tag, backward direction, layer I
LTb_I Last Tag, backward direction, layer I
Nf_I Nonce, forward direction, layer I
LNf_I Last Nonce, forward direction, layer I
Nb_I Nonce, backward direction, layer I
LNb_I Last Nonce, backward direction, layer I
JSf_I Static Block Cipher Key, forward direction, layer I
JSb_I Static Block Cipher Key, backward direction, layer I
KSf_I Static Block Cipher Key, forward direction, layer I
KSb_I Static Block Cipher Key, backward direction, layer I
KDf_I Dynamic Block Cipher Key, forward direction, layer I
KDb_I Dynamic Block Cipher Key, backward direction, layer I
HSf_I Static Poly-Hash Key, forward direction, layer I
HSb_I Static Poly-Hash Key, backward direction, layer I
HDf_I Dynamic Poly-Hash Key, forward direction, layer I
HDb_I Dynamic Poly-Hash Key, backward direction, layer I
^ Bitwise XOR operator
| Concatenation
&& Logical AND operator
Z[a, b] For a string Z, the substring from byte a to byte b
(indexing starts at 1)
INT(X) Translate string X into an unsigned integer
2.2. Security parameters
POLY_HASH_LEN -- The length of the polynomial hash function's output,
in bytes. For POLYVAL, POLY_HASH_LEN = 16.
PAYLOAD_LEN -- The longest allowable cell payload, in bytes (509).
HASH_KEY_LEN -- The key length used to digest messages in bytes.
For POLYVAL, DIG_KEY_LEN = 16.
BC_KEY_LEN -- The key length, in bytes, of the block cipher used. For
AES we recommend ENC_KEY_LEN = 16.
BC_BLOCK_LEN -- The block length, in bytes, of the block cipher used.
For AES, BC_BLOCK_LEN = 16.
2.3. Primitives
The polynomial hash function is POLYVAL with a HASH_KEY_LEN-byte key. We
write this as PH(H, M) where H is the key and M the message to be hashed.
We use AES with a BC_KEY_LEN-byte key. For AES encryption (resp.,
decryption) we write E(K, X) (resp., D(K, X)) where K is a BC_KEY_LEN-byte
key and X the block to be encrypted (resp., decrypted). For an integer
j, we use <j> to denote the string of length BC_BLOCK_LEN representing
that integer.
2.4 Key derivation and initialisation (replaces Section 5.2.2)
For newer KDF needs, Tor uses the key derivation function HKDF from
RFC5869, instantiated with SHA256. (This is due to a construction
from Krawczyk.) The generated key material is:
K = K_1 | K_2 | K_3 | ...
Where H(x, t) is HMAC_SHA256 with value x and key t
and K_1 = H(m_expand | INT8(1) , KEY_SEED )
and K_(i+1) = H(K_i | m_expand | INT8(i+1) , KEY_SEED )
and m_expand is an arbitrarily chosen value,
and INT8(i) is an octet with the value "i".
In RFC5869's vocabulary, this is HKDF-SHA256 with info == m_expand,
salt == t_key, and IKM == secret_input.
2.4.1. Key derivation using the KDF
When used in the ntor handshake, for each layer I, the key material is
split into the following sequence of contiguous values:
Length Purpose Notation
------ ------- --------
BC_KEY_LEN forward Seed Sf_I
BC_KEY_LEN backward Seed Sb_I
if (I < n) in addition derive the following static keys:
BC_KEY_LEN forward BC Key KSf_I
BC_KEY_LEN backward BC Key KSb_I
BC_KEY_LEN forward CTR Key JSf_I
BC_KEY_LEN backward CTR Key JSb_I
HASH_KEY_LEN forward poly hash key HSf_I
HASH_KEY_LEN backward poly hash key HSb_I
Excess bytes from K are discarded.
2.4.2. Initialisation from Seed
For each layer I compute E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ...
and parse the output as:
Length Purpose Notation
------ ------- --------
BC_BLOCK_LEN forward Nonce Nf_I
BC_KEY_LEN forward BC Key KDf_I
HASH_KEY_LEN forward poly hash key HDf_I
BC_KEY_LEN new forward Seed Sf'_I
Discard excess bytes, replace Sf_I with Sf'_I, and set LNf_n and LTf_I
to the zero string.
Similarly for the backward direction, compute E(Sb_I, <0>) | E(Sb_I, <1>)
| E(Sb_I, <2>) | ... and parse the output as:
Length Purpose Notation
------ ------- --------
BC_BLOCK_LEN backward Nonce Nb_I
BC_KEY_LEN forward BC Key KDb_I
HASH_KEY_LEN forward poly hash key HDb_I
BC_KEY_LEN new backward Seed Sb'_I
Discard excess bytes, replace Sb_I with Sb'_I, and set LNb_n and LTb_I
to the zero string.
NOTE: For layers n-1 to 1 the values Nf_I, KDf_I, HDf_I, Sf_I and their
backward counterparts are only required in order to support leaky
pipes. If leaky pipes is not required these values can be safely
omitted.
3. Routing relay cells
Let n denote the number of nodes in the circuit. Then encryption layer n
corresponds to the encryption between the OP and the exit/destination
node.
3.1. Forward Direction
The forward direction is the direction that CREATE/CREATE2 cells
are sent.
3.1.1. Routing From the Origin
When an OP sends a relay cell, the cell is produced as follows:
The OP computes E(Sf_n, <0>) | E(Sf_n, <1>) | E(Sf_n, <2>) | ...
and parses the output as
Length Purpose Notation
------ ------- --------
509 encryption pad Z
BC_BLOCK_LEN backward Nonce Nf'_I
BC_KEY_LEN forward BC Key KDf'_I
HASH_KEY_LEN forward poly hash key HDf'_I
BC_KEY_LEN new forward Seed Sf'_I
Excess bytes are discarded. It then computes the n'th layer ciphertext
(Tf_n, Cf_n) as follows:
Cf_n = M ^ Z
X_n = PH(HDf_n, (LNf_n | Cf_n))
Y_n = Nf_n ^ X_n
Tf_n = E(KDf_n, Y_n) ^ X_n
and updates its state by overwriting the old variables with the new
ones.
LNf_n = Nf_n
Nf_n = Nf'_n
KDf_n = KDf'_n
HDf_n = HDf'_n
Sf_n = Sf'_n
It then applies the remaining n-1 layers of encryption to (Tf_n, Cf_n)
as follows:
For I = n-1 to 1:
IV = INT(Tf_{I+1})
Z = E(JSf_I, <IV>) | E(JSf_I, <IV+1>) | ... | E(JSf_I, <IV+31>)
% BC_BLOCK_LEN = 16
Cf_I = Cf_{I+1} ^ Z[1, 509]
X_I = PH(HSf_n, (LTf_{I+1} | Cf_I))
Y_I = Tf_{I+1} ^ X_I
Tf_I = E(KSf_I, Y_I) ^ X_I
LTf_{I+1} = Tf_{I+1}
Upon completion the OP sends (Tf_1, Cf_1) to node 1.
3.1.2. Relaying Forward at Onion Routers
When a forward relay cell (Tf_I, Cf_I) is received by OR I, it decrypts
it performs the following set of steps:
'Forward' relay cell:
X_I = PH(HDf_n, (LNf_I | Cf_I))
Y_I = Tf_I ^ X_I
if (Nf_I == D(KDf_I, Y_I) ^ X_I) % cell recognized and authenticated
compute E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ... and parse the
output as Z, Nf'_I, KDf'_I, HDf'_I, Sf'_I
M = Cf_n ^ Z
LNf_I = Nf_I
Nf_I = Nf'_I
KDf_I = KDf'_I
HDf_I = HDf'_I
Sf_I = Sf'_I
return M
else if (I == n) % last node, decryption has failed
send DESTROY cell to tear down the circuit
else % decrypt and forward cell
X_I = PH(HSf_I, (LTf_{I+1} | Cf_I))
Y_I = Tf_I ^ X_I
Tf_{I+1} = D(KSf_I, Y_I) ^ X_I
IV = INT(Tf_{I+1})
Z = E(JSf_I, <IV>) | E(JSf_I, <IV+1>) | ... | E(JSf_I, <IV+31>)
% BC_BLOCK_LEN = 16
Cf_{I+1} = Cf_I ^ Z[1, 509]
forward (Tf_{I+1}, Cf_{I+1}) to OR I+1
3.2. Backward Direction
The backward direction is the opposite direction from
CREATE/CREATE2 cells.
3.2.1. Routing From the Exit Node
At OR n encryption proceeds as follows:
It computes E(Sb_n, <0>) | E(Sb_n, <1>) | E(Sb_n, <2>) | ...
and parses the output as
Length Purpose Notation
------ ------- --------
509 encryption pad Z
BC_BLOCK_LEN backward Nonce Nb'_I
BC_KEY_LEN forward BC Key KDb'_I
HASH_KEY_LEN forward poly hash key HDb'_I
BC_KEY_LEN new forward Seed Sb'_I
Excess bytes are discarded. It then computes the ciphertext
(Tf_n, Cf_n) as follows:
Cb_n = M ^ Z
X_n = PH(HDb_n, (LNb_n | Cb_n))
Y_n = Nb_n ^ X_n
Tb_n = E(KDb_n, Y_n) ^ X_n)
and updates its state by overwriting the old variables with the new
ones.
LNb_n = Nb_n
Nb_n = Nb'_n
KDb_n = KDb'_n
HDb_n = HDb'_n
Sb_n = Sb'_n
3.2.2. Relaying Backward at the Onion Routers
At OR I (for I < n) when a ciphertext (Tb_I, Cb_I) in the backward
direction is received it is processed as follows:
X_I = PH(HSb_n, (LTb_{I-1} | Cb_I))
Y_I = Tb_I ^ X_I
Tb_{I-1} = D(KSb_I, Y_I) ^ X_I
IV = INT(Tb_{I-1})
Z = E(JSb_I, <IV>) | E(JSb_I, <IV+1>) | ... | E(JSb_I, <IV+31>)
% BC_BLOCK_LEN = 16
Cb_{I-1} = Cb_I ^ Z[1, 509]
The ciphertext (Tb_I, Cb_I) is then passed along the circuit towards
the OP.
3.2.2. Routing to the Origin
When a ciphertext (Tb_1, Cb_1) arrives at an OP, the OP decrypts it in
two stages. It first reverses the layers from 1 to n-1 as follows:
For I = 1 to n-1:
X_I = PH(HSb_I, (LTb_{I+1} | Cb_I))
Y_I = Tb_I ^ X_I
Tb_{I+1} = E(KSb_I, Y_I) ^ X_I
IV = INT(Tb_{I+1})
Z = E(JSb_I, <IV>) | E(JSb_I, <IV+1>) | ... | E(JSb_I, <IV+31>)
% BC_BLOCK_LEN = 16
Cb_{I+1} = Cb_I ^ Z[1, 509]
Upon completion the n'th layer of encryption is removed as follows:
X_n = PH(HDb_n, (LNb_n | Cb_n))
Y_n = Tb_n ^ X_n
if (Nb_n = D(KDb_n, Y_n) ^ X_n) % authentication is successful
compute E(Sb_n, <0>) | E(Sb_n, <1>) | E(Sb_n, <2>) | and parse the
output as Z, Nb'_n, KDb'_n, HDb'_n, Sb'_n
M = Cb_n ^ Z
LNb_n = Nb_n
Nb_n = Nb'_n
KDb_n = KDb'_n
HDb_n = HDb'_n
Sb_n = Sb'_n
return M
else
send DESTROY cell to tear down the circuit
4. Application connections and stream management
4.1. Amendments to the Relay Cell Format
Within a circuit, the OP and the end node use the contents of
RELAY packets to tunnel end-to-end commands and TCP connections
("Streams") across circuits. End-to-end commands can be initiated
by either edge; streams are initiated by the OP.
The payload of each unencrypted RELAY cell consists of:
Relay command [1 byte]
StreamID [2 bytes]
Length [2 bytes]
Data [PAYLOAD_LEN-21 bytes]
The old Digest field is removed since sufficient information for
authentication is now included in the nonce part of the payload.
The old 'Recognized' field is removed. Instead a cell is recognized
via a partial decryption using the node's dynamic keys - namely the
following steps (already included in Section 3):
Forward direction:
X_I = PH(HDf_n, (LNf_I | Cf_I))
Y_I = Tf_I ^ X_I
if (Nf_I == D(KDf_I, Y_I) ^ X_I) % cell is recognized and authenticated
Backward direction (executed by the OP):
If the OP is aware of the number of layers present in the cell there
is no need to attempt to recognize the cell. Otherwise the OP can, for
each layer, first attempt a partial decryption using the dynamic keys
for that layer as follows:
X_I = PH(HDb_I, (LNb_I | Cb_I))
Y_I = Tb_I ^ X_I
if (Nb_I = D(KDb_I, Y_I) ^ X_I) % cell is recognized and authenticated
The 'Length' field of a relay cell contains the number of bytes
in the relay payload which contain real payload data. The
remainder of the payload is padding bytes.
4.2. Appending the encrypted nonce and dealing with version-homogenic
and version-heterogenic circuits
When a cell is prepared to be routed from the origin (see Section
3.1.1) the encrypted nonce N is appended to the encrypted cell
(occupying the last 16 bytes of the cell). If the cell is prepared to
be sent to a node supporting the new protocol, S is combined with other
sources to generate the layer's nonce. Otherwise, if the node only
supports the old protocol, n is still appended to the encrypted cell
(so that following nodes can still recover their nonce), but a
synchronized nonce (as per the old protocol) is used in CTR-mode.
When a cell is sent along the circuit in the 'backward' direction,
nodes supporting the new protocol always assume that the last 16 bytes
of the input are the nonce used by the previous node, which they
process as per Section 3.2.1. If the previous node also supports the
new protocol, these cells are indeed the nonce. If the previous node
only supports the old protocol, these bytes are either encrypted
padding bytes or encrypted data.
5. Security and Design Rationale
We are currently working on a security proof to better substantiate our
security claims. Below is a short informal summary on the security of
CGO and its design rationale.
5.1. Resistance to crypto-tagging attacks
Protection against crypto-tagging attacks is provided by layers n-1 to
1. This part of the scheme is based on the paradigm from [ADL17] which
has the property that if any single bit of the OR's input is changed
then all of the OR's output will be randomised. Specifically, if
(Tf_I, Cf_I) is travelling in the forward direction and is processed by
an honest node I, a single bit flip to either Tf_I or Cf_I will result
in both Tf_{I+1} and Cf_{I+1} being completely randomised. In addition,
the processing of (Tf_I, Cf_I) includes LTf_{I+1} so that any
modification to (Tf_I, Cf_I) at time j will in turn randomise the value
(Tf_{I+1}, Cf_{I+1}) at any time >= j . Thus once a circuit is tampered
with it is not possible to recover from it at a later stage. This helps
to protect against the standard crypto-tagging attack and variations
thereof (Section 5.2 in [DS18]). A similar argument holds in the
backward direction.
5.2. End-to-end authenticated encryption
Layer n provides end-to-end authenticated encryption. Similar to the
old protocol, this proposal only offers end-to-end authentication
rather than per-hop authentication. However, CGO provides 128-bit
authentication as opposed to the 32-bit authentication provided by the
old protocol. A main observation underpinning the design of CGO is
that the n'th layer does not need to be secure against the release of
unverified plaintext (RUP). RUP security is only needed to protect
against tagging attacks and the n'th layer does not help in that regard
(but the layers below do). Consequently we employ a different scheme at
the n'th layer which is designed to provide forward-secure
authenticated encryption.
5.3 Forward Security
As mentioned in the previous section CGO provides end-to-end
authenticated encryption that is also forward secure. Our notion of
forward security follows the definitions of Bellare and Yee [BY03] for
both confidentiality and authenticity. Forward-secure confidentiality
says that upon corrupting either the sender (or the receiver), the
secrecy of the messages that have already been sent (or received) is
still guaranteed. As for forward-secure authentication, upon corrupting
the sender the authenticity of previously authenticated messages is
still guaranteed (even if they have not yet been received). In order to
achieve forward-secure authenticated encryption, CGO updates the key
material of the n'th layer encryption with every cell that is
processed. In order to support leaky pipes the lower layers also need
to maintain a set of dynamic keys that are used to recognize cells that
are intended for them. This key material is only used for partial
processing, i.e. recognizing the cell, and is only updated if
verification is successful. If the cell is not recognized, the node
reverts to processing the cell with the static key material. If support
for leaky-pipes is not required this extra processing can be omitted.
6. Efficiency Considerations
Although we have not carried out any experiments to verify this, we
expect CGO to perform relatively well in terms of efficiency. Firstly,
it manages to achieve forward security with just a universal hash as
opposed to other proposals which suggested the use of SHA2 or SHA3. In
this respect we recommend using POLYVAL [GLL18], a variant of GHASH
that is more compatible with Intel's PCMULQDQ instruction. Furthermore
CGO admits a certain degree of parallelisability. Supporting leaky
pipes requires an OR to first verify the cell using the the dynamic key
material and if the cell is unrecognised it goes on to process the cell
with the static key material. The important thing to note (see for
instance Section 3.1.2) is that the initial processing of the cell
using the static key material is almost identical to the verification
using the dynamic key material, and the two computations are
independent of each other. As such, although in Section 3 these were
described as being evaluated sequentially, they can in fact be computed
in parallel. In particular the two polynomial hashes could be computed
in parallel by using the new vectorised VPCMULQDQ instruction.
We are currently looking into further optimisations of the scheme as
presented here. One such optimisation is the possibility of removing
KDf_I and KDb_I while retaining forward security. This would further
improve the efficiency of the scheme by reducing the amount of dynamic
key material that needs to be updated with every cell that is processed.
References
[ADL17] Tomer Ashur, Orr Dunkelman, Atul Luykx, "Boosting Authenticated
Encryption Robustness with Minimal Modifications", CRYPTO 2017.
[BY03] Mihir Bellare, Bennett Yee, "Forward-Security in Private-Key
Cryptography", CT-RSA 2003.
[DS18] Jean Paul Degabriele, Martijn Stam, "Untagging Tor: A Formal
Treatment of Onion Encryption", EUROCRYPT 2018.
[GLL18] Shay Gueron, Adam Langley, Yehuda Lindell, "AES-GCM-SIV: Nonce
Misuse-Resistant Authenticated Encryption", RFC 8452, April 2019.
[ST13] Thomas Shrimpton, R. Seth Terashima, "A Modular Framework for
Building Variable-Input Length Tweakable Ciphers", ASIACRYPT 2013.
```