202-improved-relay-crypto - Tor design proposals

Filename: 202-improved-relay-crypto.txt
Title: Two improved relay encryption protocols for Tor cells
Author: Nick Mathewson
Created: 19 Jun 2012
Status: Meta

Note: This is an important development step in improving our relay
  crypto, but it doesn't actually say how to do this.


Overview:

   This document describes two candidate designs for a better Tor
   relay encryption/decryption protocol, designed to stymie tagging
   attacks and better accord with best practices in protocol design.

   My hope is that readers will examine these protocols, evaluate their
   security and practicality, improve on them, and help to pick one for
   implementation in Tor.

   In section 1, I describe Tor's current relay crypto protocol, its
   drawbacks, how it fits in with the rest of Tor, and some
   requirements/desiderata for a replacement.  In sections 2 and 3, I
   propose two alternative replacements for this protocol.  In section
   4, I discuss their pros and cons.

1. Background and Motivation

1.0. A short overview of Tor's current protocols

   The core pieces of the Tor protocol are the link protocol,
   the circuit extend protocol, the relay protocol, and the stream
   protocol.  All are documented in [TorSpec].

   Briefly:

     - The link protocol handles all direct pairwise communication
       between nodes.  Everything else is transmitted over it.  It
       uses TLS.

     - The circuit extend protocol uses public-key crypto to set up
       multi-node virtual tunnels, called "circuits", from a client
       through one or more nodes.

   *** The relay protocol uses established circuits to communicate
       from a client to a node on a circuit.  That's the one we'll
       be talking about here. ***

     - The stream protocol is tunneled over relay protocol; clients
       use it to tell servers to open anonymous TCP connections, to
       send data, and so forth.  Servers use it to report success or
       failure opening anonymous TCP connections, to send data from
       TCP connections back to clients, and so forth.

   In more detail: The link protocol's job is to connect two
   computers with an encrypted, authenticated stream, to
   authenticate one or both of them to the other, and to provide a
   channel for passing "cells" between them.  The circuit extend
   protocol's job is to set up circuits: persistent tunnels that
   connect a Tor client to an exit node through a series of
   (typically three) hops, each of which knows only the previous and
   next hop, and each of which has a set of keys that they share
   only with the client.  Finally, the relay protocol's job is to
   allow a client to communicate bidirectionally with the node(s) on
   the circuit, once their shared keys have been established.

   (We'll ignore the stream protocol for the rest of this document.)

   Note on terminology: Tor nodes are sometimes called "servers",
   "relays", or "routers".  I'll use all these terms more or less
   interchangeably.  For simplicity's sake, I will call the party
   who is constructing and using a circuit "the client" or "Alice",
   even though nodes sometimes construct circuits too.

   Tor's internal packets are called "cells".  All the cells we deal
   with here are 512 bytes long.

   The nodes in a circuit are called its "hops"; most circuits are 3
   hops long.  This doesn't include the client: when Alice builds a
   circuit through nodes Bob_1, Bob_2, and Bob_3, the first hop is
   "Bob_1" and the last hop is "Bob_3".

1.1. The current relay protocol and its drawbacks

   [This section describes Tor's current relay protocol.  It is not a
   proposal; rather it is what we do now.  Sections 2 and 3 have my
   proposed replacements for it.]

   A "relay cell" is a cell that is generated by the client to send
   to a node, or by a node to send to the client.  It's called a
   "relay" cell because a node that receives one may need to relay
   it to the next or previous node in the circuit (or to handle the
   cell itself).

   A relay cell moving towards the client is called "inbound"; a
   cell moving away is called "outbound".

   When a relay cell is constructed by the client, the client adds one
   layer of crypto for each node that will process the cell, and gives
   the cell to the first node in the circuit.  Each node in turn then
   removes one layer of crypto, and either forwards the cell to the next
   node in the circuit or acts on that cell itself.

   When a relay cell is constructed by a node, it encrypts it and sends
   it to the preceding node in the circuit.  Each node between the
   originating node and the client then encrypts the cell and passes it
   back to the preceding node.  When the client receives the cell, it
   removes layers of crypto until it has an unencrypted cell, which it
   then acts on.

   In the current protocol, the body of each relay cell contains, in
   its unencrypted form:

        Relay command     [1 byte]
        Zeros             [2 bytes]
        StreamID          [2 bytes]
        Digest            [4 bytes]
        Length            [2 bytes]
        Data              [498 bytes]

   (This only adds up to 509 bytes.  That's because the Tor link
   protocol transfers 512-byte cells, and has a 3 byte overhead per
   cell.  Not how we would do it if we were starting over at this
   point.)

   At every node of a circuit, the node relaying a cell
   encrypts/decrypts it with AES128-CTR.  If the cell is outbound
   and the "zeros" field is set to all-zeros, the node additionally
   checks whether 'digest' is correct.  A correct digest is the
   first 4 bytes of the running SHA1 digest of: a shared secret,
   concatenated with all the relay cells destined for this node on
   this circuit so far, including this cell.  If _that's_ true, then
   the node accepts this cell.  (See section 6 of [TorSpec] for full
   detail; see section A.1 for a more rigorous description.)

   The current approach has some actual problems.  Notably:

      * It permits tagging attacks. Because AES_CTR is an XOR-based
        stream cipher, an adversary who controls the first node in a
        circuit can XOR anything he likes into the relay cell, and
        then see whether he/she encounters a correspondingly
        defaced cell at some exit that he also controls.

        That is, the attacker picks some pattern P, and when he
        would transmit some outbound relay cell C at hop 1, he
        instead transmits C xor P.  If an honest exit receives the
        cell, the digest will probably be wrong, and the honest exit
        will reject it.  But if the attacker controls the exit, he
        will notice that he has received a cell C' where the digest
        doesn't match, but where C' xor P _does_ have a good digest.
        The attacker will then know that his two nodes are on the
        same circuit, and thereby be able to link the user (whom the
        first node sees) to her activities (which the last node sees).

        Back when we did the Tor design, this didn't seem like a
        big deal, since an adversary who controls both the first and
        last node in a circuit is presumed to win already based on
        traffic correlation attacks.  This attack seemed strictly
        worse than that, since it was trivially detectable in the
        case where the attacker _didn't_ control both ends.  See
        section 4.4 of the Tor paper [TorDesign] for our early
        thoughts here; see Xinwen Fu et al's 2009 work for a more
        full explanation of the in-circuit tagging attack [XF]; and
        see "The 23 Raccoons'" March 2012 "Analysis of the Relative
        Severity of Tagging Attacks" mail on tor-dev (and the
        ensuing thread) for a discussion of why we may want to care
        after all, due to attacks that use tagging to amplify route
        capture. [23R]

   It also has some less practical issues.

      * The digest portion is too short.  Yes, if you're an attacker
        trying to (say) change an "ls *" into an "rm *", you can
        only expect to get one altered cell out of 2^32 accepted --
        and all future cells on the circuit will be rejected with
        similar probability due to the change in the running hash
        -- but 1/2^32 is a pretty high success rate for crypto attacks.

      * It does MAC-then-encrypt.  That makes smart people cringe.

      * Its approach to MAC is H(Key | Msg), which is vulnerable to
        length extension attack if you've got a Merkle-Damgard hash
        (which we do).  This isn't relevant in practice right now,
        since the only parties who see the digest are the two
        parties that rely on it (because of the MAC-then-encrypt).


1.2. Some feature requirements

   Relay cells can originate at the client or at a relay.  Relay cells
   that originate at the client are given to the first node in the
   circuit, and constructed so that they will be decrypted and forwarded
   by the first M-1 nodes in the circuit, and finally decrypted and
   processed by the Mth node, where the client chooses M. (Usually, the
   Mth node is the the last one, which will be an exit node.) Relay
   cells that originate at a hop in the circuit are given to the
   preceding node, and eventually delivered to the client.

   Tor provides a so called "leaky pipe" circuit topology
   [TorDesign]: a client can send a cell to any node in the circuit,
   not just the last node. I'll try to keep that property, although
   historically we haven't really made use of it.

   In order to implement telescoping circuit construction (where the
   circuit is built by iteratively asking the last node in the
   circuit to extend the circuit one hop more), it's vital that the
   last hop of the circuit be able to change.

   Proposal 188 [Prop188] suggests that we implement a "bridge guards"
   feature: making some (non-public) nodes insert an extra hop into
   the circuit after themselves, in a way that will make it harder
   for other nodes in the network to enumerate them.  We
   therefore want our circuits to be one-hop re-extensible: when the
   client extends a circuit from Bob1 to Bob2, we want Bob1 to be
   able to introduce a new node "Bob1.5" into the circuit such that
   the circuit runs Alice->Bob1->Bob1.5->Bob2. (This feature has
   been called "loose source routing".)

   Any new approach should be able to coexist on a circuit
   with the old approach.  That is, if Alice wants to build a
   circuit through Bob1, Bob2, and Bob3, and only Bob2 supports a
   revised relay protocol, then Alice should be able to build a
   circuit such that she can have Bob1 and Bob3 process each cell
   with the current protocol, and Bob2 process it with a revised
   protocol.  (Why?  Because if all nodes in a circuit needed to use
   the same relay protocol, then each node could learn information
   about the other nodes in the circuit from which relay protocol
   was chosen.  For example, if Bob1 supports the new protocol, and
   sees that the old relay protocol is in use, and knows that Bob2
   supports the new one, then Bob1 has learned that Bob3 is some
   node that does not support the new relay protocol.)

   Cell length needs to be constant as cells move through the
   network.  For historical reasons mentioned above in section 1.1,
   the length of the encrypted part of a relay cell needs to be 509
   bytes.

1.3. Some security requirements

   Two adjacent nodes on a circuit can trivially tell that they are
   on the same circuit, and the first node can trivially tell who
   the client is. Other than that, we'd like any attacker who
   controls a node on the circuit not to have a good way to learn
   any other nodes, even if he/she controls those nodes. [*]

   Relay cells should not be malleable: no relay should be able to
   alter a cell between an honest sender and an honest recipient in
   a way that they cannot detect.

   Relay cells should be secret: nobody but the sender and recipient
   of a relay cell should be able to learn what it says.

   Circuits should resist transparent, recoverable tagging attacks:
   if an attacker controls one node in a circuit and alters a relay
   cell there, no non-adjacent point in the circuit should be able
   to recover the relay cell as it would have received it had the
   attacker not altered it.

   The above properties should apply to sequences of cells too:
   an attacker shouldn't be able to change what sequence of cells
   arrives at a destination (for example, by removing, replaying, or
   reordering one or more cells) without being detected.

   (Feel free to substitute whatever formalization of the above
   requirements makes you happiest, and add whatever caveats are
   necessary to make you comfortable.  I probably missed at least
   two critical properties.)

   [*] Of course, an attacker who controls two points on a circuit
       can probably confirm this through traffic correlation.  But
       we'd prefer that the cryptography not create other, easier
       ways for them to do this.

1.4. A note on algorithms

   This document is deliberately agnostic concerning the choice of
   cryptographic primitives -- not because I have no opinions about
   good ciphers, MACs, and modes of operation -- but because
   experience has taught me that mentioning any particular
   cryptographic primitive will prevent discussion of anything else.

   Please DO NOT suggest algorithms to use in implementing these
   protocols yet.  It will distract!  There will be time later!

   If somebody _else_ suggests algorithms to use, for goodness' sake
   DON'T ARGUE WITH THEM!  There will be time later!


2. Design 1: Large-block encryption

   In this approach, we use a tweakable large-block cipher for
   encryption and decryption, and a tweak-chaining function TC.

2.1. Chained large-block what now?

   We assume the existence of a primitive that provides the desired
   properties of a tweakable[Tweak] block cipher, taking blocks of any
   desired size.  (In our case, the block size is 509 bytes[*].)

   It also takes a Key, and a per-block "tweak" parameter that plays
   the same role that an IV plays in CBC, or that the counter plays
   in counter mode.

   The Tweak-chaining function TC takes as input a previous tweak, a
   tweak chaining key, and a cell; it outputs a new tweak.  Its
   purpose is to make future cells undecryptable unless you have
   received all previous cells.  It could probably be something like
   a MAC of the old tweak and the cell using the tweak chaining key
   as the MAC key.

   (If the initial tweak is secret, I am not sure that TC needs to
   be keyed.)

   [*] Some large-block cipher constructions use blocks whose size is
       the multiple of some underlying cipher's block size.  If we wind
       up having to use one of those, we can use 496-byte blocks instead
       at the cost of 2.5% wasted space.

2.2. The protocol

2.2.1. Setup phase

   The circuit construction algorithm needs to produce forward and
   backward keys Kf and Kb, the forward and backward tweak chaining
   keys TCKf and TCKb, as well as initial tweak values Tf and Tb.

2.2.2. The cell format

   We replace the "Digest" and "Zeros" fields of the cell with a
   single Z-byte "Zeros" field to determine when the cell is
   recognized and correctly decrypted; its length is a security
   parameter.

2.2.3. The decryption operations

   For a relay to handle an inbound RELAY cell, it sets Tb_next to
   TC(TCKb, Tb, Cell).  Then it encrypts the cell using the large
   block cipher keyed with Kb and tweaked with Tb.  Then it sets Tb
   to Tb_next.

   For a relay to handle an outbound RELAY cell, it sets Tf_next to
   TC(TCKf, Tf, Cell).  Then it decrypts the cell using the large
   block cipher keyed with Kf and tweaked with Tf.  Then it sets Tf
   to Tf_next.  Then it checks the 'Zeros' field on the cell;
   if that field is all [00] bytes, the cell is for us.

2.3. Security discussion

   This approach is fairly simple (at least, no more complex than
   its primitives) and achieves some of our security goals.  Because
   of the large block cipher approach, any change to a cell will
   render that cell undecryptable, and indistinguishable from random
   junk.  Because of the tweak chaining approach, if even one cell
   is missed or corrupted or reordered, future cells will also
   decrypt into random junk.

   The tagging attack in this case is turned into a circuit-junking
   attack: an adversary who tries to mount it can probably confirm
   that he was indeed first and last node on a target circuit
   (assuming that circuits which turn to junk in this way are rare),
   but cannot recover the circuit after that point.

   As a neat side observation, note that this approach improves upon
   Tor's current forward secrecy, by providing forward secrecy while
   circuits are still operational, since each change to the tweak
   should make previous cells undecryptable if the old tweak value
   isn't recoverable.

   The length of Zeros is a parameter for what fraction of "random
   junk" cells will potentially be accepted by a router or client.
   If Zeros is Z bytes long, then junk cells will be accepted with
   P < 2^-(8*Z + 7).  (The '+7' is there because the top 7 bits of
   the Length field must also be 0.)

   There's no trouble using this protocol in a mixed circuit, where
   some nodes speak the old protocol and some speak the
   large-block-encryption protocol.

3. Design 2: short-MAC-and-pad

   In this design, we behave more similarly to a mix-net design
   (such as Mixmaster or Mixminion's headers).  Each node checks a
   MAC, and then re-pads the cell to its chosen length before
   decoding the cell.

   This design uses as a primitive a MAC and a stream cipher.  It
   might also be possible to use an authenticating cipher mode,
   if we can find one that works like a stream cipher and allows us
   to efficiently output authenticators for the stream so far.

   NOTE TO AE/AEAD FANS: The encrypt-and-MAC model here could be
   replaced with an authenticated encryption mode without too much
   loss of generality.

3.1. The protocol

3.1.1 Setup phase

   The circuit construction algorithm needs to produce forward and
   backward keys Kf and Kb, forward and backward stream cipher IVs
   IVf and IVb, and forward and backward MAC keys Mf and Mb.

   Additionally, the circuit construction algorithm needs a way for
   the client to securely (and secretly? XXX) tell each hop in the
   circuit a value 'bf' for the number of bytes of MAC it should
   expect on outbound cells and 'bb' for the number of bytes it
   should use on cells it's generating.   Each node can get a
   different 'bf' and 'bb'.  These values can be 0: if a node's bf
   is 0, it doesn't authenticate cells; if its bb is 0, it doesn't
   originate them.

   The circuit construction algorithm also needs a way to tell each
   the client to securely (and secretly? XXX) tell each hop in the
   circuit whether it is allowed to be the final destination for
   relay cells.

   Set the stream Sf and the stream Sb to empty values.

3.1.2. The cell format

   The Zeros and Digest field of the cell format are removed.

3.1.3. The relay operations

   Upon receiving an outbound cell, a node removes the first b bytes
   of the cell, and puts them aside as 'M'.  The node then computes
   between 0 and 2 MACs of the stream consisting of all previously
   MAC'd data, plus the remainder of the cell:

      If b>0 and there is a next hop, the node computes M_relay.

      If this node was told to deliver traffic, or it is the last
      node in the circuit so far, the node computes M_receive.

   M_relay is computed as MAC(stream | "relay"); M_receive is
   computed as MAC(stream | "receive").

   If M = M_receive, this cell is for the node; it should process
   it.

   If M = M_relay, or if b == 0, this cell should be relayed.

   If a MAC was computed and neither of the above cases was met,
   then the cell is bogus; the node should discard it and destroy
   the circuit.

   The node then removes the first bf bytes of the cell, and pads the
   cell at the end with bf zero bytes.  Finally, the node decrypts
   the whole remaining padded cell with the stream cipher.

   To handle an inbound cell, the node simply does a stream cipher
   with no checking.

3.1.4. Generating inbound cells.

   To generate an inbound cell, a node makes a 509-bb byte RELAY
   cell C, encrypts that cell with Kb, appends the encrypted cell to
   Sb, and prepends M_receive(Sb) to the cell.

3.1.5. Generating outbound cells

   Generating an outbound cell is harder, since we need to know what
   padding the earlier nodes will generate in order to know what
   padding the later nodes will receive and compute their MACs, but
   we also need to know what MACs we'll send to the later nodes in
   order to compute which MACs we'll send to the earlier ones.

   Mixnet clients have needed to do this for ages, though, so the
   algorithms are pretty well settled.  I'll give one below in A.3.

3.2. Security discussion

   This approach is also simple and (given good parameter choices)
   can achieve our goals.  The trickiest part is the algorithm that
   clients must follow to package cells, but that's not so bad.

   It's not necessary to check MACs on inbound traffic, because
   nobody but the client can tell scrambled messages from good ones,
   and the client can be trusted to keep the client's own secrets.

   With this protocol, if the attacker tries to do a tagging attack,
   the circuit should get destroyed by the next node in the circuit
   that has a nonzero "bf" value, with probability == 1-2^-(8*bf).
   (If there are further intervening honest nodes, they also have a
   chance to detect the attack.)

   Similarly, any attempt to replay, or reorder outbound cells
   should fail similarly.

   The "bf" values could reveal to each node its position in the
   circuit and the client preferences, depending on how we set them;
   on the other hand, having a fixed bf value would reveal to the
   last node the length of the circuit.  Neither option seems great.

   This protocol doesn't provide any additional forward secrecy
   beyond what Tor has today.  We could fix that by changing our use
   of the stream cipher so that instead of running in counter mode
   between cells, we use a tweaked stream cipher and change the
   tweak with each cell (possibly based on the unused portion of the
   MAC).

   This protocol does support loose source routing, so long as
   no padding bytes are added by any router-added nodes.

   In a circuit, every node has at least one relay cell sent to it:
   even non-exit nodes get a RELAY_EXTEND cell.

4. Discussion

   I'm not currently seeing a reason to strongly prefer one of these
   approaches over another.

   In favor of large-block encryption:
     - The space overhead seems smaller: we need to use up fewer
       bytes in order to get equivalent looking security.

       (For example, if we want to have P < 2^64 that a cell altered
       by hop 1 could be accepted by hop 2 or hop 3, *and* we want P
       < 2^64 that a cell altered by hop 2 could be accepted by hop
       3, the large-block approach needs about 8 bytes for the Zeros
       field, whereas the short-MAC-and-pad approach needs 16 bytes
       worth of MACs.)

     - We get forward secrecy pretty easily.

     - The format doesn't leak anything about the length of the
       circuit, or limit it.

     - We don't need complicated logic to set the 'b' parameters.

     - It doesn't need tricky padding code.

   In the favor of short-MAC-and-pad:
     - All of the primitives used are much better analyzed and
       understood.  There's nothing esoteric there.  The format
       itself is similar to older, well-analyzed formats.

     - Most of the constructions for the large-block-cipher approach
       seem less efficient in CPU cycles than a good stream cipher
       and a MAC. (But I don't want to discuss that now; see section
       1.4 above!)

   Unclear:

     - Suppose that an attacker controls the first and last hop of a
       circuit, and tries an end-to-end tagging attack.  With
       large-block encryption, the tagged cell and all future cells
       on the circuit turn to junk after the tagging attack, with
       P~~100%.  With short-MAC-and-pad, the circuit is destroyed at
       the second hop, with P ~~ 1- 2^(-8*bf).  Is one of these
       approaches materially worse for the attacker?

     - Can we do better than the "compute two MACs" approach for
       distinguishing the relay and receive cases of the
       short-MAC-and-pad protocol?

     - To what extent can we improve these protocols?

     - If we do short-MAC-and-pad, should we apply the forward
       security hack alluded to in section 3.2?

5. Acknowledgments

   Thanks to the many reviewers of the initial drafts of this
   proposal.  If you can make any sense of what I'm saying, they
   deserve much of the credit.

A. Formal description

   Note that in all these cases, more efficient descriptions exist.

A.1. The current Tor relay protocol.

   Relay cell format:

     Relay command     [1 byte]
     Zeros             [2 bytes]
     StreamID          [2 bytes]
     Digest            [4 bytes]
     Length            [2 bytes]
     Data              [498 bytes]

   Circuit setup:

     (Specified elsewhere; the client negotiates with each router in
     a circuit the secret AES keys Kf, Kb, and the secret 'digest
     keys' Df, and Db.  They initialize AES counters Cf and Cb to
     0.  They initialize the digest stream Sf to Df, and Sb to Db.)

   HELPER FUNCTION: CHECK(Cell [in], Stream [in,out]):

     1. If the Zeros field of Cell is not [00 00], return False.
     2. Let Cell' = Cell with its Digest field set to [00 00 00 00].
     3. Let S' = Stream | Cell'.
     4. If SHA1(S') = the Digest field of Cell, set Stream to S',
        and return True.
     5. Otherwise return False.

   HELPER FUNCTION: CONSTRUCT(Cell' [in], Stream [in,out])

     1. Set the Zeros and Digest field of Cell' to [00] bytes.
     2. Set Stream to Stream | Cell'.
     3. Construct Cell from Cell' by setting the Digest field to
        SHA1(Stream), and taking all other fields as-is.
     4. Return Cell.

   HELPER_FUNCTION: ENCRYPT(Cell [in,out], Key [in], Ctr [in,out])
     1. Encrypt Cell's contents using AES128_CTR, with key 'Key' and
        counter 'Ctr'.  Increment 'Ctr' by the length of the cell.

   HELPER_FUNCTION: DECRYPT(Cell [in,out], Key [in], Ctr [in,out])
     1. Same as ENCRYPT.


   Router operation, upon receiving an inbound cell -- that is, one
   sent towards the client.

     1. ENCRYPT(cell, Kb, Cb)
     2. Send the decrypted contents towards the client.

   Router operation, upon receiving an outbound cell -- that is, one
   sent away from the client.

     1. DECRYPT(cell, Kf, Cf)
     2. If CHECK(Cell, Sf) is true, this cell is for us.  Do not
        relay the cell.
     3. Otherwise, this cell is not for us.  Send the decrypted cell
        to the next hop on the circuit, or discard it if there is no
        next hop.

   Router operation, to create a relay cell that will be delivered
   to the client.

     1. Construct a Relay cell Cell' with the relay command, length,
        stream ID, and body fields set as appropriate.
     2. Let Cell = CONSTRUCT(Cell', Sb).
     3. ENCRYPT(Cell, Kb, Cb)
     4. Send the encrypted cell towards the client.

   Client operation, receiving an inbound cell.

     For each hop H in a circuit, starting with the first hop and
     ending (possibly) with the last:

        1. DECRYPT(Cell, Kb_H, Cb_H)

        2. If CHECK(Cell, Sb_H) is true, this cell was sent from hop
           H.  Exit the loop, and return the cell in its current
           form.

     If we reach the end of the loop without finding the right hop,
     the cell is bogus or corrupted.

   Client operation, sending an outbound cell to hop H.

     1. Construct a Relay cell Cell' with the relay command, length,
        stream ID, and body fields set as appropriate.
     2. Let Cell = CONSTRUCT(Cell', Sf_H)
     3. For i = H..1:
          A. ENCRYPT(Cell, Sf_i, Cf_i)
     4. Deliver Cell to the first hop in the circuit.

A.2. The large-block-cipher protocol

   Same as A.1, except for the following changes.

   The cell format is now:
        Zeros             [Z bytes]
        Length            [2 bytes]
        StreamID          [2 bytes]
        Relay command     [1 byte]
        Data              [504-Z bytes]

   Ctr is no longer a counter, but a Tweak value.

   Each key is now a tuple of (Key_Crypt, Key_TC)

   Streams are no longer used.

   HELPER FUNCTION: CHECK(Cell [in], Stream [in,out])
        1. If the Zeros field of Cell contains only [00] bytes,
           return True.
        2. Otherwise return false.

   HELPER FUNCTION: CONSTRUCT(Cell' [in], Stream [in,out])
        1. Let Cell be Cell', with its "Zeros" field set to Z [00]
           bytes.
        2. Return Cell'.

   HELPER FUNCTION: ENCRYPT(Cell [in,out], Key [in], Tweak [in,out])
        2. Encrypt Cell using Key and Tweak
        1. Let Tweak' = TC(Key_TC, Tweak, Cell)
        3. Set Tweak to Tweak'.

   HELPER FUNCTION: DECRYPT(Cell [in,out], Key [in], Tweak [in,out])
        1. Let Tweak' = TC(Key_TC, Tweak, Cell)
        2. Decrypt Cell using Key and Tweak
        3. Set Tweak to Tweak'.

A.3. The short-MAC-and-pad protocol.

   Define M_relay(K,S) as MAC(K, S|"relay").
   Define M_receive(K,S) as MAC(K, S|"receive").
   Define Z(n) as a series of n [00] bytes.
   Define BODY_LEN as 509

   The cell message format is now:

     Relay command     [1 byte]
     StreamID          [2 bytes]
     Length            [2 bytes]
     Data              [variable bytes]

   Helper function: CHECK(Cell [in], b [in], K [in], S [in,out]):
       Let M = Cell[0:b]
       Let Rest = Cell[b:...]
       If b == 0:
          Return (nil, Rest)
       Let S' = S | Rest
       If M == M_relay(K,S')[0:b]:
          Set S = S'
          Return ("relay", Rest)
       If M == M_receive(K,S')[0:b]:
          Set S = S'
          Return ("receive", Rest)
       Return ("BAD", nil)

   HELPER_FUNCTION: ENCRYPT(Cell [in,out], Key [in], Ctr [in,out])
     1. Encrypt Cell's contents using AES128_CTR, with key 'Key' and
        counter 'Ctr'.  Increment 'Ctr' by the length of the cell.

   HELPER_FUNCTION: DECRYPT(Cell [in,out], Key [in], Ctr [in,out])
     1. Same as ENCRYPT.

   Router operation, upon receiving an inbound cell:
     1. ENCRYPT(cell, Kb, Cb)
     2. Send the decrypted contents towards the client.

   Router operation, upon receiving an outbound cell:
     1. Let Status, Msg = CHECK(Cell, bf, Mf, Sf)
     2. If Status == "BAD", drop the cell and destroy the circuit.
     3. Let Cell' = Msg | Z(BODY_LEN - len(Msg))
     4. DECRYPT(Cell', Kf, Cf) [*]
     5. If Status == "receive" or (Status == nil and there is no
        next hop), Cell' is for us: process it.
     6. Otherwise, send Cell' to the next node.

   Router operation, sending a cell towards the client:
     1. Let Body = a relay cell body of BODY_LEN-bb_i bytes.
     2. Let Cell' = ENCRYPT(Body, Kb, Cb)
     3. Let Sb = Sb | Cell'
     4. Let M = M_receive(Mb, Sb)[0:b]
     5. Send the cell M | Cell' back towards the client.

   Client operation, upon receiving an inbound cell:

     For each hop H in the circuit, from first to last:

        1. Let Status, Msg = CHECK(Cell, bb_i, Mb_i, Sb_i)
        2. If Status = "relay", drop the cell and destroy
           the circuit.  (BAD is okay; it means that this hop didn't
           originate the cell.)
        3. DECRYPT(Msg, Kb_i, Cb_i)
        4. If Status = "receive", this cell is from hop i; process
           it.
        5. Otherwise, set Cell = Msg.

   Client operation, sending an outbound cell:

        Let BF = the total of all bf_i values.

        1. Construct a relay cell body Msg of BODY_LEN-BF bytes.
        2. For each hop i, let Stream_i = ENCRYPT(Kf_i,Z(CELL_LEN),Cf_i)
        3. Let Pad_0 = "".
        4. For i in range 1..N, where N is the number of hops:
             Let Pad_i = Pad_{i-1} | Z(bf_i)
             Let S_last = the last len(Pad_i) bytes of Stream_i.
             Let Pad_i = Pad_i xor S_last
           Now Pad_i is the padding as it will stand after node i
           has processed it.

        5. For i in range N..1, where N is the number of hops:
             If this is the last hop, let M_* = M_receive. Else let
             M_* = M_relay.

             Let Body = Msg xor the first len(Msg) bytes of Stream_i

             Let M = M_*(Mf, Body | Pad_(i-1))

             Set Msg = M[:bf_i] | Body

        6. Send Msg outbound to the first relay in the circuit.


   [*] Strictly speaking, we could omit the pad-and-decrypt
       operation once we know we're the final hop.



R. References

[Prop188] Tor Proposal 188: Bridge Guards and other anti-enumeration defenses
     https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/188-bridge-guards.txt

[TorSpec] The Tor Protocol Specification
     https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=tor-spec.txt

[TorDesign] Dingledine et al, "Tor: The Second Generation Onion
     Router",
     https://svn.torproject.org/svn/projects/design-paper/tor-design.pdf

[Tweak] Liskov et al, "Tweakable Block Ciphers",
        http://www.cs.berkeley.edu/~daw/papers/tweak-crypto02.pdf

[XF] Xinwen Fu et al, "One Cell is Enough to Break Tor's Anonymity"

[23R] The 23 Raccoons, "Analysis of the Relative Severity of Tagging
      Attacks"  http://archives.seul.org/or/dev/Mar-2012/msg00019.html
      (You'll want to read the rest of the thread too.)