Filename: 236-single-guard-node.txt
Title: The move to a single guard node
Author: George Kadianakis, Nicholas Hopper
Created: 2014-03-22
Status: Closed

-1. Implementation-status

   Partially implemented, and partially superseded by proposal 271.

0. Introduction

   It has been suggested that reducing the number of guard nodes of
   each user and increasing the guard node rotation period will make
   Tor more resistant against certain attacks [0].

   For example, an attacker who sets up guard nodes and hopes for a
   client to eventually choose them as their guard will have much less
   probability of succeeding in the long term.

   Currently, every client picks 3 guard nodes and keeps them for 2 to
   3 months (since 0.2.4.12-alpha) before rotating them. In this
   document, we propose the move to a single guard per client and an
   increase of the rotation period to 9 to 10 months.

1. Proposed changes

1.1. Switch to one guard per client

   When this proposal becomes effective, clients will switch to using
   a single guard node.

   That is, on its first startup, Tor picks one guard and stores its
   identity persistently on disk. Tor uses that guard node as the
   first hop of its circuits from then on.

   If that guard node ever becomes unusable, rather than replacing it,
   Tor picks a new guard and adds it to the end of the list. When
   choosing the first hop of a circuit, Tor tries the guard nodes
   sequentially from the top of the list until it finds a usable one.
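
   The list-walking behavior above can be sketched as follows. This is
   a minimal illustration, not Tor's implementation: the names
   pick_first_hop, is_usable and sample_new_guard are hypothetical, and
   the sketch assumes the newly sampled guard is usable (it does not
   model the "only after contact" rule from path-spec.txt).

```python
from typing import Callable, List


def pick_first_hop(guards: List[str],
                   is_usable: Callable[[str], bool],
                   sample_new_guard: Callable[[], str]) -> str:
    """Return the first usable guard, walking the list from the top.

    If no listed guard is usable, a new guard is sampled and appended
    to the END of the list; unusable guards are kept, not replaced.
    """
    for guard in guards:
        if is_usable(guard):
            return guard
    new_guard = sample_new_guard()  # hypothetical sampling callback
    guards.append(new_guard)
    return new_guard
```

   Because unusable guards stay on the list, a guard that becomes
   usable again is preferred over any guard added after it.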

   A Guard node is considered unusable according to section "5. Guard
   nodes" in path-spec.txt. The rest of the rules from that section
   apply here too. XXX which rules specifically? -asn
   XXX Probably the rules about how to add a new guard (only after
       contact), when to re-try a guard for reachability, and when to
       discard a guard?  -nickhopper

   XXX Do we need to specify how already existing clients migrate?

1.1.1. Alternative behavior to section 1.1

   Here is an alternative to the behavior specified in the previous
   section. It's unclear which one is better.

   Instead of picking a new guard when the old guard becomes unusable,
   we pick a number of guards in the beginning but only use the top
   usable guard each time. When our guard becomes unusable, we move to
   the guard below it in the list.

   This behavior _might_ make some attacks harder; for example, an
   attacker who shoots down your guard in the hope that you will pick
   his guard next is now forced to have evil guards in the network at
   the time you first picked your guards.

   However, this behavior might also influence performance, since a
   guard that was fast enough 7 months ago might not be as fast
   today. Should we reevaluate our opinion based on the latest
   consensus when we have to pick a new guard? Also, a guard that was
   up 7 months ago might be down today, so we might end up sampling
   from the current network anyway.

1.2. Increase guard rotation period

   When this proposal becomes effective, Tor clients will set the
   lifetime of each guard to a random time between 9 and 10 months.

   If Tor tries to use a guard whose age is over its lifetime value,
   the guard gets discarded (also from persistent storage) and a new
   one is picked in its place.
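
   A minimal sketch of the lifetime rule above. The function names are
   hypothetical, the uniform distribution is an assumption (the text
   only says "random"), and a month is taken as 30 days, matching the
   NNN*30*24 convention used in section 1.3.

```python
import random

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day months


def pick_guard_lifetime(rng: random.Random) -> float:
    """Draw a lifetime between 9 and 10 months, in seconds.

    Assumption: uniformly distributed over that interval.
    """
    return rng.uniform(9 * SECONDS_PER_MONTH, 10 * SECONDS_PER_MONTH)


def is_expired(chosen_at: float, lifetime: float, now: float) -> bool:
    """True if the guard's age is over its lifetime value."""
    return now - chosen_at > lifetime
```

   In this sketch the check runs only when Tor tries to use the guard,
   as described above; an expired guard would then be dropped from
   persistent storage and a new one picked.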

   XXX We didn't do any analysis on extending the rotation period.
       For example, we don't even know the average age of guards, and
       whether all guards stay around for less than 9 months anyway.
       Maybe we should do some analysis before proceeding?

   XXX The guard lifetime should be controlled using the
       (undocumented?) GuardLifetime consensus option, right?

1.2.1. Alternative behavior to section 1.2

   Here is an alternative to the behavior specified in the previous
   section. It's unclear which one is better.

   Similar to section 1.2, but instead of rotating to completely new
   guard nodes after 9 months, we pick a few extra guard nodes in the
   beginning, and after 9 months we delete the already used guard
   nodes and use the one after them.

   This has approximately the same tradeoffs as section 1.1.1.

   Also, should we check the age of all of our guards periodically, or
   only check them when we try to use them?

1.3. Age of guard as a factor on guard probabilities

   By increasing the guard rotation period, we also prolong the
   underutilization of young guards, since clients will now rotate
   guards even less frequently (see 'Phase three' of [1]).

   We can mitigate this phenomenon by treating these recent guards as
   "fractional" guards:

   To do so, every time an authority needs to vote for a guard, it
   reads a set of consensus documents spanning the past NNN months,
   where NNN is the number of months in the guard rotation period (10
   months if this proposal is adopted in full), and counts the
   consensuses in which the relay had the guard flag.

   Then, in their votes, the authorities include the Guard Fraction of
   each guard by appending '[SP "GuardFraction=" INT]' in the guard's
   "w" line. Its value is an integer between 0 and 100, with 0 meaning
   that it's a brand new guard, and 100 that it has been present in
   all the inspected consensuses.
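
   The GuardFraction value could be derived from the consensus counts
   roughly as below. This is a sketch only: the function name is
   hypothetical, and rounding down is an assumption, since the text
   does not say how the percentage is rounded.

```python
def guard_fraction_percent(consensuses_with_guard_flag: int,
                           consensuses_inspected: int) -> int:
    """Return an integer in [0, 100] for the "GuardFraction=" field.

    0 means a brand new guard; 100 means the relay had the guard flag
    in every inspected consensus. Rounds down (an assumption).
    """
    if consensuses_inspected <= 0:
        raise ValueError("need at least one inspected consensus")
    if not 0 <= consensuses_with_guard_flag <= consensuses_inspected:
        raise ValueError("flag count out of range")
    return (100 * consensuses_with_guard_flag) // consensuses_inspected
```

   With a 10-month window of hourly consensuses (10*30*24 = 7200), a
   relay seen with the guard flag in 4752 of them gets GuardFraction=66.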

   A guard N that has been visible for V out of NNN*30*24 consensuses
   has had the opportunity to be chosen as a guard by approximately
   F = V/(NNN*30*24) of the clients in the network, and the remaining
   1-F fraction of the clients have not noticed this change.  So when
   choosing N for middle or exit positions on a circuit, clients
   should treat N as if F fraction of its bandwidth is a guard
   (respectively, dual) node and (1-F) is a middle (resp, exit) node.
   Let Wpf denote the weight from the 'bandwidth-weights' line a
   client would apply to N for position p if it had the guard
   flag, Wpn the weight if it did not have the guard flag, and B the
   measured bandwidth of N in the consensus.  Then instead of choosing
   N for position p proportionally to Wpf*B or Wpn*B, clients should
   choose N proportionally to F*Wpf*B + (1-F)*Wpn*B.
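
   The blended selection weight can be written directly from the
   formula above. The function and parameter names are hypothetical;
   F is taken as a fraction in [0, 1], i.e. the GuardFraction
   percentage divided by 100.

```python
def fractional_weight(f: float, w_pf: float, w_pn: float,
                      bandwidth: float) -> float:
    """Weight of relay N for position p: F*Wpf*B + (1-F)*Wpn*B.

    f         -- guard fraction F in [0, 1]
    w_pf      -- bandwidth-weight for position p WITH the guard flag
    w_pn      -- bandwidth-weight for position p WITHOUT the guard flag
    bandwidth -- measured bandwidth B of the relay in the consensus
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("guard fraction must be in [0, 1]")
    return f * w_pf * bandwidth + (1.0 - f) * w_pn * bandwidth
```

   At F=1 this reduces to Wpf*B (a fully established guard) and at F=0
   to Wpn*B (a relay that no client has yet adopted as a guard).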

   Similarly, when calculating the bandwidth-weights line as in
   section 3.8.3 of dir-spec.txt, directory authorities should treat N
   as if fraction F of its bandwidth has the guard flag and (1-F) does
   not.  So when computing the totals G,M,E,D, each relay N with guard
   visibility fraction F and bandwidth B should be added as follows:

   G' = G + F*B, if N does not have the exit flag
   M' = M + (1-F)*B, if N does not have the exit flag
   D' = D + F*B, if N has the exit flag
   E' = E + (1-F)*B, if N has the exit flag
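
   The four update rules can be sketched as a single pass over the
   relays. The Relay type and function name are hypothetical; giving
   relays without the guard flag F=0 is an assumption of this sketch.

```python
from typing import Iterable, NamedTuple, Tuple


class Relay(NamedTuple):
    bandwidth: float       # B
    guard_fraction: float  # F in [0, 1]; assumed 0 without guard flag
    is_exit: bool          # whether the relay has the exit flag


def bandwidth_totals(
        relays: Iterable[Relay]) -> Tuple[float, float, float, float]:
    """Return (G, M, E, D) with each relay split by its guard fraction."""
    g = m = e = d = 0.0
    for r in relays:
        guard_part = r.guard_fraction * r.bandwidth
        other_part = (1.0 - r.guard_fraction) * r.bandwidth
        if r.is_exit:
            d += guard_part   # D' = D + F*B
            e += other_part   # E' = E + (1-F)*B
        else:
            g += guard_part   # G' = G + F*B
            m += other_part   # M' = M + (1-F)*B
    return g, m, e, d
```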

1.3.1. Guard Fraction voting

  To pass that information to clients, we introduce consensus method
  19, where if 3 or more authorities provided GuardFraction values in
  their votes, the authorities produce a consensus containing a
  GuardFraction keyword equal to the low-median of the GuardFraction
  votes.
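
  A sketch of the voting rule above, assuming the low-median of an
  even number of votes is the lower of the two middle values. The
  function name is hypothetical; returning None stands for "no
  GuardFraction keyword in the consensus".

```python
from typing import List, Optional

MIN_GUARD_FRACTION_VOTES = 3  # "3 or more authorities"


def consensus_guard_fraction(votes: List[int]) -> Optional[int]:
    """Low-median of the GuardFraction votes, or None if too few."""
    if len(votes) < MIN_GUARD_FRACTION_VOTES:
        return None
    ordered = sorted(votes)
    # Index (n-1)//2 picks the middle element for odd n and the
    # lower of the two middle elements for even n.
    return ordered[(len(ordered) - 1) // 2]
```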

  The GuardFraction keyword is appended in the 'w' line of each router
  in the consensus, after the optional 'Unmeasured' keyword. Example:
    w Bandwidth=20 Unmeasured=1 GuardFraction=66
  or
    w Bandwidth=53600 GuardFraction=99
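
  For illustration, a client-side parser for such 'w' lines might look
  like the sketch below. It is not Tor's parser: the function name is
  hypothetical and it assumes every item is a KEY=INT pair, as in the
  two examples above.

```python
from typing import Dict


def parse_w_line(line: str) -> Dict[str, int]:
    """Parse a 'w' line into a {keyword: integer} mapping."""
    parts = line.split()
    if not parts or parts[0] != "w":
        raise ValueError("not a 'w' line")
    fields: Dict[str, int] = {}
    for item in parts[1:]:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got %r" % item)
        fields[key] = int(value)
    return fields
```

  A missing GuardFraction keyword simply leaves the key out of the
  mapping, so callers can tell "absent" apart from "GuardFraction=0".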

1.4. Raise the bandwidth threshold for being a guard

   From dir-spec.txt:
      "Guard" -- A router is a possible 'Guard' if its Weighted Fractional
       Uptime is at least the median for "familiar" active routers, and if
       its bandwidth is at least median or at least 250KB/s.

   When this proposal becomes effective, authorities should change the
   bandwidth threshold for being a guard node to 2000KB/s instead of
   250KB/s.
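
   Read literally, the quoted rule with the raised threshold amounts
   to the check sketched below. The function and parameter names are
   hypothetical, and the sketch covers only the conditions quoted
   above, not any other requirement in dir-spec.txt.

```python
GUARD_BW_FLOOR_KB = 2000  # proposed value; the quoted rule uses 250


def is_possible_guard(wfu: float, median_wfu: float,
                      bandwidth_kb: float, median_bandwidth_kb: float,
                      bw_floor_kb: float = GUARD_BW_FLOOR_KB) -> bool:
    """Apply the quoted 'Guard' rule with a configurable floor.

    The relay needs Weighted Fractional Uptime at least the median,
    and bandwidth at least the median OR at least the fixed floor.
    """
    uptime_ok = wfu >= median_wfu
    bandwidth_ok = (bandwidth_kb >= median_bandwidth_kb
                    or bandwidth_kb >= bw_floor_kb)
    return uptime_ok and bandwidth_ok
```

   Note that the "or" makes the effective bandwidth requirement the
   smaller of the median and the fixed floor, so raising the floor
   only matters while the median is above the old 250KB/s value.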

   Implications of raising the bandwidth threshold are discussed in
   section 2.3.

   XXX Is this insane? It's an 8-fold increase.

2. Discussion

2.1. Guard node set fingerprinting

   With the old behavior of three guard nodes per user, it was
   extremely unlikely for two users to have the same guard node
   set. Hence the set of guard nodes acted as a fingerprint to each
   user.

   When this proposal becomes effective, each user will have one guard
   node. We believe that this slightly reduces the effectiveness of
   this fingerprint since users who pick a popular guard node will now
   blend in with thousands of other users. However, clients who pick a
   slow guard will still have a small anonymity set [2].

   All in all, this proposal slightly improves the situation of guard
   node fingerprinting, but does not solve it. See the next section
   for a suggested scheme that would further mitigate the guard node
   set fingerprinting problem.

2.1.1. Potential fingerprinting solution: Guard buckets

   One of the suggested alternatives that moves us closer to solving
   the guard node fingerprinting problem would be to split the list
   of N guard nodes into buckets of K guards, and have each client
   pick a bucket [3].

   This reduces the fingerprint from N-choose-K to N/K guard set
   choices; it also allows users to have multiple guard nodes, which
   provides reliability and performance.
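
   To give a feel for the reduction, the two counts can be compared
   numerically. This is an illustration only: the function name is
   hypothetical, and N=2000 with K=3 is an assumed example (2000 is
   the rough guard count mentioned in section 2.3).

```python
import math
from typing import Tuple


def guard_set_choices(n_guards: int, bucket_size: int) -> Tuple[int, int]:
    """Return (free-choice guard sets, bucketed guard sets).

    Free choice allows N-choose-K distinct sets; fixed buckets allow
    only N/K (rounded down here when K does not divide N).
    """
    free_choice = math.comb(n_guards, bucket_size)
    bucketed = n_guards // bucket_size
    return free_choice, bucketed
```

   For N=2000 and K=3 this gives 1331334000 possible sets under free
   choice against 666 buckets.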

   Unfortunately, implementing this idea is not easy, and its
   anonymity effects are not well understood, so we had to reject
   this alternative for now.

2.2. What about 'multipath' schemes like Conflux?

   By switching to one guard, we rule out the deployment of
   'multipath' systems like Conflux [4] which build multiple circuits
   through the Tor network and attempt to detect and use the most
   efficient circuits.

   On the other hand, the 'Guard buckets' idea outlined in section
   2.1.1 works well with Conflux-type schemes so it's still worth
   considering.

2.3. Implications of raising the bandwidth threshold for guards

   By raising the bandwidth threshold for being a guard we directly
   affect the performance and anonymity of Tor clients. We performed a
   brief analysis of the implications of switching to one guard and
   the results imply that the changes are not tragic [2].

   Specifically, it seems that the performance of about half of the
   clients will degrade slightly, but the performance of the other
   half will remain the same or even improve.

   Also, it seems that the powerful guard nodes of the Tor network
   have enough total bandwidth capacity to handle client traffic even
   if some slow guard nodes get discarded.

   On the anonymity side, by increasing the bandwidth threshold to
   2MB/s we halve our guard nodes; we discard 1000 out of 2000
   guards. Even if this seems like a substantial diversity loss, it
   seems that the 1000 discarded guard nodes had a very small chance
   of being selected in the first place (7% chance of any of them
   being selected).

   However, it's worth noting that the performed analysis was quite
   brief and the implications of this proposal are complex, so we
   should be prepared for surprises.

2.4. Should we stop building circuits after a number of guard failures?

   Using techniques from academic work such as the Sniper attack [5],
   a powerful attacker can choose to shut down guard nodes until a
   client is forced to pick an attacker-controlled guard node.
   Similarly, a local network attacker can kill all connections
   towards all guards except the ones she controls.

   This is a very powerful attack that is hard to defend against. A
   naive way of defending against it would be for Tor to refuse to
   build any more circuits after a number of guard node failures have
   been experienced.

   Unfortunately, we believe that this is not a sufficiently strong
   countermeasure since puzzled users will not comprehend the
   confusing warning message about guard node failures and they will
   instead just uninstall and reinstall TBB to fix the issue.

2.5. What this proposal does not propose

   Finally, this proposal does not aim to solve all the problems with
   guard nodes. This proposal only tries to solve some of the problems
   whose solution is analyzed sufficiently and seems harmless enough
   to us.

   For example, this proposal does not try to solve:
   - Guard enumeration attacks. We need guard layers or virtual
     circuits for this [6].
   - The guard node set fingerprinting problem [7]
   - The fact that each isolation profile or virtual identity should
     have its own guards.

XXX It would also be nice to have some way to easily revert back to 3
    guards if we later decide that a single guard was a very stupid
    idea.

References:

[0]: https://blog.torproject.org/blog/improving-tors-anonymity-changing-guard-parameters
     http://freehaven.net/anonbib/#wpes12-cogs

[1]: https://blog.torproject.org/blog/lifecycle-of-a-new-relay

[2]: https://lists.torproject.org/pipermail/tor-dev/2014-March/006458.html

[3]: https://trac.torproject.org/projects/tor/ticket/9273#comment:4

[4]: http://freehaven.net/anonbib/#pets13-splitting

[5]: https://blog.torproject.org/blog/new-tor-denial-service-attacks-and-defenses

[6]: https://trac.torproject.org/projects/tor/ticket/9001

[7]: https://trac.torproject.org/projects/tor/ticket/10969