Filename: 313-relay-ipv6-stats.txt
Title: Tor Relay IPv6 Statistics
Author: teor, Karsten Loesing, Nick Mathewson
Created: 10-February-2020
Status: Accepted
Ticket: #33159

0. Abstract

   We propose that:
     * tor relays should collect statistics on IPv6 connections, and
     * tor relays and bridges should collect statistics on consumed bandwidth.
   Like tor's existing connection and consumed bandwidth statistics, these new
   IPv6 statistics will be published in each relay's extra-info descriptor.

   We also plan to write a script that shows the number of relays in the
   consensus that support:
     * IPv6 extends, and
     * IPv6 client connections.
   This script will be used for medium-term monitoring, during the deployment
   of tor's IPv6 changes in 2020. (See [Proposal 311: Relay IPv6 Reachability]
   and [Proposal 312: Relay Auto IPv6 Address].)

1. Introduction

   Tor relays (and bridges) can accept IPv6 client connections via their
   ORPort. But current versions of tor need to have an explicitly configured
   IPv6 address (see [Proposal 312: Relay Auto IPv6 Address]), and they don't
   perform IPv6 reachability self-checks (see
   [Proposal 311: Relay IPv6 Reachability]).

   As we implement these new IPv6 features in tor, we want to monitor their
   impact on the IPv6 connections and bandwidth in the tor network.

   Tor developers also need to know how many relays support these new IPv6
   features, so they can test tor's IPv6 reachability checks. (In particular,
   see section 4.3.1 in [Proposal 311: Relay IPv6 Reachability]:  Refusing to
   Publish the Descriptor.)

2. Scope

   This proposal modifies Tor's behaviour as follows:

   Relays, bridges, and directory authorities collect statistics on:
     * IPv6 connections, and
     * IPv6 consumed bandwidth.
   The design of these statistics will be based on tor's existing connection
   and consumed bandwidth statistics.

   Tor's existing consumed bandwidth statistics truncate their totals to the
   nearest kilobyte. The existing connection statistics do not perform any
   binning.

   We do not proposed to add any extra noise or binning to these statistics.
   Instead, we expect to leave these changes until we have a consistent
   privacy-preserving statistics framwework for tor. As an example of this
   kind of framework, see
   [Proposal 288: Privacy-Preserving Stats with Privcount (Shamir version)].

   We avoid:
     * splitting connection statistics into clients and relays, and
     * collecting circuit statistics.
   These statistics are more sensitive, so we want to implement
   privacy-preserving statistics, before we consider adding them.

   Throughout this proposal, "relays" includes directory authorities, except
   where they are specifically excluded. "relays" does not include bridges,
   except where they are specifically included. (The first mention of "relays"
   in each section should specifically exclude or include these other roles.)

   Tor clients do not collect any statistics for public reporting. Therefore,
   clients are out of scope in this proposal.

   When this proposal describes Tor's current behaviour, it covers all
   supported Tor versions (0.3.5.7 to 0.4.2.5), as of January 2020, except
   where another version is specifically mentioned.

   This proposal also includes a medium-term monitoring script, which
   calculates the number of relays in the consensus that support IPv6 extends,
   and IPv6 client connections.

3. Monitoring IPv6 Relays in the Consensus

   We propose writing a script that calculates:
     * the number of relays, and
     * the consensus weight fraction of relays,
   in the consensus that:
     * have an IPv6 ORPort,
     * support IPv6 reachability checks,
     * support IPv6 clients, and
     * support IPv6 reachability checks, and IPv6 clients.

   In order to provide easy access to these statistics, we propose
   that the script should:
     * download a consensus (or read an existing consensus), and
     * calculate and report these statistics.

   The following consensus weight fractions should divide by the total
   consensus weight:
     * have an IPv6 ORPort (all relays have an IPv4 ORPort), and
     * support IPv6 reachability checks (all relays support IPv4 reachability).

   The following consensus weight fractions should divide by the
   "usable Guard" consensus weight:
     * support IPv6 clients, and
     * support IPv6 reachability checks and IPv6 clients.

   "Usable Guards" have the Guard flag, but do not have the Exit flag. If the
   Guard also has the BadExit flag, the Exit flag should be ignored.

   Note that this definition of "Usable Guards" is only valid when the
   consensus contains many more guards than exits. That is, Wgd must be 0 in
   the consensus. (See the [Tor Directory Protocol] for more details.)

   Therefore, the script should check that Wgd is 0. If it is not, the script
   should log a warning about the accuracy of the "Usable Guard" statistics.

4. Collecting IPv6 Consumed Bandwidth Statistics

   We propose that relays (and bridges) collect IPv6 consumed bandwidth
   statistics.

   To minimise development and testing effort, we propose re-using the existing
   "bw_array" code in rephist.c.

   In particular, tor currently counts these bandwidth statistics:
     * read,
     * write,
     * dir_read, and
     * dir_write.

   We propose adding the following bandwidth statistics:
     * ipv6_read, and
     * ipv6_write.
   (The IPv4 statistics can be calculated by subtracting the IPv6 statistics
   from the existing total consumed bandwidth statistics.)

   We believe that collecting IPv6 consumed bandwidth statistics is about as
   safe as the existing IPv4+IPv6 total consumed bandwidth statistics.

   See also section 7.5, which adds a BandwidthStatistics torrc option and
   consensus parameter. BandwidthStatistics is an optional change.

5. Collecting IPv6 Connection Statistics

   We propose that relays (but not bridges) collect IPv6 connection statistics.

   Bridges refuse to collect the existing ConnDirectionStatistics, so we do not
   believe it is safe to collect the smaller IPv6 totals on bridges.

   To minimise development and testing effort, we propose re-using the existing
   "bidi" code in rephist.c. (This code may require some refactoring, because
   the "bidi" totals are globals, rather than a struct.)

   In particular, tor currently counts these connection statistics:
     * below threshold,
     * mostly read,
     * mostly written, and
     * both read and written.

   We propose adding IPv6 variants of all these statistics. (The IPv4
   statistics can be calculated by subtracting the IPv6 statistics from the
   existing total connection statistics.)

   See also section 7.6, which adds a ConnDirectionStatistics consensus
   parameter. This consensus paramter is an optional change.

6. Directory Protocol Specification Changes

   We propose adding IPv6 variants of the consumed bandwidth and connection
   direction statistics to the tor directory protocol.

   We propose the following additions to the [Tor Directory Protocol]
   specification, in section 2.1.2. Each addition should be inserted below the
   existing consumed bandwidth and connection direction specifications.

    "ipv6-read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]
    "ipv6-write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]

        Declare how much bandwidth the OR has used recently, on IPv6
        connections. See "read-history" and "write-history" for more details.
        (The migration notes do not apply to IPv6.)

    "ipv6-conn-bi-direct" YYYY-MM-DD HH:MM:SS (NSEC s) BELOW,READ,WRITE,BOTH NL
        [At most once]

        Number of IPv6 connections, that are used uni-directionally or
        bi-directionally. See "conn-bi-direct" for more details.

   We also propose the following replacement, in the same section:

    "dirreq-read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]
    "dirreq-write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]

        Declare how much bandwidth the OR has spent on answering directory
        requests. See "read-history" and "write-history" for more details.
        (The migration notes do not apply to dirreq.)

   This replacement is optional, but it may avoid the 3 *read-history
   definitions getting out of sync.

7. Optional Changes

   We propose some optional changes to help relay operators, tor developers,
   and tor's network health. We also expect that these changes will drive IPv6
   relay adoption.

   Some of these changes may be more appropriate as future work, or along with
   other proposed features.

7.1. Log IPv6 Statistics in Tor's Heartbeat Logs

   We propose this optional change, so relay operators can see their own IPv6
   statistics:

   We propose that tor logs its IPv6 consumed bandwidth and connection
   statistics in its regular "heartbeat" logs.

   These heartbeat statistics should be collected over the lifetime of the tor
   process, rather than using the state file, like the statistics in sections
   4 and 5.

   Tor's existing heartbeat logs already show its consumed bandwidth and
   connections (in the link protocol counts).

   We may also want to show IPv6 consumed bandwidth and connections as a
   propotion of the total consumed bandwidth and connections.

   These statistics only show a relay's local bandwidth usage, so they can't
   be used for reporting.

7.2. Show IPv6 Relay Counts on Consensus Health

   The [Consensus Health] website displays a wide rage of tor statistics,
   based on the most recent consensus.

   We propose this optional change, to:
     * help tor developers improve IPv6 support on the tor network,
     * help diagnose issues with IPv6 on the tor network, and
     * drive IPv6 adoption on tor relays.

   Consensus Health adds an IPv6 section, with relays in the consensus that:
     * have an IPv6 ORPort, and
     * support IPv6 reachability checks.

   The definitions of these statistics are in section 3.

   These changes can be tested using the script proposed in section 3.

7.3. Add an IPv6 Reachability Pseudo-Flag on Relay Search

   The [Relay Search] website displays tor relay information, based on the
   current consensus and relay descriptors.

   We propose this optional change, to:
     * help relay operators diagnose issues with IPv6 on their relays, and
     * drive IPv6 adoption on tor relays.

   Relay Search adds a pseudo-flag for relay IPv6 reachability support.

   This pseudo-flag should be given to relays that have:
     * a reachable IPv6 ORPort (in the consensus), and
     * support tor subprotocol version "Relay=3" (or later).
   See [Proposal 311: Relay IPv6 Reachability] for details.

   TODO: Is this a useful change?
         Are there better ways of driving IPv6 adoption?

7.4. Add IPv6 Connections and Consumed Bandwidth Graphs to Tor Metrics

   The [Tor Metrics: Traffic] website displays connection and bandwidth
   information for the tor network, based on relay extra-info descriptors.

   We propose these optional changes, to:
     * help tor developers improve IPv6 support on the tor network,
     * help diagnose issues with IPv6 on the tor network, and
     * drive IPv6 adoption on tor relays.

   Tor Metrics adds the following information to the graphs on the Traffic
   page:

   Consumed Bandwidth by IP version
     * added to the existing [Tor Metrics: Advertised bandwidth by IP version]
       page
     * as a stacked graph, like
       [Tor Metrics: Advertised and consumed bandwidth by relay flags]

   Fraction of connections used uni-/bidirectionally by IP version
     * added to the existing
       [Tor Metrics: Fraction of connections used uni-/bidirectionally] page
     * as a stacked graph, like
       [Tor Metrics: Advertised and consumed bandwidth by relay flags]

7.5. Add a BandwidthStatistics option

   We propose adding a new BandwidthStatistics torrc option and consensus
   parameter, which activates reporting of all these statistics. Currently,
   the existing statistics are controlled by ExtraInfoStatistics, but we
   propose using the new BandwidthStatistics option for them as well.

   The default value of this option should be "auto", which checks the
   consensus parameter. If there is no consensus parameter, the default should
   be 1. (The existing bandwidth statistics are reported by default.)

7.6. Add a ConnDirectionStatistics consensus parameter

   We propose using the existing ConnDirectionStatistics torrc option, and
   adding a consensus parameter with the same name. This option will control
   the new and existing connection statistics.

   The default value of this option should be "auto", which checks the
   consensus parameter. If there is no consensus parameter, the default should
   be 0.

   Bridges refuse to collect the existing ConnDirectionStatistics, so we do not
   believe it is safe to collect the smaller IPv6 totals on bridges. The new
   consensus parameter should also be ignored on bridges.

   If we implement the ConnDirectionStatistics consensus parameter, we can set
   the consensus parameter to 1 for a week or two, so we can collect these
   statistics.

8. Test Plan

   We provide a quick summary of our testing plans.

8.1. Testing IPv6 Relay Consensus Calculations

   We propose to test the IPv6 Relay consensus script using chutney networks.
   However, chutney creates a limited number of relays, so we also need to
   test these changes on consensuses from the public tor network.

   Some of these calculations are similar to the calculations that tor will do,
   to find out if IPv6 reachability checks are reliable. So we may be able to
   check the script against tor's reachability logs. (See section 4.3.1 in
   [Proposal 311: Relay IPv6 Reachability]:  Refusing to Publish the
   Descriptor.)

   The Tor Metrics team may also independently check these calculations.

   Once the script is completed, its output will be monitored by tor
   developers, as more volunteer relay operators deploy the relevant tor
   versions. (And as the number of IPv6 relays in the consensus increases.)

8.2. Testing IPv6 Extra-Info Statistics

   We propose to test the connection and consumed bandwidth statistics using
   chutney networks. However, chutney runs for a short amount of time, and
   creates a limited amount of traffic, so we also need to test these changes
   on the public tor network.

   In particular, we have struggled to test statistics using chutney, because
   tor's hard-coded statistics period is 24 hours. (And most chutney networks
   run for under 1 minute.)

   Therefore, we propose to test these changes on the public network with a
   small number of relays and bridges.

   During 2020, the Tor Metrics team will analyse these statistics on the
   public tor network, and provide IPv6 progress reports. We expect that we may
   discover some bugs during the first analysis.

   Once these changes are merged, they will be monitored by tor developers, as
   more volunteer relay operators deploy the relevant tor versions. (And as the
   number of IPv6 relays in the consensus increases.)

References:

[Consensus Health]:
   https://consensus-health.torproject.org/

[Proposal 288: Privacy-Preserving Stats with Privcount (Shamir version)]:
   https://gitweb.torproject.org/torspec.git/tree/proposals/288-privcount-with-shamir.txt

[Proposal 311: Relay IPv6 Reachability]:
   https://gitweb.torproject.org/torspec.git/tree/proposals/311-relay-ipv6-reachability.txt

[Proposal 312: Relay Auto IPv6 Address]:
   https://gitweb.torproject.org/torspec.git/tree/proposals/312-relay-auto-ipv6-addr.txt

[Relay Search]:
   https://metrics.torproject.org/rs.html

[Tor Directory Protocol]:
   (version 3) https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt

[Tor Manual Page]:
   https://2019.www.torproject.org/docs/tor-manual.html.en

[Tor Metrics: Advertised and consumed bandwidth by relay flags]:
   https://metrics.torproject.org/bandwidth-flags.html

[Tor Metrics: Advertised bandwidth by IP version]:
   https://metrics.torproject.org/advbw-ipv6.html

[Tor Metrics: Fraction of connections used uni-/bidirectionally]:
   https://metrics.torproject.org/connbidirect.html

[Tor Metrics: Traffic]:
   https://metrics.torproject.org/bandwidth-flags.html

[Tor Specification]:
   https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt