Filename: 275-md-published-time-is-silly.txt
Title: Stop including meaningful "published" time in microdescriptor consensus
Author: Nick Mathewson
Created: 20-Feb-2017
Status: Closed
Target: 0.3.1.x-alpha
Implemented-In: 0.4.8.1-alpha

0. Status:

   As of 0.2.9.11 / 0.3.0.7 / 0.3.1.1-alpha, Tor no longer takes any
   special action on "future" published times, as proposed in section 4.

   As of 0.4.0.1-alpha, we implemented a better mechanism for relays to know
   when to publish. (See proposal 293.)

1. Overview

   This document proposes that, in order to limit the bandwidth needed
   for networkstatus diffs, we remove "published" part of the "r" lines
   in microdescriptor consensuses.

   The more extreme, compatibility-breaking version of this idea will
   reduce ed consensus diff download volume by approximately 55-75%.  A
   less-extreme interim version would still reduce volume by
   approximately 5-6%.

2. Motivation

   The current microdescriptor consensus "r" line format is:
     r Nickname Identity Published IP ORPort DirPort
   as in:
     r moria1 lpXfw1/+uGEym58asExGOXAgzjE 2017-01-10 07:59:25 \
        128.31.0.34 9101 9131

   As I'll show below, there's not much use for the "Published" part
   of these lines.  By omitting them or replacing them with
   something more compressible, we can save space.

   What's more, changes in the Published field are one of the most
   frequent changes between successive networkstatus consensus
   documents.  If we were to remove this field, then networkstatus diffs
   (see proposal 140) would be smaller.

3. Compatibility notes

   Above I've talked about "removing" the published field.  But of
   course, doing this would make all existing consensus consumers
   stop parsing the consensus successfully.

   Instead, let's look at how this field is used currently in Tor,
   and see if we can replace the value with something else.

      * Published is used in the voting process to decide which
        descriptor should be considered.  But that is taken from
        vote networkstatus documents, not consensuses.

      * Published is used in mark_my_descriptor_dirty_if_too_old()
        to decide whether to upload a new router descriptor.  If the
        published time in the consensus is more than 18 hours in the
        past, we upload a new descriptor.  (Relays are potentially
        looking at the microdesc consensus now, since #6769 was
        merged in 0.3.0.1-alpha.)  Relays have plenty of other ways
        to notice that they should upload new descriptors.

      * Published is used in client_would_use_router() to decide
        whether a routerstatus is one that we might possibly use.
        We say that a routerstatus is not usable if its published
        time is more than OLD_ROUTER_DESC_MAX_AGE (5 days) in the
        past, or if it is not at least
        TestingEstimatedDescriptorPropagationTime (10 minutes) in
        the future. [***] Note that this is the only case where anything
        is rejected because it comes from the future.

          * client_would_use_router() decides whether we should
            download a router descriptor (not a microdescriptor)
            in routerlist.c

          * client_would_use_router() is used from
            count_usable_descriptors() to decide which relays are
            potentially usable, thereby forming the denominator of
            our "have descriptors / usable relays" fraction.

   So we have a fairly limited constraints on which Published values
   we can safely advertize with today's Tor implementations.  If we
   advertise anything more than 10 minutes in the future,
   client_would_use_router() will consider routerstatuses unusable.
   If we advertize anything more than 18 hours in the past, relays
   will upload their descriptors far too often.

4. Proposal

   Immediately, in 0.2.9.x-stable (our LTS release series), we
   should stop caring about published_on dates in the future.  This
   is a two-line change.

   As an interim solution: We should add a new consensus method number
   that changes the process by which Published fields in consensuses are
   generated.  It should set all Published fields in the consensus
   to be the same value.  These fields should be taken to rotate
   every 15 hours, by taking consensus valid-after time, and rounding
   down to the nearest multiple of 15 hours since the epoch.

   As a longer-term solution: Once all Tor versions earlier than 0.2.9.x
   are obsolete (in mid 2018), we can update with a new consensus
   method, and set the published_on date to some safe time in the
   future.

5. Analysis

   To consider the impact on consensus diffs: I analyzed consensus
   changes over the month of January 2017, using scripts at [1].

   With the interim solution in place, compressed diff sizes fell by
   2-7% at all measured intervals except 12 hours, where they increased
   by about 4%.  Savings of 5-6% were most typical.

   With the longer-term solution in place, and all published times held
   constant permanently, the compressed diff sizes were uniformly at
   least 56% smaller.

   With this in mind, I think we might want to only plan to support the
   longer-term solution.

    [1] https://github.com/nmathewson/consensus-diff-analysis