Header List format

It consists of a Timestamp line and zero or more HeaderLines.

All the header lines MUST conform to the HeaderLine format, except the first Timestamp line.

The Timestamp line is not a HeaderLine to keep compatibility with the legacy Bandwidth File format.

Some header Lines MUST appear in specific positions, as documented below. All other Lines can appear in any order.

If a parser does not recognize any extra material in a header Line, the Line MUST be ignored.

If a header Line does not conform to this format, the Line SHOULD be ignored by parsers.

It consists of:

Timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds of the most recent generator bandwidth result.

If the generator implementation has multiple threads or subprocesses which can fail independently, it SHOULD take the most recent timestamp from each thread and use the oldest value. This ensures that the timestamp only advances while all the threads continue running.

If there are threads that do not run continuously, they SHOULD be excluded from the timestamp calculation.

If there are no recent results, the generator MUST NOT generate a new file.

It does not follow the KeyValue format for backwards compatibility with version 1.0.0.
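The thread rule above can be sketched in Python as follows. This is only an illustration: the function name, the input shape (a mapping from a hypothetical thread name to the Unix timestamps of its results), and the choice to skip threads that have no results yet are assumptions, not part of this specification.

```python
def header_timestamp(results_by_thread):
    """Return the Timestamp value, or None if no file should be generated.

    results_by_thread maps a (hypothetical) thread name to the Unix
    timestamps, in seconds, of the results that thread has produced.
    The caller is expected to leave out threads that do not run
    continuously.
    """
    # Most recent result of each thread. Assumption: threads without
    # any result yet are skipped.
    latest_per_thread = [max(ts) for ts in results_by_thread.values() if ts]
    if not latest_per_thread:
        # No results at all: the generator MUST NOT generate a new file.
        return None
    # Use the oldest of the per-thread values, so one stalled thread
    # keeps the Timestamp from advancing.
    return int(min(latest_per_thread))
```

For example, with one thread whose latest result is at 200 and another whose latest result is at 180, the Timestamp would be 180.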

"version" version_number NL

[In second position, zero or one time.]

The specification document format version. It uses semantic versioning [5].

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the version_number is considered to be "1.0.0".

"software" Value NL

[Zero or one time.]

The name of the software that created the document.

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the software is considered to be "torflow".

"software_version" Value NL

[Zero or one time.]

The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme.

This Line was added in version 1.1.0 of this specification.

"file_created" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone, when the file was created.

This Line was added in version 1.1.0 of this specification.

"generator_started" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone, when the generator started.

This Line was added in version 1.1.0 of this specification.

"earliest_bandwidth" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone, when the first relay bandwidth was obtained.

This Line was added in version 1.1.0 of this specification.

"latest_bandwidth" DateTime NL

[Zero or one time.]

The date and time, in ISO 8601 format and UTC time zone, of the most recent generator bandwidth result.

This time MUST be identical to the time given in the initial Timestamp line.

This duplicate value is included to make the format easier for people to read.

This Line was added in version 1.1.0 of this specification.
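A parser or test suite could check that latest_bandwidth and the Timestamp line agree along these lines. This is a minimal sketch: the function names are hypothetical, and it assumes the DateTime is written as YYYY-MM-DDTHH:MM:SS with no time zone suffix.

```python
from datetime import datetime, timezone


def datetime_to_epoch(value):
    """Convert a DateTime header value to Unix Epoch seconds."""
    # Assumption: the value looks like "2019-01-01T00:00:00" and is in UTC.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def latest_bandwidth_matches(timestamp_line, latest_bandwidth):
    """Return True if both values describe the same second."""
    return int(timestamp_line) == datetime_to_epoch(latest_bandwidth)
```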

"number_eligible_relays" Int NL

[Zero or one time.]

The number of relays that have enough measurements to be included in the bandwidth file.

This Line was added in version 1.2.0 of this specification.

"minimum_percent_eligible_relays" Int NL

[Zero or one time.]

The percentage of relays in the consensus that SHOULD be included in every generated bandwidth file.

If this threshold is not reached, format versions 1.3.0 and earlier SHOULD NOT contain any relays. (Bandwidth files always include a header.)

Format versions 1.4.0 and later SHOULD include all the relays for diagnostic purposes, even if this threshold is not reached. But these relays SHOULD be marked so that Tor does not vote on them. See section 1.4 for details.

The minimum percentage is 60% in Torflow, so sbws uses 60% as the default.

This Line was added in version 1.2.0 of this specification.

"number_consensus_relays" Int NL

[Zero or one time.]

The number of relays in the consensus.

This Line was added in version 1.2.0 of this specification.

"percent_eligible_relays" Int NL

[Zero or one time.]

The number of eligible relays, as a percentage of the number of relays in the consensus.

      This Line SHOULD be equal to:
          (number_eligible_relays * 100.0) / number_consensus_relays

      This Line was added in version 1.2.0 of this specification.

    "minimum_number_eligible_relays" Int NL

      [Zero or one time.]

The minimum number of relays that SHOULD be included in the bandwidth file. See minimum_percent_eligible_relays for details.

      This Line SHOULD be equal to:
          number_consensus_relays * (minimum_percent_eligible_relays / 100.0)

      This Line was added in version 1.2.0 of this specification.
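The two derived values, percent_eligible_relays and minimum_number_eligible_relays, can be computed as in the following sketch. The function names are hypothetical. The text does not say how the results are converted to Int, so rounding to the nearest integer is an assumption, as is checking the threshold against the unrounded percentage.

```python
def eligibility_headers(number_eligible_relays, number_consensus_relays,
                        minimum_percent_eligible_relays=60):
    """Compute the derived eligibility header values (hypothetical helper)."""
    if number_consensus_relays <= 0:
        raise ValueError("number_consensus_relays must be positive")
    percent = (number_eligible_relays * 100.0) / number_consensus_relays
    minimum_number = number_consensus_relays * (
        minimum_percent_eligible_relays / 100.0)
    return {
        # Assumption: round to the nearest Int.
        "percent_eligible_relays": round(percent),
        "minimum_number_eligible_relays": round(minimum_number),
    }


def threshold_reached(number_eligible_relays, number_consensus_relays,
                      minimum_percent_eligible_relays=60):
    """Return True if enough relays are eligible (assumed comparison)."""
    percent = (number_eligible_relays * 100.0) / number_consensus_relays
    return percent >= minimum_percent_eligible_relays
```

With 4900 eligible relays out of 7000 in the consensus and the default 60% minimum, this gives a percent_eligible_relays of 70 and a minimum_number_eligible_relays of 4200.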

    "scanner_country" CountryCode NL

      [Zero or one time.]

      The country, as in political geolocation, where the generator is run.

      This Line was added in version 1.2.0 of this specification.

    "destinations_countries" CountryCodeList NL

      [Zero or one time.]

The country, as in political geolocation, or countries where the destination Web server(s) are located. The destination Web Servers serve the data that the generator retrieves to measure the bandwidth.

This Line was added in version 1.2.0 of this specification.

"recent_consensus_count" Int NL

[Zero or one time.]

The number of the different consensuses seen in the last data_period days. (data_period is 5 by default.)

      Assuming that Tor clients fetch a consensus every 1-2 hours,
      and that the data_period is 5 days, the Value of this Key SHOULD be
      between:
          data_period * 24 / 2 =  60
          data_period * 24     = 120

      This Line was added in version 1.4.0 of this specification.
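The expected range above can be expressed as a small plausibility check. This is a sketch only; the function names are hypothetical and the bounds simply restate the 1-2 hour fetch interval assumed in the text.

```python
def expected_consensus_count_range(data_period_days=5):
    """Return (low, high) bounds for recent_consensus_count."""
    hours = data_period_days * 24
    # One consensus every 2 hours at the low end, every hour at the high end.
    return hours // 2, hours


def consensus_count_is_plausible(count, data_period_days=5):
    """Return True if count falls inside the expected range."""
    low, high = expected_consensus_count_range(data_period_days)
    return low <= count <= high
```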

    "recent_priority_list_count" Int NL

      [Zero or one time.]

The number of times that a list with a subset of relays prioritized to be measured has been created in the last data_period days. (data_period is 5 by default.)

      In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
      approximately:
          data_period * 24 / 1.5 = 80
      where 1.5 is the approximate number of hours it takes to measure a
      priority list of 7000 * 0.05 (350) relays, when the fraction of relays
      in a priority list is 5% (0.05).

      This Line was added in version 1.4.0 of this specification.

    "recent_priority_relay_count" Int NL

      [Zero or one time.]

The number of relays that have been in the list of relays prioritized to be measured in the last data_period days. (data_period is 5 by default.)

      In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
      approximately:
          80 * (7000 * 0.05) = 28000
      where 0.05 (5%) is the fraction of relays in a priority list and 80
      is the approximate number of priority lists (see
      "recent_priority_list_count").

      This Line was added in version 1.4.0 of this specification.
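The two 2019 estimates for recent_priority_list_count and recent_priority_relay_count follow from the same inputs, as this sketch shows. The function and parameter names are hypothetical; the default values are the figures quoted in the text.

```python
def expected_priority_counts(relays=7000, fraction=0.05,
                             hours_per_list=1.5, data_period_days=5):
    """Return (recent_priority_list_count, recent_priority_relay_count)."""
    # Number of priority lists that fit in the data period.
    list_count = data_period_days * 24 / hours_per_list
    # Relays in one priority list.
    relays_per_list = relays * fraction
    relay_count = list_count * relays_per_list
    # Rounded because both header values are Ints.
    return round(list_count), round(relay_count)
```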

    "recent_measurement_attempt_count" Int NL

      [Zero or one time.]

The number of times that any relay has been queued to be measured in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be approximately the same as "recent_priority_relay_count", assuming that there is one attempt to measure a relay for each relay that has been prioritized unless there are system, network or implementation issues.

This Line was added in version 1.4.0 of this specification and removed in version 1.5.0.

"recent_measurement_failure_count" Int NL

[Zero or one time.]

The number of times that the scanner attempted to measure a relay in the last data_period days (5 by default), but the relay has not been measured because of system, network or implementation issues.

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_error_count" Int NL

[Zero or one time.]

The number of relays that have no successful measurements in the last data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_near_count" Int NL

[Zero or one time.]

The number of relays that have some successful measurements in the last data_period days (5 by default), but all those measurements were performed in a period of time that was too short (by default 1 day).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_old_count" Int NL

[Zero or one time.]

The number of relays that have some successful measurements, but all those measurements are too old (more than 5 days, by default).

Excludes relays that are already counted in recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_few_count" Int NL

[Zero or one time.]

The number of relays that don't have enough recent successful measurements. (Fewer than 2 measurements in the last 5 days, by default.)

Excludes relays that are already counted in recent_measurements_excluded_near_count and recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.
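The four recent_measurements_excluded_*_count Keys describe exclusion reasons that are checked in a fixed order, so that each relay is counted once. The sketch below is one possible reading of that order, not the generator's actual code. The function name and parameters are hypothetical, and it assumes the input is the list of times of all successful measurements still stored for one relay.

```python
from datetime import datetime, timedelta, timezone


def exclusion_reason(success_times, now,
                     data_period=timedelta(days=5),
                     min_span=timedelta(days=1),
                     min_count=2):
    """Return "error", "near", "old", "few", or None if the relay is kept."""
    if not success_times:
        # No successful measurements at all.
        return "error"
    if max(success_times) - min(success_times) < min_span:
        # All measurements fall within too short a period.
        return "near"
    recent = [t for t in success_times if now - t <= data_period]
    if not recent:
        # Measurements exist, but all of them are too old.
        return "old"
    if len(recent) < min_count:
        # Not enough recent measurements.
        return "few"
    return None
```

Because the checks run in this order, a relay counted as "near" is never also counted as "old" or "few", which matches the exclusions stated above.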

"time_to_report_half_network" Int NL

[Zero or one time.]

The time, in seconds, that it would take to report measurements about half of the network, given the number of eligible relays and the time it took in the last data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"tor_version" version_number NL

[Zero or one time.]

The Tor version of the Tor process controlled by the generator.

This Line was added in version 1.4.0 of this specification.

"mu" Int NL

[Zero or one time.]

The network stream bandwidth average calculated as explained in B4.2.

This Line was added in version 1.7.0 of this specification.

"muf" Int NL

[Zero or one time.]

The filtered network stream bandwidth average, calculated as explained in B4.2.

This Line was added in version 1.7.0 of this specification.

"dirauth_nickname" Value NL

[Zero or one time.]

The nickname of the directory authority (dirauth) that publishes this V3BandwidthsFile.

This Line was added in version 1.8.0 of this specification.

KeyValue NL

[Zero or more times.]

There MUST NOT be multiple KeyValue header Lines with the same key. If there are, the parser SHOULD choose an arbitrary Line.

If a parser does not recognize a Keyword in a KeyValue Line, it MUST be ignored.

Future format versions may include additional KeyValue header Lines. Additional header Lines will be accompanied by a minor version increment.

Implementations MAY add additional header Lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys.

Parsers MUST NOT rely on the order of these additional Lines.

Additional header Lines MUST NOT use any keywords specified in the relay measurements format. If they do, the parser MAY ignore the conflicting keywords.

Terminator NL

[Zero or one time.]

The Header List section ends with a Terminator.

In version 1.0.0, the Header List ends at the first line that conforms to the relay bandwidth format described in the next section.

Implementations of version 1.1.0 and later SHOULD use a 5-character terminator.

Tor 0.4.0.1-alpha and later look for a 5-character terminator, or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2 used a 4-character terminator; this bug was fixed in 1.0.3.
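The parsing rules in this section can be combined into a small reader, sketched below. It is not Tor's or sbws's parser. It assumes that KeyValue Lines have the form key=value, that the 5-character terminator is "=====", and that a relay bandwidth line can be recognized by a space-separated "bw=" token. All three details are defined outside this section, and the function names are hypothetical.

```python
TERMINATOR = "====="  # assumed 5-character terminator


def looks_like_relay_line(line):
    # Assumption: relay lines are space-separated KeyValues including "bw=".
    return any(token.startswith("bw=") for token in line.split(" "))


def parse_header(lines):
    """Return (timestamp, headers, index of the first line after the header)."""
    timestamp = int(lines[0])  # the Timestamp line is not a KeyValue
    headers = {}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == TERMINATOR:
            index += 1
            break
        if looks_like_relay_line(line):
            # Version 1.0.0 style: the header ends at the first relay line.
            break
        key, sep, value = line.partition("=")
        # Lines that are not KeyValues are ignored. For duplicate keys the
        # choice is arbitrary; this sketch keeps the first Line seen.
        if sep and key and key not in headers:
            headers[key] = value
        index += 1
    if "version" not in headers:
        # Version 1.0.0 documents carry neither Line.
        headers["version"] = "1.0.0"
        headers.setdefault("software", "torflow")
    return timestamp, headers, index
```

Keys the caller does not recognize simply remain in the returned dictionary and can be ignored there. A 4-character terminator, as written by old sbws versions, is treated as a non-conforming Line and skipped, so the header then ends at the first relay line.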