Filename: 328-relay-overload-report.md
Title: Make Relays Report When They Are Overloaded
Author: David Goulet, Mike Perry
Created: November 3rd 2020
Status: Closed

0. Introduction

Relays are sometimes under heavy memory, CPU or network load, which in turn diminishes their ability to efficiently relay data through the network.

Being able to learn whether a relay is overloaded would allow us to make better-informed load balancing decisions. For instance, we could make our bandwidth scanners smarter about how they allocate bandwidth based on such metrics reported by relays.

We could furthermore improve our network health monitoring and pinpoint relays possibly misbehaving or under DDoS attack.

1. Metrics to Report

We propose that relays start collecting several metrics (see section 2) reflecting the load on different components of tor.

Then, we propose that 1 new line be added to the server descriptor document (see dir-spec.txt, section 2.1.1) for the general overload case.

And that 2 new lines be added to the extra-info document (see dir-spec.txt, section 2.1.2) for more specific overload cases.

The following describes a series of metrics to collect, but more might be added in the future, so this is not an exhaustive list.

1.1. General Overload

The general overload line indicates that a relay has reached an "overloaded state" which can be one or many of the following load metrics:

  • Any OOM invocation due to memory pressure
  • Any ntor onionskins are dropped [Removed in tor-0.4.6.11 and 0.4.7.5-alpha]
  • A certain ratio of ntor onionskins dropped. [Added in tor-0.4.6.11 and 0.4.7.5-alpha]
  • TCP port exhaustion
  • DNS timeout reached (X% of timeouts over Y seconds). [Removed in tor-0.4.7.3-alpha]
  • CPU utilization of Tor's mainloop CPU core above 90% for 60 sec [Never implemented]
  • Control port overload (too many messages queued) [Never implemented]

For DNS timeouts, the X and Y are consensus parameters (overload_dns_timeout_scale_percent and overload_dns_timeout_period_secs) defined in param-spec.txt.

The format of the overloaded line added in the server descriptor document is as follows:

"overload-general" SP version SP YYYY-MM-DD HH:MM:SS NL
   [At most once.]

The timestamp is when at least one metric was detected. It should always be rounded down to the hour; as an example, "2020-01-10 13:00:00" is an expected timestamp. Because this is a binary state, if the line is present, we consider that the overload was hit at least once somewhere between the provided timestamp and the "published" timestamp of the document, which is when the document was generated.

The overload field should remain in place for 72 hours since last triggered. If the limits are reached again in this period, the timestamp is updated, and this 72 hour period restarts.

The 'version' field is set to '1' for the initial implementation of this proposal, which includes all the above overload metrics except the CPU and control port overload metrics.
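
As an illustration, a relay that has hit at least one of the above conditions would include a line such as the following in its server descriptor (the timestamp here is hypothetical):

  overload-general 1 2020-01-10 13:00:00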

1.2. Token bucket size

Relays should report the 'BandwidthBurst' and 'BandwidthRate' limits in their descriptor, as well as the number of times these limits were reached, for read and write, in the past 24 hours starting at the provided timestamp rounded down to the hour.

The format of this overload line added in the extra-info document is as follows:

"overload-ratelimits" SP version SP YYYY-MM-DD SP HH:MM:SS
                      SP rate-limit SP burst-limit
                      SP read-overload-count SP write-overload-count NL
  [At most once.]

The "rate-limit" and "burst-limit" are the raw values from the BandwidthRate and BandwidthBurst found in the torrc configuration file.

The "{read|write}-overload-count" are the counts of how many times the reported limits of burst/rate were exhausted and thus the maximum between the read and write count occurrences. To make the counter more meaningful and to avoid multiple connections saturating the counter when a relay is overloaded, we only increment it once a minute.

The 'version' field is set to '1' for the initial implementation of this proposal.
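
As a sketch, a hypothetical relay configured in its torrc with:

  BandwidthRate 10 MBytes
  BandwidthBurst 20 MBytes

would report the raw values in bytes (10485760 and 20971520) and, assuming for example 12 read and 3 write occurrences in the past 24 hours, would emit:

  overload-ratelimits 1 2020-01-10 13:00:00 10485760 20971520 12 3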

1.3. File Descriptor Exhaustion

Not having enough file descriptors in this day and age is really a sign of a misconfiguration or an operating system that is too old. This way, we can very quickly notice which relays have a value that is too small and notify their operators.

The format of this overload line added in the extra-info document is as follows:

"overload-fd-exhausted" SP version YYYY-MM-DD HH:MM:SS NL
  [At most once.]

As with the overload-general line, the timestamp indicates that the maximum was reached between this timestamp and the "published" timestamp of the document.

This overload field should remain in place for 72 hours since last triggered. If the limits are reached again in this period, the timestamp is updated, and this 72 hour period restarts.

The 'version' field is set to '1' for the initial implementation of this proposal which detects fd exhaustion only when a socket open fails.
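
For example, a relay that failed to open a socket because of file descriptor exhaustion would add a line such as (timestamp hypothetical):

  overload-fd-exhausted 1 2020-01-10 13:00:00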

2. Load Metrics

This section proposes a series of metrics that should be collected and reported on the MetricsPort. The Prometheus format (the only one supported for now) is described for each metric.
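
As a minimal configuration sketch, a relay operator could enable the MetricsPort in the torrc with something like the following (address and policy are only examples; this port should not be exposed publicly):

  MetricsPort 127.0.0.1:9035
  MetricsPortPolicy accept 127.0.0.1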

2.1 Out-Of-Memory (OOM) Invocation

Tor's OOM handler manages caches and queues of all sorts. Relays have many of them, so any invocation of the OOM should be reported.

# HELP Total number of bytes the OOM has cleaned up
# TYPE counter
tor_relay_load_oom_bytes_total{<LABEL>} <VALUE>

Running counter of how many bytes were cleaned up by the OOM for a tor component identified by a label (see list below). To make sense, this should be visualized with the rate() function.

Possible LABELs for which the OOM was triggered:

  • subsys=cell: Circuit cell queue
  • subsys=dns: DNS resolution cache
  • subsys=geoip: GeoIP cache
  • subsys=hsdir: Onion service descriptors
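
A hypothetical scrape of the MetricsPort could then contain lines such as the following, typically graphed with a query like rate(tor_relay_load_oom_bytes_total[5m]):

  tor_relay_load_oom_bytes_total{subsys="cell"} 921448
  tor_relay_load_oom_bytes_total{subsys="hsdir"} 0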

2.2 Onionskin Queues

Onionskin handling is one of the few tasks that tor processes in parallel, but onionskins can be dropped for various reasons when under load. For this metric to make sense, we also need to gather how many onionskins we are processing, so that one can compute a total processed versus dropped ratio:

# HELP Total number of onionskins
# TYPE counter
tor_relay_load_onionskins_total{<LABEL>} <NUM>

Possible LABELs are:

  • type=<handshake_type>: Type of handshake of that onionskin.
    • Possible values: ntor, tap, fast
  • action=processed: Indicating how many were processed.
  • action=dropped: Indicating how many were dropped due to load.
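
For instance, a hypothetical relay that has processed many ntor handshakes and dropped a few under load might expose:

  tor_relay_load_onionskins_total{type="ntor",action="processed"} 14321
  tor_relay_load_onionskins_total{type="ntor",action="dropped"} 12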

2.3 File Descriptor Exhaustion

Relays can reach the "ulimit" cap (on Linux), which limits the number of file descriptors a process is allowed to open. In Tor's use case, these are mostly sockets. File descriptors should be reported as follows:

# HELP Total number of sockets
# TYPE gauge
tor_relay_load_socket_total{<LABEL>} <NUM>

Possible LABELs are:

  • (no label): How many sockets are available.
  • state=opened: How many sockets are opened.

Note: since tor already tracks this value in order to reserve a block of file descriptors for critical ports such as the Control Port, it can easily be exported.
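
Put together, a hypothetical export could look like the following, where the unlabeled value reflects how many sockets are available and the labeled one how many are currently opened:

  tor_relay_load_socket_total 65536
  tor_relay_load_socket_total{state="opened"} 7212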

2.4 TCP Port Exhaustion

The TCP protocol is capped at 65535 ports, so if the relay is ever unable to open more outbound sockets, that is an overloaded state. It should be reported:

# HELP Total number of times we ran out of TCP ports
# TYPE gauge
tor_relay_load_tcp_exhaustion_total <NUM>
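
A relay that has never run out of ports would simply report, for example:

  tor_relay_load_tcp_exhaustion_total 0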

2.5 Connection Bucket Limit

Rate limited connections track bandwidth using a token bucket system. Once the bucket is exhausted and tor wants to send more, it pauses until the bucket is refilled a second later. Whenever that limit is hit, it should be reported:

# HELP Total number of global connection bucket limit reached
# TYPE counter
tor_relay_load_global_rate_limit_reached_total{<LABEL>} <NUM>

Possible LABELs are:

  • side=read: Read side of the global rate limit bucket.
  • side=write: Write side of the global rate limit bucket.
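
As an illustration with hypothetical counts, a relay whose read bucket was exhausted more often than its write bucket might expose:

  tor_relay_load_global_rate_limit_reached_total{side="read"} 36
  tor_relay_load_global_rate_limit_reached_total{side="write"} 4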