Flow control

Each client or relay should do appropriate bandwidth throttling to keep its user happy.

Communicants rely on TCP's default flow control to push back when they stop reading.

The mainline Tor implementation uses token buckets (one for reads, one for writes) for the rate limiting.

Since 0.2.0.x, Tor has let the user specify an additional pair of token buckets for "relayed" traffic, so people can deploy a Tor relay with strict rate limiting, but also use the same Tor as a client. To avoid partitioning concerns we combine both classes of traffic over a given OR connection, and keep track of the last time we read or wrote a high-priority (non-relayed) cell. If it's been less than N seconds (currently N=30), we give the whole connection high priority, else we give the whole connection low priority. We also give low priority to reads and writes for connections that are serving directory information. See proposal 111 for details.

Link padding can be created by sending PADDING or VPADDING cells along the connection; relay messages of type "DROP" can be used for long-range padding. The bodies of PADDING cells, VPADDING cells, or DROP message are filled with padding bytes. See Cell Packet format.

If the link protocol is version 5 or higher, link level padding is enabled as per padding-spec.txt. On these connections, clients may negotiate the use of padding with a PADDING_NEGOTIATE command whose format is as follows:

         Version           [1 byte]
         Command           [1 byte]
         ito_low_ms        [2 bytes]
         ito_high_ms       [2 bytes]

Currently, only version 0 of this cell is defined. In it, the command field is either 1 (stop padding) or 2 (start padding). For the start padding command, a pair of timeout values specifying a low and a high range bounds for randomized padding timeouts may be specified as unsigned integer values in milliseconds. The ito_low_ms field should not be lower than the current consensus parameter value for nf_ito_low (default: 1500). The ito_high_ms field should not be lower than ito_low_ms. (If any party receives an out-of-range value, they clamp it so that it is in-range.)

For the stop padding command, the timeout fields should be sent as zero (to avoid client distinguishability) and ignored by the recipient.

For more details on padding behavior, see padding-spec.txt.

Circuit-level flow control

To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many DATA-bearing relay cells it is allowed to originate or willing to consume.

(For the purposes of flow control, we call a relay cell "DATA-bearing" if it holds a DATA relay message. Note that this design does not limit relay cells that don't contain a DATA message; this limitation may be addressed in the future.)

These two windows are respectively named: the package window (packaged for transmission) and the deliver window (delivered for local streams).

Because of our leaky-pipe topology, every relay on the circuit has a pair of windows, and the OP has a pair of windows for every relay on the circuit. These windows apply only to originated and consumed cells. They do not, however, apply to relayed cells, and a relay that is never used for streams will never decrement its windows or cause the client to decrement a window.

Each 'window' value is initially set based on the consensus parameter 'circwindow' in the directory (see dir-spec.txt), or to 1000 DATA-bearing relay cells if no 'circwindow' value is given. In each direction, cells that are not RELAY_DATA cells do not affect the window.

An OR or OP (depending on the stream direction) sends a RELAY_SENDME message to indicate that it is willing to receive more DATA-bearing cells when its deliver window goes down below a full increment (100). For example, if the window started at 1000, it should send a RELAY_SENDME when it reaches 900.

When an OR or OP receives a RELAY_SENDME, it increments its package window by a value of 100 (circuit window increment) and proceeds to sending the remaining DATA-bearing cells.

If a package window reaches 0, the OR or OP stops reading from TCP connections for all streams on the corresponding circuit, and sends no more DATA-bearing cells until receiving a RELAY_SENDME message.

If a deliver window goes below 0, the circuit should be torn down.

Starting with tor-0.4.1.1-alpha, authenticated SENDMEs are supported (version 1, see below). This means that both the OR and OP need to remember the rolling digest of the relay cell that precedes (triggers) a RELAY_SENDME. This can be known if the package window gets to a multiple of the circuit window increment (100).

When the RELAY_SENDME version 1 arrives, it will contain a digest that MUST match the one remembered. This represents a proof that the end point of the circuit saw the sent relay cells. On failure to match, the circuit should be torn down.

To ensure unpredictability, random bytes should be added to at least one RELAY_DATA cell within one increment window. In other word, at every 100 data-bearing cells (increment), random bytes should be introduced in at least one cell.

SENDME Message Format

A circuit-level RELAY_SENDME message always has its StreamID=0.

An OR or OP must obey these two consensus parameters in order to know which version to emit and accept.

      'sendme_emit_min_version': Minimum version to emit.
      'sendme_accept_min_version': Minimum version to accept.

If a RELAY_SENDME version is received that is below the minimum accepted version, the circuit should be closed.

The body of a RELAY_SENDME message contains the following:

      VERSION     [1 byte]
      DATA_LEN    [2 bytes]
      DATA        [DATA_LEN bytes]

The VERSION tells us what is expected in the DATA section of length DATA_LEN and how to handle it. The recognized values are:

0x00: The rest of the message should be ignored.

0x01: Authenticated SENDME. The DATA section MUST contain:

DIGEST [20 bytes]

         If the DATA_LEN value is less than 20 bytes, the message should be
         dropped and the circuit closed. If the value is more than 20 bytes,
         then the first 20 bytes should be read to get the DIGEST value.

         The DIGEST is the rolling digest value from the DATA-bearing relay cell that
         immediately preceded (triggered) this RELAY_SENDME. This value is
         matched on the other side from the previous cell sent that the OR/OP
         must remember.

         (Note that if the digest in use has an output length greater than 20
         bytes—as is the case for the hop of an onion service rendezvous
         circuit created by the hs_ntor handshake—we truncate the digest
         to 20 bytes here.)

If the VERSION is unrecognized or below the minimum accepted version (taken from the consensus), the circuit should be torn down.

Stream-level flow control

Edge nodes use RELAY_SENDME messages to implement end-to-end flow control for individual connections across circuits. Similarly to circuit-level flow control, edge nodes begin with a window of DATA-bearing cells (500) per stream, and increment the window by a fixed value (50) upon receiving a RELAY_SENDME message. Edge nodes initiate RELAY_SENDME messages when both a) the window is <= 450, and b) there are less than ten cells' worth of data remaining to be flushed at that edge.

Stream-level RELAY_SENDME messages are distinguished by having nonzero StreamID. They are still empty; the body still SHOULD be ignored.