Filename: 349-command-state-validation.md
Title: Client-Side Command Acceptance Validation
Author: Mike Perry
Created: 2023-08-17
Status: Draft

Introduction

The ability of relays to inject end-to-end relay cells that are ignored by clients allows malicious relays to create a covert channel to verify that they are present in multiple positions of a path. This covert channel allows a Guard to deanonymize 100% of its traffic, or just all the traffic of a particular client IP address.

This attack was first documented in DROPMARK. Proposal 344 describes the severity of this attack, and how this kind of end-to-end covert channel leads to full deanonymization, in a reliable way, in practice. (Recall that dropped cell attacks are most severe when an adversary can inject arbitrary end-to-end data patterns at times when the circuit is known to be idle, before it is used for traffic; injection at this point enables path bias attacks which can ensure that only malicious Guard+Exit relays are present in all circuits used by a particular target client IP address. For further details, see Proposal 344.)

This proposal is targeting arti-client, not C-Tor. This proposal is specific to client-side checks of relay cells and relay messages. Its primary change to behavior is the definition of state machines that enforce what relay message commands are acceptable on a given circuit, and when.

By applying and enforcing these state machine rules, we prevent the end-to-end transmission of arbitrary amounts of data, and ensure that predictable periods of the protocol are happening as expected, and not filled with side channel packet patterns.

Overview of dropped cell types

Dropped cells are cells that a relay can inject that end up ignored and discarded by a Tor client.

These include:

1. Unparsable cells
2. Invalid relay commands
3. Unrecognized cells (i.e., wrong source hop, or decrypt failures)
4. Unsupported (or consensus-disabled) relay commands or extensions
5. Out-of-context relay commands
6. Duplicate relay commands
7. Relay commands that hit any error codepaths
8. Relay commands for an invalid or already-closed stream ID
9. Semantically void relay cells (incl. relay data len == 0, or PING)
10. Onion descriptor-appended junk

Items 1-4 and 8 are handled by the existing relay command parsers in arti. In these cases, arti closes the circuit already.

XXX: Arti's relay parser is lazy (see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978). Does this mean that individual components need to properly propagate error information in order for circuits to get closed when a command does not parse?

The state machines of this proposal handle 5-7 in a rigorous way. (In many cases of out-of-context relay cells, arti already closes the circuit; our goal here is to centralize this validation so that we can ensure that it is not possible for any relay commands to omit checks or allow unbounded activity.)

XXX: Does arti allow extra onion-descriptor junk to be appended after the descriptor signature? C-Tor does...

Architectural Patterns and Behavior

Ideally, the handling of invalid protocol behavior should be centralized, so that validation can happen in one easy-to-audit place, rather than spread across the codebase (as it currently is with C-Tor).

For some narrow cases of invalid protocol activity, this is trivial. The relay command acceptance is centralized in arti, which allows arti to immediately reject unknown or disabled relay commands. This kind of validation is necessary, but not sufficient, in order to prevent dropped cell vectors.

Things quickly get complicated when handling parsable relay cells sent at an inappropriate time, or other activity such as duplicate relay commands, semantically void cells, or commands that hit an error condition (or a lazy parsing failure) deep in the code and are then silently accepted without closing the circuit.

To handle such cases, we propose adding a relay command message state machine pattern. Each relay protocol, when it becomes active on a circuit, must register a state machine that handles validating its messages.

Because multiple relay protocols can be active at a time, multiple validation state machines can be attached to a circuit. This also allows protocols to create their own validation without needing to modify the entire validation process. Relay messages that are not accepted by any active protocol validation handler MUST result in circuit close.
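
As a rough illustration of this registration pattern, here is a minimal Rust sketch. The names `RelayMsg`, `CommandValidator`, `ValidatorSet`, and `DropFromHop` are hypothetical and are not taken from arti; the only value borrowed from this proposal is the RELAY_COMMAND_DROP command number (10). A message is accepted if at least one registered validator accepts it; otherwise the caller is expected to close the circuit.

```rust
/// Hypothetical stand-in for an enveloped relay message (not arti's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelayMsg {
    pub hop: u8,
    pub stream_id: u16,
    pub command: u8,
}

/// One validation state machine attached to a circuit.
pub trait CommandValidator {
    /// Returns true if this validator accepts the message in its current state.
    fn accepts(&mut self, msg: &RelayMsg) -> bool;
}

/// The set of validators currently registered on one circuit.
#[derive(Default)]
pub struct ValidatorSet {
    validators: Vec<Box<dyn CommandValidator>>,
}

impl ValidatorSet {
    /// Called when a relay protocol becomes active on the circuit.
    pub fn register(&mut self, v: Box<dyn CommandValidator>) {
        self.validators.push(v);
    }

    /// Returns true if any validator accepts the message.
    /// A `false` result means the circuit MUST be closed.
    pub fn accept(&mut self, msg: &RelayMsg) -> bool {
        self.validators.iter_mut().any(|v| v.accepts(msg))
    }
}

/// Example validator: accepts DROP (command 10) from a single hop only.
pub struct DropFromHop {
    pub hop: u8,
}

impl CommandValidator for DropFromHop {
    fn accepts(&mut self, msg: &RelayMsg) -> bool {
        msg.command == 10 && msg.hop == self.hop
    }
}
```

With an empty set every message is refused, which matches the default-deny posture described above; registering `DropFromHop { hop: 2 }` makes a DROP from hop 2 acceptable while a DROP from hop 1 still closes the circuit.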

Architectural Patterns

In order to handle these cases, we rely on some architectural patterns:

1. No relay message command may be sent to the client unless it is explicitly allowed by the specification, advertised as supported, and negotiated on a particular channel or circuit. (Prop#346)
2. Any relay commands or extension fields not successfully negotiated on a circuit are invalid. This includes cells from intermediate hops, which must also negotiate their use (example: padding machine negotiation to middles).
3. By following the above principles, state machines can be developed that govern when a relay command is acceptable. This covers the majority of protocol activity. See Section 3.
4. For some commands, additional checks must be performed by using context of the protocol itself.

The following relay commands require additional module state to enforce limitations, beyond what is known by a state machine, for #4:

RELAY_COMMAND_SENDME
- Requires checking that the auth digest hash is accurate
RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
- Context and rate limiting is stream-dependent
RELAY_COMMAND_DROP
- This can only be accepted from a hop if there is a padding machine at that hop.
RELAY_COMMAND_INTRODUCE2
- Requires inspecting replay cache (however, circuits should not get closed because replays can come from the client)
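
To make the idea of "additional module state" concrete, the following Rust sketch shows one way the SENDME check could be kept outside the state machine. It assumes the client records an expected digest each time a SENDME becomes due and compares incoming SENDMEs in order; the `SendmeAuth` type, its method names, and the 20-byte `DIGEST_LEN` are assumptions made for illustration, not arti's implementation. A production check would likely also want a constant-time comparison.

```rust
use std::collections::VecDeque;

/// Digest length carried by an authenticated SENDME (assumed to be 20 bytes here).
pub const DIGEST_LEN: usize = 20;

/// Hypothetical per-hop record of the digests we expect future SENDMEs to carry.
pub struct SendmeAuth {
    expected: VecDeque<[u8; DIGEST_LEN]>,
}

impl SendmeAuth {
    pub fn new() -> Self {
        Self { expected: VecDeque::new() }
    }

    /// Record the digest that the next not-yet-received SENDME must echo.
    pub fn expect(&mut self, digest: [u8; DIGEST_LEN]) {
        self.expected.push_back(digest);
    }

    /// Check an incoming SENDME against the oldest outstanding digest.
    /// Returns false (close the circuit) on a mismatch or if none is expected.
    pub fn validate(&mut self, received: &[u8; DIGEST_LEN]) -> bool {
        match self.expected.pop_front() {
            Some(want) => &want == received,
            None => false,
        }
    }
}
```

The state machine only decides whether a SENDME may arrive at all; a helper like this decides whether the specific SENDME is the one that was expected.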

Behavior

When an invalid relay cell or relay message is encountered, the corresponding circuit should be immediately closed.

Initially, this can be accomplished by sending a DESTROY cell to the Guard relay.

Additionally, when closing circuits in this way, clients must take care not to allow cases of adversarially-induced infinite circuit creation in non-onion service protocols that are not protected by Vanguards/Vanguards-lite, by limiting the number of retries they perform. (One such example of this is a malicious conflux exit that repeatedly kills only one leg by injecting dropped cells to close the circuit.)
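
One simple way to bound such retries is a per-request budget, sketched below in Rust. The `RetryBudget` type and its limit are hypothetical; this proposal does not specify a particular retry count.

```rust
/// Hypothetical bound on how many replacement circuits may be built
/// after circuits were closed for protocol violations.
pub struct RetryBudget {
    max_retries: u32,
    used: u32,
}

impl RetryBudget {
    pub fn new(max_retries: u32) -> Self {
        Self { max_retries, used: 0 }
    }

    /// Call after closing a circuit due to an invalid relay message.
    /// Returns true if a replacement circuit may be built, false once
    /// the budget is exhausted.
    pub fn try_retry(&mut self) -> bool {
        if self.used < self.max_retries {
            self.used += 1;
            true
        } else {
            false
        }
    }
}
```

With `RetryBudget::new(2)`, the first two violations allow a rebuild and the third does not, so an adversary cannot induce unbounded circuit creation for that request.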

While we also specify some cases where the channel to the Guard should be closed, this is not necessary in the general case.

XXX: I can't think of any issues severe enough to actually warrant the following, but Florentin pointed it out as a possibility: A malicious Guard may withhold the DESTROY, and still allow full identifier transmission before the circuit is closed. While this does not directly allow full deanonymization because the client won't actually use the circuit, it may still be enough to make the vector useful for other attacks. For completeness against this vector, we may want to consider sending a new RELAY_DESTROY command to the middle node, such that it has responsibility for tearing down a circuit by sending its own DESTROY cells in both directions, and then have the client send its own DESTROY if the client does not get a DESTROY from the Guard. See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220

State machine descriptions

These state machines apply only at the client. (There is no information leak from extra cells in the protocol on the relay side, so we will not be specifying relay-side enforcement, or implementing it for C-Tor.)

There are multiple state machines, describing particular circuit purposes and/or components of the Tor relay protocol.

Each state machine has a "Trigger" and a "Message Scope". The "Trigger" is the condition, relay command, or action that causes the state machine to get added to a circuit's command state validator set. The "Message Scope" is where the state machine applies: to a specific hop number, stream ID, or both.

A circuit can have multiple state machines attached at one time.

If no state machine accepts a relay command, then the circuit MUST be closed.
When we say "receive X" we mean "receive a valid cell of type X". If the cell is invalid, we MUST kill the circuit.

Relay message handlers

The state machines process enveloped relay message commands. I.e., with respect to prop#340, they operate on the message bodies, with associated stream ID.

With respect to Proposal #340, the calls to state machine validation would go after converting cells to messages, but before parsing the message body itself, to still minimize exposure of the parser attack surfaces.

XXX: Again, some validation will require early parsing, not lazy parsing

There are multiple relay message handlers that can be registered with each circuit ID, for a specific hop on that circuit ID, depending on the protocols that are in use on that circuit with that hop, as well as the streams to that hop.

Each handler has a Message Scope, that acts as a filter such that only relay command messages from this scope are processed by that handler.

If a message is not accepted by any active handler, the circuit MUST be closed.
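
The scope filter can be pictured as a small predicate evaluated before a handler's state machine ever sees the message. The Rust sketch below uses a hypothetical `MessageScope` type; treating a missing stream ID as "circuit-level, stream ID not considered" is an assumption of this sketch.

```rust
/// Hypothetical description of where a handler applies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MessageScope {
    pub circ_id: u32,
    pub hop: u8,
    /// `Some(id)` for stream-level handlers; `None` for circuit-level
    /// handlers, in which case the stream ID is not considered (assumption).
    pub stream_id: Option<u16>,
}

impl MessageScope {
    /// Returns true if a message from (circ_id, hop, stream_id) falls in scope.
    pub fn matches(&self, circ_id: u32, hop: u8, stream_id: u16) -> bool {
        self.circ_id == circ_id
            && self.hop == hop
            && self.stream_id.map_or(true, |s| s == stream_id)
    }
}
```

A stream handler scoped to circuit 7, hop 3, stream 5 ignores a message for stream 6 on the same hop; if no other handler's scope and state accept that message, the circuit is closed.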

Base Handler

Purpose: This handler validates commands for circuit construction and circuit-level SENDME activity.

Trigger: Creation of a circuit; ntor handshake success for a hop

Message Scope: The circuit ID and hop number must match for this handler to apply. (Because of leaky pipes, each hop of the circuit has a base handler added when that hop completes an ntor handshake and is added to the circuit.)

START:
  Upon sending EXTEND:
     Enter EXTEND_SENT.

  Receive SENDME:
     Ensure expected auth digest matches; close circuit otherwise
     No transition.

EXTEND_SENT:
  Receiving EXTENDED:
     Enter START.

  Receive SENDME:
     Ensure expected auth digest matches; close circuit otherwise
     No transition.
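
The Base Handler above can be sketched as a pure transition function in Rust. The type and function names are hypothetical, the SENDME digest comparison is represented by a precomputed `digest_ok` flag, and any (state, event) pair not listed in the description is treated as invalid, which is this sketch's reading of the default-deny rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BaseState {
    Start,
    ExtendSent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BaseEvent {
    SendExtend,
    RecvExtended,
    /// A SENDME arrived; `digest_ok` says whether its auth digest matched.
    RecvSendme { digest_ok: bool },
}

/// Returns the next state, or `None` if the circuit must be closed.
pub fn base_step(state: BaseState, ev: BaseEvent) -> Option<BaseState> {
    use BaseEvent::*;
    use BaseState::*;
    match (state, ev) {
        (Start, SendExtend) => Some(ExtendSent),
        (ExtendSent, RecvExtended) => Some(Start),
        // A SENDME with a matching digest is accepted in either state.
        (s, RecvSendme { digest_ok: true }) => Some(s),
        // Everything else (e.g. an unsolicited EXTENDED) is invalid.
        _ => None,
    }
}
```

For example, `base_step(BaseState::Start, BaseEvent::RecvExtended)` yields `None`, because an EXTENDED that does not answer an outstanding EXTEND is an out-of-context command.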

Client Introducing Handler

Purpose: Circuits used by clients to connect to a service introduction point have this handler attached.

Trigger: Usage of a circuit for client introduction

Message Scope: Circuit ID and hop number must match

CLIENT_INTRO_START:
  Upon sending INTRODUCE1:
    Enter CLIENT_INTRO_WAIT

CLIENT_INTRO_WAIT:
  Receive INTRODUCE_ACK:
    Accept
    Transition to CLIENT_INTRO_END

CLIENT_INTRO_END:
  No transitions possible
  - XXX: Enforce that no new handlers can be added? We may still have padding
    handlers though.

Service Introduce Handler

Purpose: Service-side onion service introduction circuits have this handler attached.

Trigger: Onion service establishing an introduction point circuit

Message Scope: Circuit ID and hop number must match

SERVICE_INTRO_START:
  Upon sending ESTABLISH_INTRO:
    Enter SERVICE_INTRO_ESTABLISH

SERVICE_INTRO_ESTABLISH:
  Receiving INTRO_ESTABLISHED:
    Enter SERVICE_INTRO_ESTABLISHED

SERVICE_INTRO_ESTABLISHED:
  Receiving INTRODUCE2:
    Accept

Client Rendezvous Handler

Purpose: Circuits used by clients to build a rendezvous point have this handler attached.

Trigger: Client rendezvous initiation

Message Scope: Circuit ID and hop number must match

CLIENT_REND_START:
  Upon sending RENDEZVOUS1:
    Enter CLIENT_REND_WAIT

CLIENT_REND_WAIT:
  Receive RENDEZVOUS2:
    Enter CLIENT_REND_ESTABLISHED

CLIENT_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams

Service Rendezvous Handler

Purpose: Circuits used by services to connect to a rendezvous point have this handler attached.

Trigger: Incoming introduce cell/service rend initiation

Message Scope: Circuit ID and hop number must match

SERVICE_REND_START:
  Upon sending ESTABLISH_RENDEZVOUS:
    Enter SERVICE_REND_WAIT

SERVICE_REND_WAIT:
  Receive RENDEZVOUS_ESTABLISHED:
    Enter SERVICE_REND_ESTABLISHED

SERVICE_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams

CircPad Handler

Purpose: Circuit-level padding is negotiated with a particular hop in the circuit; when it is negotiated, we need to allow padding cells from that hop.

Trigger: Negotiation of a circuit padding machine

Message Scope: Circuit ID and hop must match; padding machine must be active

PADDING_START:
  Upon sending PADDING_NEGOTIATE:
    Enter PADDING_NEGOTIATING

PADDING_NEGOTIATING:
  Receiving PADDING_NEGOTIATED:
    Enter PADDING_ACTIVE

PADDING_ACTIVE:
  Receiving DROP:
    Accept (if from correct hop)
    - XXX: We could perform more sophisticated rate limiting accounting here
      too?

Resolve Stream Handler

Purpose: This handler is created on circuits when a resolve happens.

Trigger: RESOLVE message

Message Scope: Circuit ID, stream ID, and hop number must all match

RESOLVE_START:
  Send a RESOLVE message:
    Enter RESOLVE_SENT

RESOLVE_SENT:
  Receive a RESOLVED or an END:
    Enter RESOLVE_START.

TCP Stream handler

Purpose: This handler is created when the client creates a new stream ID, using either BEGIN or BEGIN_DIR. This handler has several states to prevent unbounded XOFF+XON pairs (they are bounded to not occur more than once per SENDME, though this is somewhat arbitrarily chosen). It also has an END_SENT state, which handles half-closed stream data.

Trigger: New AP or DirConn stream

Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID must be open or half-open (half-open is END_SENT).

TCP_STREAM_START:
  Send a BEGIN or BEGIN_DIR message:
    Enter BEGIN_SENT.

BEGIN_SENT:
  Receive an END:
    Enter TCP_STREAM_START.
  Receive a CONNECTED:
    Enter STREAM_OPEN.

STREAM_OPEN:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    Enter STREAM_XOFF

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

STREAM_XOFF:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.
 
  Send END:
    Enter END_SENT.

  Receive XON:
    Enter STREAM_X_WAIT

  Receive END:
    Enter TCP_STREAM_START

STREAM_X_WAIT:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive SENDME:
    Enter STREAM_XON

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

STREAM_XON:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    Enter STREAM_XOFF

  Receive XON:
    Verify rate has changed
    Enter STREAM_X_WAIT

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

END_SENT:
  Same as STREAM_OPEN, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK.
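
The following Rust sketch encodes the TCP stream transitions listed above as a single function. All names are hypothetical. Two simplifications are assumptions of the sketch rather than part of this proposal: in END_SENT only DATA and END are modeled (flow-control messages, the RTT_max timer, and END_ACK are left out), and the "rate has changed" check on a repeated XON is passed in as a flag.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TcpState {
    Start,
    BeginSent,
    Open,
    Xoff,
    XWait,
    Xon,
    EndSent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TcpEvent {
    SendBegin,
    RecvConnected,
    RecvData { len: usize },
    RecvXoff,
    RecvXon { rate_changed: bool },
    RecvSendme,
    SendEnd,
    RecvEnd,
}

/// Returns the next state, or `None` if the circuit must be closed.
pub fn tcp_step(state: TcpState, ev: TcpEvent) -> Option<TcpState> {
    use TcpEvent::*;
    use TcpState::*;
    match (state, ev) {
        (Start, SendBegin) => Some(BeginSent),
        (BeginSent, RecvConnected) => Some(Open),
        (BeginSent, RecvEnd) => Some(Start),
        // DATA must carry at least one byte; the state does not change.
        (Open | Xoff | XWait | Xon | EndSent, RecvData { len }) if len > 0 => Some(state),
        (Open | Xon, RecvXoff) => Some(Xoff),
        (Xoff, RecvXon { .. }) => Some(XWait),
        // A repeated XON is only valid if it announces a changed rate.
        (Xon, RecvXon { rate_changed: true }) => Some(XWait),
        // XON/XOFF activity is re-armed by a SENDME.
        (XWait, RecvSendme) => Some(Xon),
        (Open | Xoff | XWait | Xon, SendEnd) => Some(EndSent),
        (Open | Xoff | XWait | Xon | EndSent, RecvEnd) => Some(Start),
        _ => None,
    }
}
```

Because STREAM_X_WAIT accepts neither XOFF nor XON, a relay cannot send further flow-control messages until a SENDME has been received, which is what bounds XOFF+XON pairs to one per SENDME.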

Conflux Handler

Purpose: Circuits that are a part of a conflux set have a conflux handler, associated with the last hop. This handler prevents excessive link control cells, and enforces that at least one DATA cell must occur between SWITCH cells.

Trigger: Creation of a conflux set

Message Scope: Circuit ID and hop number must match. This means that this handler is only for one leg of a conflux set; as new legs are added, they get their own handlers.

The SWITCH_WAIT state needs the relay command to have been validated by a different handler, in order to allow further switching. This handler may be bound to a different circuit and circuit ID than this leg's handler.

CONFLUX_START: (all conflux leg circuits start here)
  Upon sending CONFLUX_LINK:
     Enter CONFLUX_LINKING

CONFLUX_LINKING:
  Receiving CONFLUX_LINKED:
     Send CONFLUX_LINKED_ACK
     Enter CONFLUX_SWITCH_WAIT

CONFLUX_SWITCH_WAIT:
  Receiving validated DATA message (must be accepted by TCP stream handler):
     Enter CONFLUX_SWITCH_OK

CONFLUX_SWITCH_OK:
  Receiving CONFLUX_SWITCH:
     Enter CONFLUX_SWITCH_WAIT
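
A sketch of the conflux leg handler in Rust follows, with hypothetical names throughout. The dependency on another handler is modeled by `on_validated_data`, which the caller is assumed to invoke only after a stream handler has accepted a DATA message; sending CONFLUX_LINKED_ACK is left to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfluxState {
    Start,
    Linking,
    SwitchWait,
    SwitchOk,
}

/// Hypothetical per-leg conflux validator.
pub struct ConfluxLeg {
    state: ConfluxState,
}

impl ConfluxLeg {
    pub fn new() -> Self {
        Self { state: ConfluxState::Start }
    }

    pub fn state(&self) -> ConfluxState {
        self.state
    }

    /// Move from `from` to `to`; returns false if we were not in `from`.
    fn advance(&mut self, from: ConfluxState, to: ConfluxState) -> bool {
        if self.state == from {
            self.state = to;
            true
        } else {
            false
        }
    }

    /// We sent CONFLUX_LINK on this leg.
    pub fn on_link_sent(&mut self) -> bool {
        self.advance(ConfluxState::Start, ConfluxState::Linking)
    }

    /// CONFLUX_LINKED arrived; false means close the circuit.
    pub fn on_linked(&mut self) -> bool {
        self.advance(ConfluxState::Linking, ConfluxState::SwitchWait)
    }

    /// A DATA message was already accepted by a stream handler.
    pub fn on_validated_data(&mut self) {
        if self.state == ConfluxState::SwitchWait {
            self.state = ConfluxState::SwitchOk;
        }
    }

    /// CONFLUX_SWITCH arrived; false means close the circuit.
    pub fn on_switch(&mut self) -> bool {
        self.advance(ConfluxState::SwitchOk, ConfluxState::SwitchWait)
    }
}
```

Two SWITCH messages in a row are rejected because the first one returns the leg to the wait state, and only validated DATA can leave it.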

UDP Stream Handler

Purpose: Circuits that are using prop#339 have this handler attached for each UDP stream.

Trigger: UDP stream creation

Message Scope: Circuit ID, hop number, and stream-id must match

UDP_STREAM_START:
  If no other udp streams used on circuit:
    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
  else:
    Immediately enter UDP_CONNECTING
    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
     stream on a circuit is established)

UDP_CONNECTING:
  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED

UDP_CONNECTED:
  Receive DATAGRAM:
    Verify length > 0
    Verify Prop#344 NAT rules are obeyed, including srcport and stream limits
    Process.

  Send END:
    Enter UDP_END_SENT

UDP_END_SENT:
  Same as UDP_CONNECTED, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK,
  then transition to UDP_STREAM_START.

HSDIR Inflation

XXX: This can be folded into the state machines and/or rend-spec. The state machines should actually be able to handle this, once they are ready for it.

One of the most common questions about dropped cells is "what about data cells with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell attack, but is instead an instance of an Active Traffic Manipulation Covert Channel, described in Section 1.3.2. The lower severity of active traffic manipulation is due to the fact that it cannot be used to deanonymize 100% of a target client's circuits, whereas the combination of path bias and pre-usage dropped cells can.

However, there is one case where one can construct a potent attack from this Active Traffic Manipulation: by making use of onion service circuits being built on demand by an application. Further, because the onion service handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344), it is possible to use this vector in this specific case to encode an identifier in the timing and traffic patterns of the onion service descriptor download, similar to how the CMU attack operated, and use both the onion service fingerprint and descriptor traffic pattern to transmit the fact that a particular onion service was visited, to the Guard or possibly even a local network observer.

A normal hidden service descriptor occupies only ~10 cells (with a hard max of 30KB, or ~60 cells). This is not enough to reliably encode the full address of the onion service in a timing-based covert channel.

However, there are two ways to cause this descriptor download to transmit enough data to encode such a covert channel, and replicate the CMU attack using timing information of this data.

First, the actual descriptor payload can be spread across many DATA cells that are filled only partially with data (which does not happen if the HSDIR is honest and well-behaved, because it always has the full descriptor on hand).

Second, in C-Tor, additional junk can be appended at the end of an onion service descriptor document that does not count against the 30KB maximum, which the client will happily download and then ignore.

Neither of these behaviors needs to be preserved, and neither can happen in normal operation. They can be addressed either directly, by checks on HSDIR-based RELAY_COMMAND_DATA lengths and descriptor parsing, or by simply enforcing that circuits used to fetch service descriptors can only receive as many bytes as the maximum descriptor size before being closed.
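
The byte-limit option could look roughly like the Rust sketch below. `DescFetchBudget` is a hypothetical type, and the limit is supplied by the caller; a real limit would presumably need to allow for response framing on top of the 30KB descriptor maximum mentioned above, which this sketch does not attempt to quantify.

```rust
/// Hypothetical byte budget for a circuit used to fetch an onion
/// service descriptor.
pub struct DescFetchBudget {
    remaining: usize,
}

impl DescFetchBudget {
    /// `max_bytes` is the most payload the circuit may ever deliver.
    pub fn new(max_bytes: usize) -> Self {
        Self { remaining: max_bytes }
    }

    /// Account for one DATA payload of `len` bytes.
    /// Returns false (close the circuit) for empty payloads or when
    /// the payload would exceed the remaining budget.
    pub fn on_data(&mut self, len: usize) -> bool {
        if len == 0 || len > self.remaining {
            return false;
        }
        self.remaining -= len;
        true
    }
}
```

This bounds the total number of bytes a malicious HSDIR can use for a timing channel, though by itself it does not prevent the payload from being split into many partially filled cells; that would still need the per-cell length check.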

XXX: Consider RELAY_COMMAND_END_ACK also..

https://gitlab.torproject.org/tpo/core/torspec/-/issues/196

XXX: Tickets to grovel through for other stuff: https://gitlab.torproject.org/tpo/core/torspec/-/issues/38 https://gitlab.torproject.org/tpo/core/torspec/-/issues/39 https://gitlab.torproject.org/tpo/core/arti/-/issues/525

Command Allowlist enumeration

XXX: We are planning to remove this section after we finish the state machines; keeping it for reference until then for cross-checking.

Formerly, in C-Tor, we took the approach of performing a series of checks for each message command, ad-hoc. Here's those rules, for spot-checking that the above state machines cover them.

All relay commands are rejected by clients and services unless a rule says they are OK.

Here's a list of those rules, by relay command:

RELAY_COMMAND_DATA 2
- This command MUST only arrive for valid open or half-open stream ID
- This command MUST have data length > 0
- On HSDIR circuits, ONLY ONE command is allowed to have a non-full payload (the last command). See Section 4.
RELAY_COMMAND_END 3
- This command MUST only arrive ONCE for each valid open or half-open stream ID
RELAY_COMMAND_CONNECTED 4
- This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN or BEGIN_DIR
- The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR)
RELAY_COMMAND_DROP 10
- This command is accepted by clients from any hop that they have negotiated an active circuit padding machine with
RELAY_COMMAND_CONFLUX_LINKED 20
- Ensure that a LINK cell was sent to the hop that sent this
- Ensure that no previous LINKED cell has arrived on this circuit
RELAY_COMMAND_CONFLUX_SWITCH 22
- Ensure that conflux is enabled and linked
- If Prop#340 is in use, this cell MUST be packed with a valid multiplexed RELAY_COMMAND_DATA cell.
RELAY_COMMAND_INTRODUCE2 35
- Services MUST check:
  - The intro is for a valid service identity and auth
  - The command has a valid sub-credential
  - The command is not a replay (possibly not close circuit?)
RELAY_COMMAND_RENDEZVOUS2 37
- This command MUST ONLY arrive ONCE in response to a sent REND1 cell, on the appropriate circuit
- The ntor handshake must succeed with MAC validation
RELAY_COMMAND_INTRO_ESTABLISHED 38
- Services MUST check:
  - This cell MUST ONLY come ONCE in response to RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity
RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
- This command MUST ONLY be accepted ONCE in response to RELAY_COMMAND_ESTABLISH_RENDEZVOUS
RELAY_COMMAND_INTRODUCE_ACK 40
- This command MUST ONLY be accepted ONCE by clients, in response to RELAY_COMMAND_INTRODUCE1
RELAY_COMMAND_PADDING_NEGOTIATED 42
- This command MUST ONLY be accepted by clients in response to PADDING_NEGOTIATE
RELAY_COMMAND_XOFF 43
- Ensure that congestion control is enabled and negotiated
- Ensure that the stream id is either opened or half-open
- Ensure that the stream id is in "XON" state
RELAY_COMMAND_XON 44
- Ensure that congestion control is enabled and negotiated
- Ensure that the stream id is either opened or half-open
- Enforce always packing this to a SENDME with Prop#340?
RELAY_COMMAND_CONNECTED_UDP
- The stream id in this command MUST match that from RELAY_COMMAND_CONNECT_UDP
- This command is only accepted once per UDP stream id
RELAY_COMMAND_DATAGRAM
- This command MUST only arrive for valid open or half-open stream ID
- This command MUST have data length > 0

References:

Tor design proposals