Notation and conventions

These conventions apply, at least in theory, to all of the specification documents unless stated otherwise.

Remember, our specification documents were once a collection of separate text files, written separately and edited over the course of years.

While we are trying (as of 2023) to edit them into consistency, you should be aware that these conventions are not now followed uniformly everywhere.

MUST, SHOULD, and so on

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Data lengths

Unless otherwise stated, all lengths are given as a number of 8-bit bytes.

All bytes are 8 bits long. We sometimes call them "octets"; the terms as used here are interchangeable.

When referring to longer lengths, we use IEC binary prefixes (as in "kibibytes", "mebibytes", and so on) to refer unambiguously to increments of 1024^X bytes.

If you encounter a reference to "kilobytes", "megabytes", and so on, you cannot safely infer whether the author intended a decimal (1000^N) or binary (1024^N) interpretation. In these cases, it is better to revise the specifications.
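As a rough illustration of why the distinction matters, the following sketch compares the two interpretations. The constant and function names are hypothetical and chosen only for this example; they do not appear elsewhere in the specifications.

```python
# Binary prefixes: each step is a further power of 1024.
KIBIBYTE = 1024 ** 1
MEBIBYTE = 1024 ** 2
GIBIBYTE = 1024 ** 3

# Decimal prefixes, shown only for contrast.
KILOBYTE = 1000 ** 1
MEGABYTE = 1000 ** 2


def ambiguity(n: int) -> int:
    """Return how many bytes the binary and decimal readings differ by
    at the n-th prefix (n=1 for kilo/kibi, n=2 for mega/mebi, ...)."""
    return 1024 ** n - 1000 ** n


print(MEBIBYTE)      # 1048576
print(ambiguity(2))  # 48576
```

The gap grows with each prefix, which is one reason an ambiguous "megabytes" is better fixed in the text than guessed at.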

Integer encoding

Unless otherwise stated, all multi-byte integers are encoded in big-endian ("network") order.

For example, 4660 (0x1234), when encoded as a two-byte integer, is the byte 0x12 followed by the byte 0x34 ([12 34]).

When encoded as a four-byte integer, it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34 ([00 00 12 34]).
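The two encodings of 4660 can be reproduced with a short sketch. The helper names encode_uint and decode_uint are hypothetical, introduced only for this example; the underlying calls are Python's built-in int.to_bytes and int.from_bytes.

```python
def encode_uint(value: int, length: int) -> bytes:
    """Encode a non-negative integer as `length` bytes, big-endian."""
    return value.to_bytes(length, byteorder="big")


def decode_uint(data: bytes) -> int:
    """Decode a big-endian byte string into a non-negative integer."""
    return int.from_bytes(data, byteorder="big")


print(encode_uint(4660, 2).hex(" "))  # 12 34
print(encode_uint(4660, 4).hex(" "))  # 00 00 12 34
print(decode_uint(bytes([0x00, 0x00, 0x12, 0x34])))  # 4660
```

Note that int.to_bytes raises OverflowError if the value does not fit in the requested length, so a caller does not silently truncate.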

Binary-as-text encodings

When we refer to "base64", "base32", or "base16", we mean the encodings described in RFC 4648, with the following notes:

  • In base32, we never insert linefeeds, and we omit trailing = padding characters.
  • In base64, we sometimes omit trailing = padding characters, and we do not insert linefeeds unless explicitly noted.
  • We do not insert any other whitespace, except as specifically noted.

Base16 and base32 are case-insensitive. Unless otherwise stated, implementations should accept either case, and should produce a single uniform case.

We sometimes refer to base16 as "hex" or "hexadecimal".
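A minimal sketch of these conventions, using Python's standard base64 module, might look like the following. The function names are hypothetical, and the choice of lowercase for generated base32 is an assumption made for this example only; the rule above requires a single uniform case but does not, in general, say which.

```python
import base64


def b32_encode_unpadded(data: bytes) -> str:
    """Encode as base32 with no linefeeds and no trailing '=' padding.
    Lowercase output is an assumption of this sketch."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def b32_decode_any_case(text: str) -> bytes:
    """Decode unpadded base32, accepting upper, lower, or mixed case."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


def b64_decode_optional_padding(text: str) -> bytes:
    """Decode base64 whether or not the trailing '=' padding is present."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


encoded = b32_encode_unpadded(b"onion routing")
print(b32_decode_any_case(encoded.upper()))    # b'onion routing'
print(b64_decode_optional_padding("b25pb24"))  # b'onion'
print(bytes.fromhex("6F6e"))                   # b'on' (base16, mixed case)
```

Restoring the padding before decoding lets the same decoder handle both padded and unpadded inputs, since the standard library decoders expect input whose length is a multiple of the block size.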

Note that as of 2023, the specs are not always explicit about:

  • which base64 strings are multiline
  • which case base32 and base16 strings should be generated in

This is something we should correct.

Notation

Operations on byte strings

  • A | B represents the concatenation of two binary strings A and B.

Binary literals

When we write a series of one-byte hexadecimal literals in square brackets, it represents a multi-byte binary string.

For example, [6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67] is a 13-byte sequence representing the unterminated ASCII string "onion routing".
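To make the bracket and concatenation notation concrete, here is a small sketch that reads a bracketed literal into bytes and joins two of them. The helper parse_literal is hypothetical and exists only for this example; it assumes well-formed input and does no validation beyond what bytes.fromhex performs.

```python
def parse_literal(literal: str) -> bytes:
    """Convert a bracketed literal such as "[12 34]" into a byte string."""
    # Drop the surrounding brackets; bytes.fromhex skips the spaces.
    return bytes.fromhex(literal.strip().strip("[]"))


a = parse_literal("[6f 6e 69 6f 6e]")
b = parse_literal("[20 72 6f 75 74 69 6e 67]")

# A | B in the notation above corresponds to plain byte-string
# concatenation, not to a bitwise OR.
combined = a + b

print(len(combined))  # 13
print(combined)       # b'onion routing'
```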