netdoc document meta-format
Server descriptors, directories, and running-routers documents all obey the following lightweight extensible information format, known as netdoc format.
netdoc syntax
A netdoc is a text file, with Unix line endings.
The highest level object is a Document, which consists of one or more Items. Every Item begins with a KeywordLine, followed by zero or more Objects. A KeywordLine begins with a Keyword, optionally followed by whitespace and more non-newline characters, and ends with a newline. A Keyword is a sequence of one or more characters in the set [A-Za-z0-9-], but may not start with -. An Object is a block of encoded data in pseudo-Privacy-Enhanced-Mail (PEM) style format: that is, lines of encoded data MAY be wrapped by inserting an ascii linefeed ("LF", also called newline, or "NL" here) character (cf. RFC 4648 §3.1). When line wrapping, implementations MUST wrap lines at 64 characters. Upon decoding, implementations MUST ignore and discard all linefeed characters.
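The Object wrapping rule can be illustrated with a minimal Python sketch. The function names `encode_object` and `decode_object_body` are hypothetical and do not come from any Tor implementation; the sketch assumes standard base64 and a single-word Object keyword.

```python
import base64


def encode_object(keyword: str, data: bytes) -> str:
    """Render data as an Object, wrapping the base64 text at 64 characters."""
    b64 = base64.b64encode(data).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    body = "".join(line + "\n" for line in lines)
    return f"-----BEGIN {keyword}-----\n{body}-----END {keyword}-----\n"


def decode_object_body(body: str) -> bytes:
    """Decode the base64 lines of an Object after discarding all linefeeds."""
    return base64.b64decode(body.replace("\n", ""), validate=True)
```

Here 60 bytes of input produce 80 base64 characters, so the encoded body is one 64-character line followed by one 16-character line.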
A netdoc consists of unicode code points, and MUST be encoded as UTF-8 without a BOM prefix. Implementations SHOULD reject netdocs that are not UTF-8, or which contain the NUL character.
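As a rough illustration of the encoding requirement, the following Python sketch rejects input that is not UTF-8, that starts with a BOM, or that contains NUL. The name `decode_netdoc` is hypothetical, and the sketch applies the SHOULD-level rejection strictly, which real implementations may not do.

```python
def decode_netdoc(raw: bytes) -> str:
    """Decode raw bytes as a netdoc, rejecting a BOM prefix, NUL, and non-UTF-8."""
    if raw.startswith(b"\xef\xbb\xbf"):
        raise ValueError("netdoc must not start with a UTF-8 BOM")
    if b"\x00" in raw:
        raise ValueError("netdoc must not contain the NUL character")
    try:
        # Python's strict UTF-8 codec also rejects overlong forms and surrogates.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("netdoc is not valid UTF-8") from exc
```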
Future directions:
(Note that we may impose additional restrictions on the set of allowable Unicode characters in future, to restrict control characters and other oddities.)
Conformance:
Arti currently rejects all non-UTF-8 documents.
C Tor directory authorities (as of 0.4.8.x) reject non-UTF-8 documents and UTF-8 documents with a BOM when receiving router descriptors; C Tor accepts arbitrary non-NUL byte sequences otherwise.
More formally:
NL = The ascii LF character (hex value 0x0a).
Document ::= (Item | NL)+
Item ::= KeywordLine Object?
KeywordLine ::= (Opt WS)? ItemKeyword (WS Arguments)? NL
ItemKeyword = Keyword
Arguments ::= Any sequence of unicode characters encoded in UTF-8, excluding NL and NUL.
WS = (SP | TAB)+
Object ::= BeginLine Base64-encoded-data EndLine
BeginLine ::= "-----BEGIN " Keyword (" " Keyword)* "-----" NL
EndLine ::= "-----END " Keyword (" " Keyword)* "-----" NL
Keyword = KeywordStart KeywordChar*
KeywordStart ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9'
KeywordChar ::= KeywordStart | '-'
Opt ::= "opt"
The documentation for each ItemKeyword must specify its expected Arguments and Objects. Unless otherwise stated, a KeywordLine contains a sequence of space/tab-separated arguments:
Arguments ::= Argument (WS Arguments)?
Argument ::= ArgumentChar+
ArgumentChar ::= Any unicode characters encoded in UTF-8, excluding NL, NUL, and SP.
An ItemKeyword may not be "opt".
Implementations MUST NOT generate "Opt" in a keyword line, though they SHOULD accept it.
Conformance:
Some implementations do not accept Opt on all items. Notably, C Tor will reject many netdocs if they use "Opt" on a KeywordLine used to indicate the start or end of a section, or on a KeywordLine containing a signature.
Before Tor 0.1.2.5-alpha, Opt was used to indicate that if a parser did not recognize an ItemKeyword, it should ignore it. Now all unrecognized ItemKeywords are treated that way.
In Tor 0.2.0.5-alpha through 0.2.4.1-alpha we stopped generating Opt. No currently supported Tor release generates it.
The BeginLine and EndLine of an Object must use the same keyword.
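The grammar above can be sketched as a small line-oriented parser. This is a minimal Python illustration under stated assumptions, not how Arti or C Tor parse netdocs: the names `Item` and `parse_netdoc` are hypothetical, the input is assumed to be already decoded text without NUL, arguments are split on spaces and tabs, and at most one Object per Item is read (as in the formal grammar).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

KEYWORD = r"[A-Za-z0-9][A-Za-z0-9-]*"
KEYWORD_LINE = re.compile(rf"(?:opt[ \t]+)?({KEYWORD})(?:[ \t]+([^\n\x00]*))?")
BEGIN_LINE = re.compile(rf"-----BEGIN ({KEYWORD}(?: {KEYWORD})*)-----")
END_LINE = re.compile(rf"-----END ({KEYWORD}(?: {KEYWORD})*)-----")


@dataclass
class Item:
    keyword: str
    args: List[str]
    object_keyword: Optional[str] = None
    object_body: Optional[str] = None  # base64 text with linefeeds removed


def parse_netdoc(text: str) -> List[Item]:
    """Split a netdoc into Items, accepting (and dropping) a leading "opt"."""
    if not text.endswith("\n"):
        raise ValueError("last line is not terminated by NL")
    lines = text.split("\n")[:-1]
    items: List[Item] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line == "":
            continue  # Document ::= (Item | NL)+
        m = KEYWORD_LINE.fullmatch(line)
        if m is None or m.group(1) == "opt":
            raise ValueError(f"bad keyword line: {line!r}")
        rest = m.group(2) or ""
        item = Item(m.group(1), [a for a in re.split(r"[ \t]+", rest) if a])
        begin = BEGIN_LINE.fullmatch(lines[i]) if i < len(lines) else None
        if begin:
            i += 1
            body = []
            # Collect base64 lines until the next delimiter line.
            while i < len(lines) and not lines[i].startswith("-----"):
                body.append(lines[i])
                i += 1
            end = END_LINE.fullmatch(lines[i]) if i < len(lines) else None
            if end is None or end.group(1) != begin.group(1):
                raise ValueError("Object BEGIN/END keywords do not match")
            i += 1
            item.object_keyword = begin.group(1)
            item.object_body = "".join(body)
        items.append(item)
    return items
```

The sketch leaves the Object body as base64 text; decoding it and checking per-keyword argument rules would be done by whatever code interprets each Item.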
Compatibility and extensibility
When interpreting a Document, software MUST ignore any KeywordLine that starts with a keyword it doesn't recognize; future implementations MUST NOT require current clients to understand any KeywordLine not currently described.
Other implementations that want to extend Tor's directory format MAY introduce their own items. The keywords for extension items SHOULD start with the characters "x-" or "X-", to guarantee that they will not conflict with keywords used by future versions of Tor.
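A minimal Python sketch of these two rules follows. The helper names `is_extension_keyword` and `interpret`, the handler table, and the example keywords are all hypothetical; Items are represented simply as (keyword, arguments) pairs.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def is_extension_keyword(keyword: str) -> bool:
    """True if the keyword uses the prefix suggested for extension items."""
    return keyword.startswith(("x-", "X-"))


def interpret(
    items: Iterable[Tuple[str, List[str]]],
    handlers: Dict[str, Callable[[List[str]], object]],
) -> Dict[str, object]:
    """Run the handler for each recognized keyword; skip unrecognized ones."""
    results: Dict[str, object] = {}
    for keyword, args in items:
        handler = handlers.get(keyword)
        if handler is None:
            continue  # unrecognized keyword: ignored rather than rejected
        results[keyword] = handler(args)
    return results
```

With this structure, a document containing an item that a newer version or an extension introduced is still processed; only the items the software knows about contribute to the result.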
Permit additional arguments
For forward compatibility, each item MUST allow extra arguments at the end of the line unless otherwise noted. So, for example, if an item's description is given as:
thing int int int ..
then implementations SHOULD accept this string as well:
thing 5 9 11 13 16 12 NL
but not this string:
thing 5 NL
Typically the text would state that the int arguments are integers, so the implementation should also reject this string:
thing 5 10 thing NL
Whenever an item DOES NOT allow extra arguments, we will tag it with "No extra arguments" in the syntax bullet points. (If the .. has been omitted, but there is no "no extra arguments" statement, the omission of the .. is a spec mistake and extra arguments are allowed.)
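The "thing" example can be sketched in Python as follows. The function name `parse_thing` is hypothetical, and the sketch assumes that "int" means a non-negative decimal integer, which the text of a real item would have to state.

```python
import re
from typing import List


def parse_thing(line: str) -> List[int]:
    """Parse `thing int int int ..`, tolerating extra trailing arguments."""
    fields = [f for f in re.split(r"[ \t]+", line.rstrip("\n")) if f]
    if not fields or fields[0] != "thing":
        raise ValueError("not a 'thing' item")
    args = fields[1:]
    if len(args) < 3:
        raise ValueError("too few arguments")
    required = args[:3]
    if not all(re.fullmatch(r"[0-9]+", a) for a in required):
        raise ValueError("the first three arguments must be integers")
    # Anything after the third argument is accepted and ignored.
    return [int(a) for a in required]
```

Only the three described arguments are validated; the extra arguments are not inspected at all, so a future version can append fields of any type.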
netdoc structure
Each type of netdoc requires, and permits, certain ItemKeywords, with certain restrictions on their order. In some cases ItemKeywords can introduce sections, providing structure to the document; this will be stated in the description for that ItemKeyword in that type of document.
netdoc format description conventions
NB these conventions are not yet followed everywhere in the Tor Specifications.
When presenting a specific document format, the Items forming the document are shown one per subsection.
The syntax of each item is defined in detail with a bulleted list at the start of the section.
The first bullet point shows the syntax of the line introducing the item.
Literal parts (including the Item Keyword) are shown in fixed width.
Arguments are shown with italic emphasis.
The spaces between arguments, and the final newline, are not depicted.
If (as is usual) extra arguments are to be tolerated (for future expansion), a short ellipsis .. is shown as a reminder.
Optional arguments are shown in [ ].
When an Item has (or may have) an Object, that is shown as the second line in the bullet list, in the form:
- something, Object, OBJECT KEYWORD
where something will be used to refer to the Object in the text, and OBJECT KEYWORD is the Object's Keyword in the base64 delimiters. (The -----BEGIN etc. are not depicted.)
Further bullet points give further information about the syntax - often, in terms defined more fully here.
The type (therefore, format) of arguments, and permissible values, are stated in the text. The argument is named in bold-italic in its principal description.
Position and multiplicity
The syntax bullet points for an Item state its permissible multiplicity and position, within each Document of its particular document type, in the following terms:
- "At start, exactly once" --- MUST occur exactly once, and MUST be the first item.
- "Exactly once" --- MUST occur exactly once.
- "At end, exactly once" --- MUST occur exactly once, and MUST be the last item.
- "At most once" --- MAY occur zero or one times but MUST NOT occur more than once.
- "Any number" --- MAY occur zero, one, or more times.
- "Once or more" --- MUST occur at least once and MAY occur more than once.
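The multiplicity and position terms above could be checked mechanically along these lines. This is a minimal Python sketch: the function name `check_multiplicity` and the idea of a per-document-type rule table keyed by ItemKeyword are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

EXACTLY_ONCE = {"At start, exactly once", "Exactly once", "At end, exactly once"}


def check_multiplicity(keywords: List[str], rules: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for the item keywords of one document.

    `keywords` lists the ItemKeywords in document order; `rules` maps each
    known keyword to one of the multiplicity terms.
    """
    counts = Counter(keywords)
    problems: List[str] = []
    for keyword, rule in rules.items():
        n = counts[keyword]
        if rule in EXACTLY_ONCE and n != 1:
            problems.append(f"{keyword}: expected exactly once, found {n}")
        elif rule == "At most once" and n > 1:
            problems.append(f"{keyword}: expected at most once, found {n}")
        elif rule == "Once or more" and n < 1:
            problems.append(f"{keyword}: expected at least once")
        # "Any number" places no constraint on the count.
        if rule == "At start, exactly once" and keywords[:1] != [keyword]:
            problems.append(f"{keyword}: must be the first item")
        if rule == "At end, exactly once" and keywords[-1:] != [keyword]:
            problems.append(f"{keyword}: must be the last item")
    return problems
```

Keywords that have no entry in the rule table are not reported, which matches the requirement to ignore unrecognized items.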
Rest-of-line arguments
Exceptionally, for some items there is a "rest of line" argument. This is denoted by writing ARGUMENT.... in the syntax summary, in the first bullet point, and stating
- ARGUMENT is the whole rest of the line,
in the syntax description.
In this case, the value of the argument is all the characters after the SP following the keyword or previous argument.
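A rest-of-line argument can be extracted by limiting the number of splits, as in this minimal Python sketch. The name `split_rest_of_line` is hypothetical; the sketch assumes single SP separators before the rest-of-line argument and treats a missing rest-of-line argument as an error, which a given item's description may or may not intend.

```python
from typing import List, Tuple


def split_rest_of_line(line: str, leading_args: int) -> Tuple[str, List[str], str]:
    """Split a KeywordLine into keyword, ordinary arguments, and the rest.

    `leading_args` is the number of ordinary arguments that come before the
    rest-of-line argument.
    """
    parts = line.rstrip("\n").split(" ", leading_args + 1)
    if len(parts) < leading_args + 2:
        raise ValueError("missing rest-of-line argument")
    return parts[0], parts[1:-1], parts[-1]
```

Because splitting stops after the last ordinary argument, any spaces inside the rest-of-line value are preserved exactly.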
Signing documents
Every signable document below is signed in a similar manner, using a given "Initial Item", a final "Signature Item", a digest algorithm, and a signing key.
The Initial Item must be the first item in the document.
The Signature Item has the following format:
<signature item keyword> [arguments] NL SIGNATURE NL
The "SIGNATURE" Object contains a signature (using the signing key) of the PKCS #1 v1.5 padded digest of the entire document, taken from the beginning of the Initial Item, through the newline after the Signature Item's keyword and its arguments.
The signature does not include the algorithmIdentifier specified in PKCS #1.
Unless specified otherwise, the digest algorithm is SHA-1.
All documents are invalid unless signed with the correct signing key.
The "Digest" of a document, unless stated otherwise, is its digest as signed by this signature scheme.
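The span covered by the digest can be illustrated with a short Python sketch. The names `signed_span` and `document_digest` are hypothetical, the sketch assumes the document text begins at the Initial Item and that the first line carrying the signature keyword is the Signature Item, and it computes only the SHA-1 digest; the RSA signing and padding step is not shown.

```python
import hashlib
import re


def _first_word(line: str) -> str:
    """Return the text of a line up to the first space or tab."""
    return re.split(r"[ \t]", line, maxsplit=1)[0]


def signed_span(document: str, initial_keyword: str, signature_keyword: str) -> bytes:
    """Bytes from the start of the Initial Item through the NL that ends the
    Signature Item's keyword line (the Object that follows is excluded)."""
    lines = document.split("\n")
    if _first_word(lines[0]) != initial_keyword:
        raise ValueError("Initial Item must be the first item")
    # The final element is whatever follows the last NL, so it is not a line.
    for index, line in enumerate(lines[:-1]):
        if _first_word(line) == signature_keyword:
            return ("\n".join(lines[: index + 1]) + "\n").encode("utf-8")
    raise ValueError("no Signature Item found")


def document_digest(document: str, initial_keyword: str, signature_keyword: str) -> bytes:
    """SHA-1 digest of the signed span (the default digest algorithm)."""
    return hashlib.sha1(signed_span(document, initial_keyword, signature_keyword)).digest()
```

For a document whose Initial Item is "router" and whose Signature Item is "router-signature" (used here purely as illustrative keywords), the digest covers every byte up to and including the newline at the end of the "router-signature" line, but none of the SIGNATURE Object itself.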