Filename: 356-desc-parsing-variance.md
Title: Increasing netdoc strictness not considered (very) harmful
Author: Nick Mathewson
Created: 13 March 2025
Status: Informational

Introduction

Here we discuss the issue of document parsing distinguishability, its use in version/implementation fingerprinting, and the ensuing implications for protocol evolution.

We conclude that in general it is not worthwhile to attempt to prevent this kind of fingerprinting, but that some mitigations are worthwhile.

The issue: differences in allowable documents

As Tor's document formats have changed over the years, it has often been the case that a network document accepted by older versions of Tor is rejected by newer ones.

This happens more often than you might think: every time we add a new field to these documents, if there are any restrictions on the value of that field, then the versions that recognize the new field will reject invalid values, whereas older versions of Tor will ignore them.
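As a rough illustration of this divergence, the sketch below uses a toy line-based format. The keywords `nickname` and `example-limit`, and both parser functions, are hypothetical and are not taken from any real Tor document type or implementation.

```python
# Toy illustration only: not Tor's real netdoc parser or field names.

def parse_items(doc: str) -> dict[str, str]:
    """Split a toy "keyword arguments" document into a keyword -> arguments map."""
    items = {}
    for line in doc.strip().splitlines():
        keyword, _, rest = line.partition(" ")
        items[keyword] = rest
    return items


def old_parser_accepts(doc: str) -> bool:
    """Older version: only knows 'nickname' and ignores unknown keywords."""
    items = parse_items(doc)
    return "nickname" in items


def new_parser_accepts(doc: str) -> bool:
    """Newer version: also knows 'example-limit', which must be a decimal integer."""
    items = parse_items(doc)
    if "nickname" not in items:
        return False
    limit = items.get("example-limit")
    if limit is not None and not (limit.isascii() and limit.isdigit()):
        return False  # the new field is present but malformed: reject
    return True


# A document with a malformed value for the newly added field.
probe = "nickname test\nexample-limit not-a-number\n"
```

With these definitions, `old_parser_accepts(probe)` is true while `new_parser_accepts(probe)` is false, so the same document separates the two versions.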

Because of this, an attacker may be able to use these differences to learn which version of Tor a particular client is using, and perhaps to use that information for further attacks.

(Note that in this proposal we are only considering the documents in the netdoc metaformat. We are not considering other differences across versions, including handling of messages, cells, or other protocol elements.)

The attacks: learning client versions

Of the various network document types, only two of them provide meaningful opportunities for distinguishing client versions: bridge descriptors, and hidden service descriptors.

With a bridge descriptor, a bridge can try serving a target client several different versions of its descriptor, compatible with different Tor versions, until it has pinpointed the client's possible range of versions. (On the other hand, a bridge can partition clients far more uniquely by their IP addresses.)
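The narrowing step can be sketched as simple interval arithmetic. The function below is a minimal illustration under stated assumptions: releases are modeled as plain integers, each probe descriptor violates exactly one rule, and the release numbers are made up for the example.

```python
from typing import Optional


def version_range(
    rule_changes: list[int], probe_accepted: list[bool]
) -> tuple[Optional[int], Optional[int]]:
    """Narrow a client's version from per-rule probe results.

    rule_changes[i] is the (hypothetical) release in which rule i began to
    be enforced. probe_accepted[i] says whether the client accepted a
    descriptor that violates only rule i.

    Returns (low, high): the client is at least `low` and older than `high`.
    None means that side is unbounded.
    """
    low: Optional[int] = None
    high: Optional[int] = None
    for release, accepted in zip(rule_changes, probe_accepted):
        if accepted:
            # The client does not enforce this rule, so it predates the release.
            if high is None or release < high:
                high = release
        else:
            # The client enforces this rule, so it is at least this release.
            if low is None or release > low:
                low = release
    return low, high


# Hypothetical example: rules introduced in releases 3, 7, and 12.
result = version_range([3, 7, 12], [False, False, True])
```

Here `result` is `(7, 12)`: the client enforces the rules from releases 3 and 7 but not the one from release 12. The sketch does not try to detect inconsistent observations.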

With hidden service descriptors, a hidden service can upload different versions to different HSDirs, including different introduction points in each one, and then see which clients successfully connect using which introduction points.

The attacks: exploiting client versions

Once an attacker has a way to separate clients by version, what can they do then?

(Actually, the attacker only learns a range of possible versions, since we don't change our parsing rules with every release.)

One possibility is learning which other attacks will work: if some versions of Tor are known to have some vulnerabilities, the attacker might attack those versions knowing that they will succeed. (But if a version of Tor is vulnerable, we should already be encouraging users to upgrade!)

Another possibility is partitioning clients into smaller sets, to aid with other kinds of traffic analysis. If some versions are extremely rare, that might help fingerprint those users directly. (But in general we should not have very rare versions, or have many supported versions of our document formats at a given time.)

Mitigations

Most of all, we can continue to encourage users to upgrade to our latest stable versions, and we can drop support for older versions as they become obsolete.

We can also, when it's convenient, batch multiple parsing changes into a single release to avoid proliferation of multiple behaviors.

Additionally, we can ensure that the version detection attacks discussed above do not happen silently, by ensuring that clients report the origin of any unparseable bridge descriptors or hidden service descriptors. (There is currently an Arti ticket about this.)
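One way such reporting might look is sketched below. This is not how Arti or C Tor implements it: the toy parser, the `DescriptorParseError` type, the logger name, and the `origin` string are all assumptions made for the example.

```python
import logging
from typing import Optional

logger = logging.getLogger("netdoc_example")


class DescriptorParseError(Exception):
    """Raised by the toy parser below when a descriptor is rejected."""


def parse_descriptor(text: str) -> dict[str, str]:
    """Toy parser: every line must be a keyword followed by an argument."""
    items = {}
    for line in text.strip().splitlines():
        keyword, sep, rest = line.partition(" ")
        if not sep:
            raise DescriptorParseError(f"missing argument for {keyword!r}")
        items[keyword] = rest
    return items


def load_descriptor(text: str, origin: str) -> Optional[dict[str, str]]:
    """Parse a descriptor, logging where it came from if it is rejected."""
    try:
        return parse_descriptor(text)
    except DescriptorParseError as err:
        # Surface the source so repeated probing does not go unnoticed.
        logger.warning("Rejected descriptor from %s: %s", origin, err)
        return None
```

The point of the design is only that a rejection leaves a visible trace tied to the bridge or HSDir that served the document, rather than being dropped silently.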

Alternatives

To solve these attacks completely, we could (but will not) take several approaches.

First, we could minimize what counts as an unparseable document: for example, whenever we add a new field, we could treat violations of its format as equivalent to its absence.
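A minimal sketch of this first approach, assuming a hypothetical integer-valued field named `example-limit` and a document already split into a keyword map:

```python
from typing import Optional


def parse_optional_int(items: dict[str, str], keyword: str) -> Optional[int]:
    """Return the field's integer value, or None if it is absent OR malformed.

    Treating a malformed value exactly like a missing one means that a
    version which knows the field and a version which does not both accept
    the same set of documents.
    """
    value = items.get(keyword)
    if value is None:
        return None
    if not (value.isascii() and value.isdigit()):
        return None  # malformed: behave as though the field were absent
    return int(value)


# The field name below is hypothetical.
absent = parse_optional_int({}, "example-limit")
malformed = parse_optional_int({"example-limit": "abc"}, "example-limit")
valid = parse_optional_int({"example-limit": "42"}, "example-limit")
```

Both `absent` and `malformed` come back as `None`, so the document is never rejected on account of the new field.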

Alternatively, when we change a network document format, we could add an enforcement flag as a consensus parameter: only when the flag was present would implementations treat violations of the new format as an error.
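The enforcement-flag approach could be sketched as follows. The parameter name `enforce-example-limit` and the field `example-limit` are invented for illustration; no such consensus parameter is known to exist.

```python
def accept_document(items: dict[str, str], consensus_params: dict[str, int]) -> bool:
    """Decide whether to accept a document under an enforcement flag.

    A malformed 'example-limit' is treated as an error only when the
    (hypothetical) consensus parameter 'enforce-example-limit' is set to 1.
    Until then, versions that know the field behave like versions that do not.
    """
    enforce = consensus_params.get("enforce-example-limit", 0) == 1
    value = items.get("example-limit")
    if value is None:
        return True  # the field is optional
    well_formed = value.isascii() and value.isdigit()
    if well_formed:
        return True
    return not enforce  # tolerate violations until enforcement is switched on


before_flag = accept_document({"example-limit": "oops"}, {})
after_flag = accept_document({"example-limit": "oops"}, {"enforce-example-limit": 1})
```

Here `before_flag` is true and `after_flag` is false: the same malformed document is tolerated until the flag appears, after which every flag-aware client rejects it together.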

Both of these approaches, however, are somewhat cumbersome, somewhat error-prone, and would tend to make it harder to evolve our protocol over time.

Conclusion

Because of the limited impact of the attacks described above, and of the reasonably good chances of mitigation, we have decided not to pursue the alternatives discussed above. Rather, it seems that it's okay to simply fix our document format when it's broken.

(For more details, we link to a discussion on gitlab.)