
Filename: 321-happy-families.md
Title: Better performance and usability for the MyFamily option (v2)
Author: Nick Mathewson
Created: 27 May 2020
Status: Closed
Implemented-In: Tor 0.4.9.1-alpha, Arti 1.4.1

Problem statement.

The current family mechanism allows well-behaved relays to identify that they all belong to the same 'family', and should not be used in the same circuits.

Right now, families work by having every family member list every other family member in its server descriptor. This winds up using O(n^2) space in microdescriptors and server descriptors. (For RAM, we can de-duplicate families, which sometimes helps.) Adding or removing a server from the family requires all the other servers to change their torrc settings.

This growth in size is not just a theoretical problem. Family declarations currently make up a little over 55% of the microdescriptors in the directory--around 24% after compression. The largest family has around 270 members. With Walking Onions, 270 members times a 160-bit hashed identifier leads to over 5 kilobytes per SNIP, which is much greater than we'd want to use.
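The scaling figures above can be checked with a little arithmetic. The following is an illustrative sketch only: the function names are ours, and it counts listed identities rather than exact descriptor bytes.

```python
def legacy_family_entries(n: int) -> int:
    """Total identities listed across a family of n relays when every
    member lists every other member (the O(n^2) behavior)."""
    return n * (n - 1)


def hashed_ids_bytes(n: int, id_bits: int = 160) -> int:
    """Bytes needed to carry n hashed identifiers of id_bits bits each."""
    return n * id_bits // 8


n = 270  # approximate size of the largest family cited above
print(legacy_family_entries(n))  # 72630 entries across the whole family
print(hashed_ids_bytes(n))       # 5400 bytes, i.e. "over 5 kilobytes"
```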

This is an updated version of proposal 242. It differs by clarifying requirements and providing a more detailed migration plan.

Update: As of Feb 2025, families make up over 80% of the microdescriptors, even after compression.

Design overview.

In this design, every family has a master ed25519 "family key", which we'll call KS_familyid_ed. A node is in a family of this kind iff its server descriptor includes a certificate of its ed25519 identity key with the family key. The certificate format is the one in the tor-certs.txt spec; we would allocate a new certificate type for this usage. These certificates would need to include the signing key in the appropriate extension.

Note that because server descriptors are signed with the node's ed25519 signing key, this creates a bidirectional relationship between the two keys, so that nodes can't be put in families without their consent.

Changes to router descriptors

We add a new entry to server descriptors:

"family-cert" NL
"-----BEGIN FAMILY CERT-----" NL
cert
"-----END FAMILY CERT-----".

This entry contains a base64-encoded certificate as described above. It may appear any number of times; authorities MAY reject descriptors that include it more than three times.
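As a rough illustration of the entry format, the sketch below pulls family-cert objects out of a descriptor's text and decodes them. The function names and the regular expression are our own, not taken from any Tor implementation, and the sketch assumes standard padded base64 inside the object.

```python
import base64
import re

# Matches a "family-cert" keyword line followed by its object.
_FAMILY_CERT_RE = re.compile(
    r"^family-cert\n"
    r"-----BEGIN FAMILY CERT-----\n"
    r"(.*?)"
    r"-----END FAMILY CERT-----$",
    re.MULTILINE | re.DOTALL,
)

# Authorities MAY reject descriptors with more than this many entries.
MAX_FAMILY_CERTS = 3


def extract_family_certs(descriptor: str) -> list:
    """Return the decoded bytes of every family-cert object, in order."""
    certs = []
    for match in _FAMILY_CERT_RE.finditer(descriptor):
        b64 = "".join(match.group(1).split())  # drop line breaks
        certs.append(base64.b64decode(b64, validate=True))
    return certs


def has_too_many_family_certs(descriptor: str) -> bool:
    """True if an authority would be allowed to reject on count alone."""
    return len(extract_family_certs(descriptor)) > MAX_FAMILY_CERTS
```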

Parties MUST reject any descriptor containing a family-cert entry if any of the following are true:

the certificate is of the wrong type
the certificate is invalid
the certificate is expired
the certified key in the certificate does not match the relay's ed25519 identity key
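The checks listed above might be sketched as follows. This is a structural check only, under stated assumptions: the byte layout follows our reading of the Ed25519 certificate format in the certificate spec, the type value 0x0C is the one reserved by this proposal, and signature verification (part of "the certificate is invalid") is deliberately left out because it needs an Ed25519 library. All names are hypothetical.

```python
import struct
import time
from typing import Optional

FAMILY_CERT_TYPE = 0x0C             # reserved by this proposal
EXT_SIGNED_WITH_ED25519_KEY = 0x04  # extension carrying the signing key
HEADER_LEN = 40                     # version..n_extensions, inclusive
SIG_LEN = 64


class BadFamilyCert(ValueError):
    """Raised when a family cert fails a structural check."""


def check_family_cert(cert: bytes, relay_id_key: bytes,
                      now: Optional[float] = None) -> bytes:
    """Return the family key (the cert's signing key) if all structural
    checks pass. Does NOT verify the signature."""
    if now is None:
        now = time.time()
    if len(cert) < HEADER_LEN + SIG_LEN:
        raise BadFamilyCert("truncated certificate")
    version, cert_type, exp_hours, _key_type = struct.unpack(">BBIB", cert[:7])
    certified_key = cert[7:39]
    n_extensions = cert[39]
    if version != 1:
        raise BadFamilyCert("unknown certificate version")
    if cert_type != FAMILY_CERT_TYPE:
        raise BadFamilyCert("wrong certificate type")
    if exp_hours * 3600 <= now:  # expiration is in hours since the epoch
        raise BadFamilyCert("certificate is expired")
    if certified_key != relay_id_key:
        raise BadFamilyCert("certified key is not the relay identity key")

    pos, end = HEADER_LEN, len(cert) - SIG_LEN
    family_key = None
    for _ in range(n_extensions):
        if pos + 4 > end:
            raise BadFamilyCert("truncated extension header")
        ext_len, ext_type, _flags = struct.unpack(">HBB", cert[pos:pos + 4])
        pos += 4
        if pos + ext_len > end:
            raise BadFamilyCert("truncated extension body")
        if ext_type == EXT_SIGNED_WITH_ED25519_KEY and ext_len == 32:
            family_key = cert[pos:pos + ext_len]
        pos += ext_len
    if pos != end:
        raise BadFamilyCert("unexpected bytes before signature")
    if family_key is None:
        raise BadFamilyCert("signing key extension is missing")
    return family_key
```

A real implementation would go on to verify the trailing 64-byte signature against the returned key before trusting the certificate.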

Changes to microdescriptors

We add a new entry to microdescriptors: family-ids.

This line contains one or more space-separated strings identifying the families to which the node belongs. These strings are called "family IDs." These strings MUST be sorted in lexicographic order.

When a family ID is derived from a KP_familyid_ed, it is constructed by encoding it in base64, and then adding the prefix "ed25519:".

Clients SHOULD accept unrecognized key formats.
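A minimal sketch of building and reading a family-ids line is shown below. It assumes unpadded base64 (the text above does not say whether padding is kept) and uses helper names of our own invention.

```python
import base64


def family_id_from_key(kp_familyid_ed: bytes) -> str:
    """Build a family ID from a 32-byte public family key."""
    if len(kp_familyid_ed) != 32:
        raise ValueError("ed25519 public keys are 32 bytes long")
    # Assumption: trailing '=' padding is stripped.
    b64 = base64.b64encode(kp_familyid_ed).decode("ascii").rstrip("=")
    return "ed25519:" + b64


def format_family_ids_line(keys) -> str:
    """Render a family-ids line with its IDs in lexicographic order."""
    ids = sorted({family_id_from_key(k) for k in keys})
    return "family-ids " + " ".join(ids)


def parse_family_ids_line(line: str) -> list:
    """Return the family IDs on a line. Unrecognized key formats are
    kept as opaque strings rather than rejected."""
    keyword, *ids = line.split()
    if keyword != "family-ids" or not ids:
        raise ValueError("not a family-ids line")
    return ids
```

Because IDs are compared only as strings, a client can still match two nodes that share an ID in a format it does not otherwise understand.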

Changes to voting algorithm

We allocate a new consensus method number for voting on these keys.

When generating microdescriptors using a suitable consensus method, the authorities include a "family-ids" line if the underlying server descriptor contains any valid family-cert lines. For each valid family-cert in the server descriptor, they add a base64-encoded string of that family-cert's signing key.

See also "deriving family lists from family-ids?" below for an interesting but more difficult extension mechanism that I would not recommend.

Relay configuration

There are several ways that we could configure relays to let them include family certificates in their descriptors.

The easiest would be putting the private family key on each relay, so that the relays could generate their own certificates. This is easy to configure, but slightly risky: if the private key is compromised on any relay, anybody can claim membership in the family. That isn't so very bad, however -- all the relays would need to do in this event would be to move to a new private family key.

A more orthodox method would be to keep the private key somewhere offline, and use it to generate a certificate for each relay in the family as needed. These certificates should be made with long-enough lifetimes, and relays should warn when they are going to expire soon.

We currently plan to implement the first method; we may implement the second if there is demand.

Changes to relay behavior

It is possible we will not implement this section, but instead ask that relay operators continue maintaining both their legacy family lines and their new family certs for some migration period.

Each relay should track which other relays it has seen using the same family key as itself. When generating a router descriptor, each relay should list all of these relays on the legacy 'family' line. This keeps the "family" lines up-to-date with family keys for compliant relays.

Relays should continue listing relays in their family lines if they have seen a relay with that identity using the same family key at any time in the last 7 days.

The presence of this line should be configured by a network parameter, derive-family-list.

Relays whose family lines do not stay at least mostly in sync with their family keys should be marked invalid by the authorities.
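The tracking behavior in this section could be sketched roughly as follows. The class and method names are hypothetical, relay identities are treated as opaque strings, and how a relay learns about its peers is out of scope here.

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # retention window, in seconds


class FamilyPeerTracker:
    """Remembers which relays were seen using one of our family keys."""

    def __init__(self, own_family_ids):
        self.own_family_ids = set(own_family_ids)
        self.last_seen = {}  # relay identity -> time last seen (seconds)

    def observe(self, relay_id, relay_family_ids, now):
        """Record a relay if it shares at least one family key with us."""
        if self.own_family_ids & set(relay_family_ids):
            self.last_seen[relay_id] = now

    def derived_family_line(self, now, derive_family_list=1):
        """Return a legacy family line, or None if there is nothing to
        list or the derive-family-list parameter is 0."""
        if not derive_family_list:
            return None
        members = sorted(
            relay for relay, seen in self.last_seen.items()
            if now - seen <= SEVEN_DAYS
        )
        if not members:
            return None
        return "family " + " ".join(members)
```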

Client behavior

Clients should treat node A and node B as belonging to the same family if ANY of these is true:

The client has descriptors for A and B, and A's descriptor lists B in its family line, and B's descriptor lists A in its family line.
The client has descriptors for A and B, and they contain matching entries in their family-ids or family-cert. (Note that a family-cert key may match a base64-encoded entry in the family-ids entry.)
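The two conditions above can be sketched as a single predicate. The NodeInfo type is a hypothetical stand-in for whatever a client stores per node; the two flags mirror the use-family-lists and use-family-ids network parameters defined later in this proposal.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    identity: str
    family_list: set = field(default_factory=set)  # legacy family entries
    family_ids: set = field(default_factory=set)   # from family-ids or family-cert


def same_family(a: NodeInfo, b: NodeInfo,
                use_family_lists: int = 1, use_family_ids: int = 1) -> bool:
    """True if either rule places a and b in the same family."""
    # Legacy rule: the listing must be mutual.
    if (use_family_lists
            and b.identity in a.family_list
            and a.identity in b.family_list):
        return True
    # New rule: any shared family ID is enough.
    if use_family_ids and (a.family_ids & b.family_ids):
        return True
    return False
```

Note the asymmetry: a one-sided legacy listing never creates a family, while a shared family ID always does, since each side has already proven consent through its certificate.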

Migration

For some time, existing relays and clients will not support family certificates. Because of this, we try to make sure, above, that well-behaved relays will list the same entries in both places.

Here is a rough timeline of a deployment plan:

Stage 1: Deploy capabilities.
1. Implement all specified behavior in Arti and C Tor clients.
2. Implement all specified behavior in relays.
3. Implement a consensus method to publish family-ids in microdescriptors instead of family lines.
4. Before a major release, test the above in chutney.
Stage 2: Wait for upgrade.
1. Encourage relay operators to begin using family-certs, possibly with a warning if a family is configured but family-certs are not.
2. Deploy the new consensus method from above.
3. Wait for enough clients and relays to upgrade, and for enough relays to migrate to family-certs. (See "How long to wait" below.)
4. At some point, dis-recommend all client and relay versions without family-certs support; recommend the subprotocol version indicating support for this protocol.
Stage 3:
1. Optionally, configure the authorities to reject any relay versions without family-certs support if they list a family.
2. Optionally, block any relays that still configure a family entry but which do not have a family cert.
3. Disable use-family-lists, publish-family-list, and derive-family-list (if implemented) in the consensus parameters.

How long to wait

We will need to reach a decision about how long to wait for clients and relays to upgrade before we start "Stage 3" above, and change microdescriptors to include family-ids.

Reasons for a faster timeline are:

We derive no actual benefit from this change until we stop publishing legacy family lines.
The longer we spend in stage 2, the longer relay operators will need to maintain both kinds of family configuration.

Reasons for a slower timeline are:

Any clients that don't upgrade will get reduced security when we do Stage 3.
Any relays that don't upgrade will, if they are in families, need to be rejected until they upgrade.

Security

Listing families remains as voluntary in this design as in today's Tor, though bad-relay hunters can continue to look for families that have not adopted a family key.

A hostile relay family could list a "family" line that did not match its "family-certs" values. However, the only reason to do so would be in order to launch a client partitioning attack, which is probably less valuable than the kinds of attacks that they could run by simply not listing families at all.

A note on bridges

Note that with this design, we get bridge families "for free": if a bridge lists a family-cert in its router descriptor, clients will treat it as belonging to a family with any relay or bridge that lists the same family-cert.

Note however that supporting bridge families would require some refinement to our path building strategy: see https://gitlab.torproject.org/tpo/core/tor/-/issues/40935#note_3090023.

Appendix: deriving family lists from family-ids?

Note: We do not currently plan to implement this.

As an alternative, we might declare that authorities should keep family lines in sync with family-certs. Here is a design sketch of how we might do that, but I don't think it's actually a good idea, since it would require major changes to the data flow of the voting system.

In this design, authorities would include a "family-ids" line in each router section in their votes corresponding to a relay with any family-cert. When generating final microdescriptors using this method, the authorities would use these lines to add entries to the microdescriptors' family lines:

1. For every relay appearing in a routerstatus's family-ids, the authorities calculate a consensus family-ids value by including all those keys that are listed by a majority of those voters listing the same router with the same descriptor. (This is the algorithm we use for voting on other values derived from the descriptor.)
2. The authorities then compute a set of "expanded families": one for each family key. Each "expanded family" is a set containing every router in the consensus associated with that key in its consensus family-ids value.
3. The authorities discard all "expanded families" of size 1 or smaller.
4. Every router listed for the "expanded family" has every other router added to the "family" line in its microdescriptor. (The "family" line is then re-canonicalized according to the rules of proposal 298 to remove its )
Note that the final microdescriptor consensus will include the digest of the derived microdescriptor in step 4, rather than the digest of the microdescriptor listed in the original votes. (This calculation is deterministic.)
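The steps above can be sketched as follows. This is only an illustration of the set arithmetic: it takes "majority" to mean strictly more than half of the relevant voters, skips re-canonicalization, and uses function names of our own.

```python
from collections import Counter, defaultdict


def consensus_family_ids(votes):
    """votes: one set of family IDs per voter that lists this router
    with the same descriptor. Returns the IDs listed by a majority."""
    counts = Counter(fid for vote in votes for fid in set(vote))
    return {fid for fid, n in counts.items() if n * 2 > len(votes)}


def derive_family_additions(router_ids):
    """router_ids: router -> consensus family IDs for that router.
    Returns router -> sorted list of routers to add to its family line."""
    # One "expanded family" per family key.
    expanded = defaultdict(set)
    for router, ids in router_ids.items():
        for fid in ids:
            expanded[fid].add(router)

    additions = defaultdict(set)
    for members in expanded.values():
        if len(members) <= 1:
            continue  # discard families of size 1 or smaller
        for router in members:
            additions[router] |= members - {router}
    return {router: sorted(others) for router, others in additions.items()}
```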

The problem with this approach is that authorities would have to fetch microdescriptors they do not have in order to replace their family lines. Currently, voting never requires an authority to fetch a microdescriptor from another authority. If we implement vote compression and diffs as in the Walking Onions proposal, however, we might suppose that votes could include microdescriptors directly.

Still, this is likely more complexity than we want for a transition mechanism.

Appendix: Deriving family-ids from families?

Note: We do not currently plan to implement this.

We might also imagine that authorities could infer which families exist from the graph of family relationships, and then include synthetic "family-ids" entries for routers that belong to the same family.

This has two challenges: first, to compute these synthetic family keys, the authorities would need to have the same graph of family relationships to begin with, which once again would require them to include the complete list of families in their votes.

Secondly, finding all the families is equivalent to finding all maximal cliques in a graph. This problem is NP-hard in its general case. Although polynomial solutions exist for nice well-behaved graphs, we'd still need to worry about hostile relays including strange family relationships in order to drive the algorithm into its exponential cases.

Appendix: How to rotate family keys

If a relay operator needs to change a family key (as they would do, for example, if a family key were compromised) the process is this:

1. Create a new family key, and use it (along with the old key) on every relay in the family.
2. Wait a few days.
3. Remove the old family key.

Appendix: New assigned values

We need a new assigned value for the certificate type used for family signing keys. (Reserved: 0x0C)

We need a new consensus method for placing family-ids lines in microdescriptors. (Reserved: 35)

We need a new subprotocol version to indicate receptive support for family certs and family ids. (Reserved: Desc=4)

Appendix: New network parameters

derive-family-list: If 1, relays should derive family lines from observed family keys. If 0, they do not. Min: 0, Max: 1. Default: 1.
publish-family-list: If 1, relays should include any configured or derived family list in their published descriptors. If 0, they should not. Min: 0, Max: 1. Default: 1.
use-family-ids: If 1, clients should consider family IDs (including those from microdescriptors' family-ids and those from router descriptors' family-cert entries) when building paths. If 0, they should not. Min: 0, Max: 1. Default: 1.
use-family-lists: If 1, clients should consider legacy family lists (including family entries in router descriptors and microdescriptors) when building paths. If 0, they should not. Min: 0, Max: 1. Default: 1.
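One possible way to read these parameters is sketched below. The table restates the Min, Max, and Default values given above; the lookup helper is hypothetical, and clamping out-of-range values to the nearest bound is our assumption about how such values would be handled.

```python
# name -> (min, max, default), as listed in this appendix
FAMILY_PARAMS = {
    "derive-family-list": (0, 1, 1),
    "publish-family-list": (0, 1, 1),
    "use-family-ids": (0, 1, 1),
    "use-family-lists": (0, 1, 1),
}


def get_family_param(consensus_params: dict, name: str) -> int:
    """Look up a parameter, falling back to its default when absent and
    clamping out-of-range values (assumed behavior)."""
    low, high, default = FAMILY_PARAMS[name]
    value = consensus_params.get(name, default)
    return max(low, min(high, value))


# Example: a Stage 3 consensus that turns off legacy family lists.
params = {"use-family-lists": 0, "publish-family-list": 0}
print(get_family_param(params, "use-family-lists"))  # 0
print(get_family_param(params, "use-family-ids"))    # 1 (default)
```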

Appendix: Terminology

"Family key" -- an ed25519 key used to prove membership in a relay family (KS_familyid_ed)
"Family cert" -- an ed25519 cert proving one relay's membership in a relay family. Signs KP_relayid_ed with KS_familyid_ed.
"Family list" -- a legacy list of family members, as it appears in the family entry of a microdesc or routerdesc.
"Family ID" -- a short string identifying a family, appearing in a microdescriptor. Derived from family keys by the authorities.