Filename: 321-happy-families.md
Title: Better performance and usability for the MyFamily option (v2)
Author: Nick Mathewson
Created: 27 May 2020
Status: Accepted
Problem statement.
The current family mechanism allows well-behaved relays to identify that they all belong to the same 'family', and should not be used in the same circuits.
Right now, families work by having every family member list every other family member in its server descriptor. This winds up using O(n^2) space in microdescriptors and server descriptors. (For RAM, we can de-duplicate families which sometimes helps.) Adding or removing a server from the family requires all the other servers to change their torrc settings.
This growth in size is not just a theoretical problem. Family declarations currently make up a little over 55% of the microdescriptors in the directory--around 24% after compression. The largest family has around 270 members. With Walking Onions, 270 members times a 160-bit hashed identifier leads to over 5 kilobytes per SNIP, which is much greater than we'd want to use.
This is an updated version of proposal 242. It differs by clarifying requirements and providing a more detailed migration plan.
Update: As of Feb 2025, families make up over 80% of the microdescriptors, even after compression.
Design overview.
In this design, every family has a master ed25519 "family key",
which we'll call KS_familyid_ed
.
A node
is in a family of this kind iff its server descriptor includes a certificate of its
ed25519 identity key with the family key. The certificate
format is the one in the tor-certs.txt spec; we would allocate a new
certificate type for this usage. These certificates would need to
include the signing key in the appropriate extension.
Note that because server descriptors are signed with the node's ed25519 signing key, this creates a bidirectional relationship between the two keys, so that nodes can't be put in families without their consent.
Changes to router descriptors
We add a new entry to server descriptors:
"family-cert" NL
"-----BEGIN FAMILY CERT-----" NL
cert
"-----END FAMILY CERT-----".
This entry contains a base64-encoded certificate as described above. It may appear any number of times; authorities MAY reject descriptors that include it more than three times.
Parties MUST reject any descriptor containing a family-cert entry if any of the following are true:
- the certificate is of the wrong type
- the certificate is invalid
- the certificate is expired
- the certified key in the certificate does not match the relay's ed25519 identity key
Changes to microdescriptors
We add a new entry to microdescriptors: family-ids
.
This line contains one or more space-separated strings identifying the families to which the node belongs. These strings are called "family IDs." These strings MUST be sorted in lexicographic order.
When a family ID is derived from a KP_familyid_ed
, it
is constructed by encoding it in base64,
and then adding the prefix "ed25519:".
Clients SHOULD accept unrecognized key formats.
Changes to voting algorithm
We allocate a new consensus method number for voting on these keys.
When generating microdescriptors using a suitable consensus method, the authorities include a "family-ids" line if the underlying server descriptor contains any valid family-cert lines. For each valid family-cert in the server descriptor, they add a base-64-encoded string of that family-cert's signing key.
See also "deriving family lists from family-ids?" below for an interesting but more difficult extension mechanism that I would not recommend.
Relay configuration
There are several ways that we could configure relays to let them include family certificates in their descriptors.
The easiest would be putting the private family key on each relay, so that the relays could generate their own certificates. This is easy to configure, but slightly risky: if the private key is compromised on any relay, anybody can claim membership in the family. That isn't so very bad, however -- all the relays would need to do in this event would be to move to a new private family key.
A more orthodox method would be to keep the private key somewhere offline, and use it to generate a certificate for each relay in the family as needed. These certificates should be made with long-enough lifetimes, and relays should warn when they are going to expire soon.
We currently plan to implement the first method; we may implement the second if there is demand.
Changes to relay behavior
It is possible we will not implement this section, but instead ask that relay operators continue maintaining both their legacy family lines and their new family certs for some migration period.
Each relay should track which other relays they have seen using the same family key as itself. When generating a router descriptor, each relay should list all of these relays on the legacy 'family' line. This keeps the "family" lines up-to-date with family keys for compliant relays.
Relays should continue listing relays in their family lines if they have seen a relay with that identity using the same family key at any time in the last 7 days.
The presence of this line should be configured by a network
parameter, derive-family-list
.
Relays whose family lines do not stay at least mostly in sync with their family keys should be marked invalid by the authorities.
Client behavior
Clients should treat node A and node B as belonging to the same family if ANY of these is true:
-
The client has descriptors for A and B, and A's descriptor lists B in its family line, and B's descriptor lists A in its family line.
-
Client A has descriptors for A and B, and they contain matching entries in their
family-ids
orfamily-cert
. (Note that a family-cert key may match a base64-encoded entry in the family-keys entry.)
Migration
For some time, existing relays and clients will not support family certificates. Because of this, we try to make sure above the well-behaved relays will list the same entries in both places.
Here is a rough timeline of a deployment plan:
- Stage 1: Deploy capabilities.
- Implement all specified behavior in Arti and C Tor clients.
- Implement all specified behavior in relays.
- Implement a consensus method
to publish
family-ids
in microdescriptors instead offamily
lines. - Before a major release, test the above in chutney.
- Stage 2: Wait for upgrade.
- Encourage relay operators to begin using family-certs, possibly with a warning if a family is configured but family-certs are not.
- Deploy the new consensus method from above.
- Wait for enough clients and relays to upgrade, and for enough relays to migrate to family-certs. (See "How long to wait" below.)
- At some point, dis-recommend all client and relay versions without family-certs support; recommend the subprotocol version indicating support for this protocol.
- Stage 3:
- Optionally, configure the authorities
to reject any relay versions without family-certs support if they list a
family
. - Optionally, block any relays that still configure a
family
entry but which do not has a family cert. - Disable
use-family-lists
,publish-family-list
, andderive-family-list
(if implemented) in the consensus parameters.
- Optionally, configure the authorities
to reject any relay versions without family-certs support if they list a
How long to wait
We will need to reach a decision about
how long to wait for clients and relays to upgrade
before we start "Stage 3" above,
and change microdescriptors to include family-ids
.
Reasons for a faster timeline are:
- We derive no actual benefit from this change until
we stop publishing legacy
family
lines. - The longer we spend in stage 2, the longer relay operators will need to maintain both kinds of family configuration.
Reasons for a slower timeline are:
- Any clients that don't upgrade will get reduced security when we do Stage 3.
- Any relays that don't upgrade will, if they are in families, need to be rejected until they upgrade.
Security
Listing families remains as voluntary in this design as in today's Tor, though bad-relay hunters can continue to look for families that have not adopted a family key.
A hostile relay family could list a "family" line that did not match its "family-certs" values. However, the only reason to do so would be in order to launch a client partitioning attack, which is probably less valuable than the kinds of attacks that they could run by simply not listing families at all.
A note on bridges
Note that with this design, we get bridge families "for free": if a bridge lists a family-cert in its router descriptor, clients will treat it as belonging to a family with any relay or bridge that lists the same family-cert.
Note however that supporting bridge families would require some refinement to our path building strategy: see https://gitlab.torproject.org/tpo/core/tor/-/issues/40935#note_3090023.
Appendix: deriving family lists from family-ids?
Note: We do not currently plan to implement this.
As an alternative, we might declare that authorities should keep family lines in sync with family-certs. Here is a design sketch of how we might do that, but I don't think it's actually a good idea, since it would require major changes to the data flow of the voting system.
In this design, authorties would include a "family-ids" line in each router section in their votes corresponding to a relay with any family-cert. When generating final microdescriptors using this method, the authorities would use these lines to add entries to the microdescriptors' family lines:
-
For every relay appearing in a routerstatus's family-ids, the relays calculate a consensus family-ids value by listing including all those keys that are listed by a majority of those voters listing the same router with the same descriptor. (This is the algorithm we use for voting on other values derived from the descriptor.)
-
The authorities then compute a set of "expanded families": one for each family key. Each "expanded family" is a set containing every router in the consensus associated with that key in its consensus family-ids value.
-
The authorities discard all "expanded families" of size 1 or smaller.
-
Every router listed for the "expanded family" has every other router added to the "family" line in its microdescriptor. (The "family" line is then re-canonicalized according to the rules of proposal 298 to remove its )
-
Note that the final microdescriptor consensus will include the digest of the derived microdescriptor in step 4, rather than the digest of the microdescriptor listed in the original votes. (This calculation is deterministic.)
The problem with this approach is that authorities would have to fetch microdescriptors they do not have in order to replace their family lines. Currently, voting never requires an authority to fetch a microdescriptor from another authority. If we implement vote compression and diffs as in the Walking Onions proposal, however, we might suppose that votes could include microdescriptors directly.
Still, this is likely more complexity than we want for a transition mechanism.
Appendix: Deriving family-ids from families??
Note: We do not currently plan to implement this.
We might also imagine that authorities could infer which families exist from the graph of family relationships, and then include synthetic "family-ids" entries for routers that belong to the same family.
This has two challenges: first, to compute these synthetic family keys, the authorities would need to have the same graph of family relationships to begin with, which once again would require them to include the complete list of families in their votes.
Secondly, finding all the families is equivalent to finding all maximal cliques in a graph. This problem is NP-hard in its general case. Although polynomial solutions exist for nice well-behaved graphs, we'd still need to worry about hostile relays including strange family relationships in order to drive the algorithm into its exponential cases.
Appendix: How to rotate family keys
If a relay operator needs to change a family key (as they would do, for example, if a family key were compromised) the process is this:
- Create a new family key, and use it (along with the old key) on every relay in the family.
- Wait a few days.
- Remove the old family key.
Appendix: New assigned values
We need a new assigned value for the certificate type used for family signing keys. (Reserved: 0x0C)
We need a new consensus method for placing family-ids
lines in
microdescriptors. (Reserved: 35)
We need a new subprotocol version to indicate receptive support for family certs and family ids. (Reserved: Desc=5)
Appendix: New network parameters
-
derive-family-list
: If 1, relays should derive family lines from observed family keys. If 0, they do not. Min: 0, Max: 1. Default: 1. -
publish-family-list
: If 1, relays should include any configured or derived family list in their published descriptors. If 0, they should not. Min: 0, Max: 1. Default: 1. -
use-family-ids
: If 1, clients should consider family IDs (including those from microdescriptors'family-ids
and those from router descriptors'family-cert
entries) when building paths. If 0, they should not. Min: 0, Max: 1. Default: 1. -
use-family-lists
: If 1, clients should consider legacy family lists (includingfamily
entries in router descriptors and microdescriptors) when building paths. If 0, they should not. Min: 0, Max: 1. Default: 1.
Appendix: Terminology
-
"Family key" -- an ed25519 key used to prove membership in a relay family (
KS_familyid_ed
) -
"Family cert" -- an ed25519 cert proving one relay's membership in a relay family. Signs
KP_relayid_ed
withKS_familyid_ed
. -
"Family list" -- a legacy list of family members, as it appears in the
family
entry of a microdesc or routerdesc. -
"Family ID" -- a short string identifying a family, appearing in a microdescriptor. Derived from family keys by the authorities.