Onion service proof-of-work: Scheme v1, Equi-X and Blake2b

Implementations

For our v1 proof-of-work function we use the Equi-X asymmetric client puzzle algorithm by tevador. The concept and the C implementation were developed specifically for our use case by tevador, based on a survey of existing work and an analysis of Tor's requirements.

Equi-X is an asymmetric PoW function based on Equihash<60,3>, using HashX as the underlying layer. It features lightning fast verification speed, and also aims to minimize the asymmetry between CPU and GPU. Furthermore, it's designed for this particular use-case and hence cryptocurrency miners are not incentivized to make optimized ASICs for it.

At this point there is no formal specification for Equi-X or the underlying HashX function. We have two actively maintained implementations of both components, which we subject to automated cross-compatibility and fuzz testing:

A fork of tevador's implementation is maintained within the C tor repository.

This is the src/ext/equix subdirectory. Currently this contains important fixes for security, portability, and testability which have not been merged upstream! This implementation is released under the LGPL license. When tor is built with the required --enable-gpl option this code will be statically linked.
As part of Arti, a new Rust re-implementation was written based loosely on tevador's original.

This is the equix crate. This implementation currently has somewhat lower verification performance than the original but otherwise offers equivalent features.

Algorithm overview

The overall scheme consists of several layers that provide different pieces of this functionality:

At the lowest layers, Blake2b and siphash are used as hashing and PRNG algorithms that are well suited to common 64-bit CPUs.
A custom hash function family, HashX, randomizes its implementation for each new seed value. These functions are tuned to utilize the pipelined integer performance on a modern 64-bit CPU. This layer provides the strongest ASIC resistance, since a hardware reimplementation would need to include a CPU-like pipelined execution unit to keep up.
The Equi-X layer itself builds on HashX and adds an algorithmic puzzle that's designed to be strongly asymmetric and to require RAM to solve efficiently.
The PoW protocol itself builds on this Equi-X function with a particular construction of the challenge input and particular constraints on the allowed Blake2b hash of the solution. This layer provides a linearly adjustable effort that we can verify.
At this point, all further layers are part of the common protocol. Above the level of individual PoW handshakes, the client and service form a closed-loop system that adjusts the effort of future handshakes.

Equi-X itself provides two functions that will be used in this proposal:

equix_solve(challenge) which solves a puzzle instance, returning a variable number of solutions per invocation depending on the specific challenge value.
equix_verify(challenge, solution) which verifies a puzzle solution quickly. Verification still depends on executing the HashX function, but far fewer times than when searching for a solution.

For the purposes of this proposal, all cryptographic algorithms are assumed to produce and consume byte strings, even if internally they operate on some other data type like 64-bit words. This is conventionally little endian order for Blake2b, which contrasts with Tor's typical use of big endian. HashX itself is configured with an 8-byte output but its input is a single 64-bit word of undefined byte order, of which only the low 16 bits are used by Equi-X in its solution output. We treat Equi-X solution arrays as byte arrays using their packed little endian 16-bit representation.

Linear effort adjustment

The underlying Equi-X puzzle has an approximately fixed computational cost. Adjustable effort comes from the construction of the overlying Blake2b layer, which requires clients to test a variable number of Equi-X solutions in order to find answers which also satisfy this layer's effort constraint.

It's common for proof-of-work systems to define an exponential effort function based on a particular number of leading zero bits or equivalent. For the benefit of our effort control system, it's quite useful if we have a linear scale instead. We use the first 32 bits of a hashed version of the Equi-X solution as a uniformly distributed random value.

Conceptually we could define a function:

unsigned effort(uint8_t *token)

which takes as its argument a hashed solution, interprets it as a bitstring, and returns the quotient of dividing a bitstring of 1s by it.

So for example:

effort(00000001100010101101) = 11111111111111111111
                                 / 00000001100010101101

or the same in decimal:

effort(6317) = 1048575 / 6317 = 165.

In practice we can avoid even having to perform this division, performing just one multiply instead to see if a request's claimed effort is supported by the smallness of the resulting 32-bit hash prefix. This assumes we send the desired effort explicitly as part of each PoW solution. We do want to force clients to pick a specific effort before looking for a solution, otherwise a client could opportunistically claim a very large effort any time a lucky hash prefix comes up. Thus the effort is communicated explicitly in our protocol, and it forms part of the concatenated Equi-X challenge.

Parameter descriptor

This whole protocol starts with the service encoding its parameters in a pow-params line within the 'encrypted' (inner) part of the v3 descriptor. The second layer plaintext format describes it canonically. The parameters offered are:

scheme, always v1 for the algorithm described here
seed-b64, a periodically updated 32-byte random seed, base64 encoded
suggested-effort, the latest output from the service-side effort controller
expiration-time, a timestamp when we plan to replace the seed.

Seed expiration and rotation allows used nonces to expire from the anti-replay memory. At every seed rotation, a new expiration time is chosen uniformly at random from the recommended range:

At the earliest, 105 minutes in the future
At the latest, 2 hours in the future (15 minutes later)

The service SHOULD refresh its seed when expiration-time passes. The service SHOULD keep its previous seed in memory and accept PoWs using it to avoid race-conditions with clients that have an old seed. The service SHOULD avoid generating two consequent seeds that have a common 4 bytes prefix; see the usage of seed headings below in the introduction extension.

Client computes a solution

If a client receives a descriptor with pow-params, it should assume that the service is prepared to receive PoW solutions as part of the introduction protocol.

The client parses the descriptor and extracts the PoW parameters. It makes sure that the expiration-time has not expired. If it has, the descriptor may be out of date. Clients SHOULD fetch a fresh descriptor if the descriptor is stale and the seed is expired.

Inputs to the solver:

Effort E, the client-side effort choice made based on the server's suggested-effort and the client's connection attempt history. This is a 32-bit unsigned integer.
Constant personalization string P, equal to the following nul-terminated ASCII text: "Tor hs intro v1\0".
Identity string ID, a 32-byte value unique to the specific onion service. This is the blinded public ID key KP_hs_blind_id.
Seed C, a 32-byte random value decoded from seed-b64 above.
Initial nonce N, a 16-byte value generated using a secure random generator.

The solver itself is iterative; the following steps are repeated until they succeed:

Construct the challenge string by concatenating P || ID || C || N || htonl(E).
Calculate a candidate proof S by passing this challenge to Equi-X.

S = equix_solve(P || ID || C || N || htonl(E))
Calculate a 32-bit check value by interpreting a 32-bit Blake2b hash of the concatenated challenge and solution as an integer in network byte order.

R = ntohl(blake2b_32(P || ID || C || N || htonl(E) || S))
Check if 32-bit multiplication of R * E would overflow

If R * E overflows (the result would be greater than UINT32_MAX) the solver must retry with another nonce value. The client interprets N as a 16-byte little-endian integer, increments it by 1, and goes back to step 1.

If there is no overflow (the result is less than or equal to UINT32_MAX) this is a valid solution. The client can submit final nonce N, effort E, the first 4 bytes of seed C, and proof S.

Note that the Blake2b hash includes the output length parameter in its initial state vector, so a blake2b_32 is not equivalent to the prefix of a blake2b_512. We calculate the 32-bit Blake2b specifically, and interpret it in network byte order as an unsigned integer.

At the end of the above procedure, the client should have calculated a proof S and final nonce N that satisfies both the Equi-X proof conditions and the Blake2b effort test. The time taken, on average, is linearly proportional with the target effort E parameter.

The algorithm as described is suitable for single-threaded computation. Optionally, a client may choose multiple nonces and attempt several solutions in parallel on separate CPU cores. The specific choice of nonce is entirely up to the client, so parallelization choices like this do not impact the network protocol's interoperability at all.

Client sends its proof in an INTRO1 extension

Now that the client has an answer to the puzzle it's time to encode it into an INTRODUCE1 message. To do so the client adds an extension to the encrypted portion of the INTRODUCE1 message by using the EXTENSIONS field. The encrypted portion of the INTRODUCE1 message only gets read by the onion service and is ignored by the introduction point.

This extension includes the chosen nonce and effort in full, as well as the actual Equi-X proof. Clients provide only the first 4 bytes of the seed, enough to disambiguate between multiple recent seeds offered by the service.

This format is defined canonically as the proof-of-work extension to INTRODUCE1.

Service verifies PoW and handles the introduction

When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it should check its configuration on whether proof-of-work is enabled on the service. If it's not enabled, the extension SHOULD BE ignored. If enabled, even if the suggested effort is currently zero, the service follows the procedure detailed in this section.

If the service requires the PROOF_OF_WORK extension but received an INTRODUCE1 message without any embedded proof-of-work, the service SHOULD consider this message as a zero-effort introduction for the purposes of the priority queue.

To verify the client's proof-of-work the service MUST do the following steps:

Find a valid seed C that starts with POW_SEED. Fail if no such seed exists.
Fail if N = POW_NONCE is present in the replay protection data structure.
Construct the challenge string as above by concatenating P || ID || C || N || htonl(E). In this case, E and N are values provided by the client.
Calculate R = ntohl(blake2b_32(P || ID || C || N || htonl(E) || S)), as above
Fail if the the effort test overflows (R * E > UINT32_MAX).
Fail if Equi-X reports that the proof S is malformed or not applicable (equix_verify(P || ID || C || N || htonl(E), S) != EQUIX_OK)
If both the Blake2b and Equi-X tests pass, the request can be enqueued with priority E.

It's a minor performance optimization for services to compute the effort test before invoking equix_verify. Blake2b verification is cheaper than Equi-X verification, so this ordering slightly raises the minimum effort required to perform a top-half attack.

If any of these steps fail the service MUST ignore this introduction request and abort the protocol.

In this document we call the above steps the "top half" of introduction handling. If all the steps of the "top half" have passed, then the circuit is added to the introduction queue.

Tor Specifications