Filename: 362-update-pow-control-loop.md
Title: Updating the Proof-of-Work Control Loop
Author: Wesley Aptekar-Cassels
Created: 22 April 2025
Status: Open
Ticket: torspec#329

Introduction

The Tor protocol provides a proof-of-work based DDoS protection mechanism, as described in proposal 327. That proposal describes several pieces that come together into a proof-of-work scheme:

A set of algorithms for generating and solving proof-of-work challenges
A method for onion services to publish such challenges
A method for clients to provide solutions for the challenges
A control loop algorithm for services to adjust the suggested difficulty based on load

This proposal suggests a new algorithm for adjusting suggested difficulty, while keeping all other aspects of the current proof-of-work scheme the same. As such, it only affects onion services; clients do not know or care whether a service is using the updated algorithm.

Problem statement

The current PoW control loop for suggested effort is vulnerable to attacks that allow attackers to artificially inflate the suggested effort value, turning PoW into a DOS vector itself. These attacks are in a currently unpublished draft paper. Additionally, the current update algorithm is somewhat complex and not rooted in fundamental control theory principles. This proposal is for a simpler algorithm which should avoid the attacks while being easier to analyze.

Goals for an improved algorithm

Control loop target

The aim of the control loop is to find a value for suggested effort such that the enqueue rate of requests with at least the current suggested effort is less than or equal to the dequeue rate for the service. Ideally, the lowest possible value that meets this requirement should be chosen.

This is the goal, because when we reach it, it means that all clients that use the suggested effort value will be served.

This means that we need to choose whether to increase or decrease the suggested effort based on whether the enqueue rate is greater than or less than the dequeue rate.

Simple configuration

We should aim to provide as few configuration knobs as possible, while still giving operators the flexibility they need to configure services for different load scenarios.

Impact of a request should be time-independent

A request coming in should have the same impact on the change in suggested effort regardless of when in the control loop update period that request came in. A majority of the attacks on the current control loop are because it does not meet this requirement.

Smart trimming of queue

The current approach mass-trims the request queue by removing half of the elements whenever a threshold is reached. This seems to be a somewhat heavy-handed approach. Part of the intention seems to be to avoid a situation where many old low-effort requests remain in the queue (which have likely timed out on the client side), and potentially push newer low-effort requests out of the queue. However, it seems more desirable to track this directly, removing requests from the queue based on a timeout, rather than purely based on their position in the queue.

Proposed algorithm

Queue management

Requests are processed highest effort first, and if there are multiple requests with the same effort, the oldest request is processed first.

If the queue is full when an item is added, the lowest-effort request is removed from the queue. If there are multiple requests with the same effort, the oldest request is removed.

There is a maximum age set for all queue items, based on the new consensus parameter HiddenServiceProofOfWorkV1ServiceIntroTimeoutSeconds. Once an item is older than that age, it is removed (this may be done as a periodic task, to make implementation easier).

The depth of the queue will be configurable by the onion service operator, with a default set according to either a reasonable hardcoded default, the amount of available memory, or based on the speed at which the service can process requests, at the discretion of the implementation. Onion service operators should increase this value if they have a bursty traffic pattern, are seeing dropped requests, and have memory to spare.

The onion service operator can also configure a "decay adjustment" value, between 0 and 99 (although implementations should likely limit the maximum value to 75), which has a default set based on the new consensus parameter HiddenServiceProofOfWorkV1ServiceDefaultDecayAdjustment, and can be set to a higher value to reduce the speed at which the suggested effort will be decreased once the service is no longer under load.

State

num_enqueued_gte_suggested - Number of requests that were enqueued during the current update period, and had an effort greater than or equal to the suggested effort.
num_dequeued - Number of requests that were dequeued during this update period.
idle_time - Amount of time during this update period that we spent with no requests in the queue. This can be tracked by keeping track of the time that the queue last went between empty and non-empty, and updating idle_time in the queue push/pop function when the queue becomes empty or receives a new item while empty.
total_effort - Sum of all effort values that were validated and enqueued this update period.
suggested_effort - Current/previous suggested effort value to be published.

Algorithm

At the end of each update period, the suggested effort is modified as follows:

if num_dequeued == 0:
    return
avg_request_time = update_period_time / num_dequeued
adjusted_idle_time = max(idle_time - (avg_request_time * queue_depth), 0)
busy_fraction = 1 - (adjusted_idle_time / update_period_time)
if busy_fraction == 0:
    suggested_effort = suggested_effort * (config_decay_adjustment / 100)
    return
theoretical_num_dequeued = num_dequeued * (1 / busy_fraction)
if num_enqueued_gte_suggested >= theoretical_num_dequeued:
    suggested_effort = max((total_effort / num_dequeued), suggested_effort + 1)
else:
    decay = num_enqueued_gte_suggested / theoretical_num_dequeued
    adjusted_decay = decay + ((1 - decay) * (config_decay_adjustment / 100))
    suggested_effort = suggested_effort * adjusted_decay

Explanation

The num_enqueued_gte_suggested and num_dequeued are tracked to keep track of whether we are meeting our goal of keeping the dequeue rate above the enqueue rate. However, just keeping track of those would run into problems, since it doesn't allow us to distinguish between the number of enqueues and dequeues being equal since we're just keeping up vs being equal because we are sitting idle most of the time. In order to track that, we keep track of how much time we're sitting idle, and calculate a "theoretical" dequeue rate, which is what the dequeue rate would be if the queue was non-empty for the entire update period. Instead of using the idle time directly, we also look at the number of requests in the queue and the rate at which we are processing them, and subtract the amount of time we expect it to take to clear the queue from the idle time. This ensures that if we are slammed with many requests right at the end of a otherwise-idle update period, the requests that were received right at the end of the period are considered when adjusting the suggested effort. This allows better control over reduction of the suggested effort.

If we need to increase the suggested effort, we use the same algorithm as in the existing algorithm. This relies on the assumption that the total amount of compute in the system is fixed, which, while incorrect, seems to be a reasonable approximation for this purpose.

In the case where we need to decrease the effort, we base the amount that we reduce it by on the amount of spare capacity that we have to dequeue requests. This is in contrast to the current algorithm, which reduces the value by ⅔rds for every decrease. Doing it in this proportional way should result in less oscillation of the suggested effort value, at the cost of perhaps being slightly slower to decrease the suggested effort in some circumstances.

Limiting maximum effort

The service should limit the maximum suggested effort to the same value that the client maximum is, currently set to 10,000. It should also cap all incoming requests at this maximum effort. This prevents attackers from having an advantage compared to regular clients by adjusting the client-side maximum.

The maximum suggested effort value should be added as a consensus parameter, rather than being hardcoded.

Analysis of known attacks

The "Onions Got Puzzled" paper describes four possible attack strategies, analyzed here in relation to this proposed algorithm. Because the paper is not yet public, the attacks are discussed only briefly.

Strategy 1: End-rush

This strategy relies on figuring out when the update interval ends, and sending a rush of requests then to artificially inflate the queue size, and thus the suggested effort as well.

This strategy is ineffective against the proposed algorithm, since the proposed algorithm does not care about the number of requests that are in the queue at the end of the update period specifically.

Strategy 2: Temporary-turmoil

This strategy exploits a discrepancy in the ratio of enqueued requests and handled requests used by the existing algorithm, which allows a very large number of very cheap requests to inflate the suggested difficulty by forcing the queue to be trimmed, since enqueued-then-trimmed requests are still counted for the suggested difficulty calculation.

This strategy is ineffective for the proposed algorithm, since low-effort requests do not significantly increase the suggested effort, and triggering queue-trimming does not cause an update in the suggested difficulty. Timing a series of low-effort requests in a particular way does not allow the attacker to change the effect of those requests on the suggested difficulty.

Using a large number of low-effort requests still could potentially be an effective way to force an increase of the suggested effort, but assuming the assumptions behind the puzzle algorithm hold, it should be no more effective than sending the same total effort as a small number of expensive requests.

Strategy 3

This strategy exploits behaviour of the C Tor codebase that is not described in the spec. As such, the solution is beyond the scope of this proposal.

Strategy 4: Maintenance

This strategy allows maintaining a high suggested effort at a low cost by sending low-effort requests, since the existing algorithm only uses the number of requests to decrease the suggested effort.

This strategy is ineffective for the proposed algorithm, since the proposed algorithm uses the total effort in calculating the new effort when decreasing the suggested effort.

Appendices

New consensus parameters

HiddenServiceProofOfWorkV1MaxEffort - Maximum effort clients should send, currently 10,000. Services should cap higher efforts at this value. Minimum: 0. Maximum: UINT32_MAX. Default: 10,000.
HiddenServiceProofOfWorkV1ServiceIntroTimeoutSeconds - The maximum age in seconds for items in the intro queue. Minimum: 1. Maximum: UINT64_MAX. Default: 300.
HiddenServiceProofOfWorkV1ServiceDefaultDecayAdjustment - The default decay adjustment value. Minimum: 0. Maximum: 99. Default: 0.

Tor design proposals