Proposals for changes in the Tor protocols

This "book" is a list of proposals that people have made over the years (dating back to 2007) for protocol changes in Tor. Some of these proposals are already implemented or rejected; others are under active discussion.

If you're looking for a specific proposal, you can find it by filename in the summary bar on the left, or in this index. You can also see a list of Tor proposals sorted by status at BY_STATUS.md.

For information on creating a new proposal, see 001-process.txt. That file is somewhat out of date, though, so you should probably just contact the developers.

Tor proposals by number

Here we have a set of proposals for changes to the Tor protocol. Some of these proposals are implemented; some are works in progress; and some will never be implemented.

Below is a list of proposals sorted by proposal number. See BY_STATUS.md for a list of proposals sorted by status.

Tor proposals by status

Below is a list of proposals sorted by status. See BY_INDEX.md for a list of proposals sorted by number.

Active proposals by status

OPEN proposals: under discussion

These are proposals that we think are likely to be complete and are ripe for discussion.

ACCEPTED proposals: slated for implementation

These are the proposals that we agree we'd like to implement. They might or might not have a specific timeframe planned for their implementation.

FINISHED proposals: implemented, specs not merged

These proposals are implemented in some version of Tor; the proposals themselves still need to be merged into the specifications proper.

META proposals: about the proposal process

These proposals describe ongoing policies and changes to the proposals process.

INFORMATIONAL proposals: not actually specifications

These proposals describe a process or project, but aren't actually proposed changes in the Tor specifications.

Preliminary proposals

DRAFT proposals: incomplete works

These proposals have been marked as a draft by their author or the editors, indicating that they aren't yet in a complete form. They're still open for discussion.

NEEDS-REVISION proposals: ideas that we can't implement as-is

These proposals have some promise, but we can't implement them without certain changes.

NEEDS-RESEARCH proposals: blocking on research

These proposals are interesting ideas, but more research is needed before we can decide whether to implement them, or before we can fill in certain details.

(There are no proposals in this category)

Inactive proposals by status

CLOSED proposals: implemented and specified

These proposals have been implemented in some version of Tor, and the changes from the proposals have been merged into the specifications as necessary.

RESERVE proposals: saving for later

These proposals aren't anything we plan to implement soon, but for one reason or another we think they might be a good idea in the future. We're keeping them around as a reference in case we someday confront the problems that they try to solve.

SUPERSEDED proposals: replaced by something else

These proposals were obsoleted by a later proposal before they were implemented.

DEAD, REJECTED, OBSOLETE proposals: not in our plans

These proposals are not on track for discussion or implementation. Either discussion has stalled (the proposal is DEAD), the proposal has been considered and not adopted (the proposal is REJECTED), or the proposal addresses an issue or a solution that is no longer relevant (the proposal is OBSOLETE).
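The grouping of statuses described above can be expressed as a small lookup table. The following is a minimal Python sketch, assuming the `NNN  Title [STATUS]` layout used in the "Proposals by number" list of the index; `STATUS_GROUPS`, `group_for`, and `classify` are hypothetical names for illustration, not part of the proposals tooling.

```python
import re

# Status groups as laid out on the "by status" page. This table is an
# illustrative assumption, not a file from the proposals repository.
STATUS_GROUPS = {
    "Active": ("OPEN", "ACCEPTED", "FINISHED", "META", "INFORMATIONAL"),
    "Preliminary": ("DRAFT", "NEEDS-REVISION", "NEEDS-RESEARCH"),
    "Inactive": ("CLOSED", "RESERVE", "SUPERSEDED", "DEAD", "REJECTED", "OBSOLETE"),
}

# One "Proposals by number" entry, e.g. "239  Consensus Hash Chaining [OPEN]".
ENTRY_RE = re.compile(r"^(\d{3})\s+(.*?)\s+\[([A-Z-]+)\]$")


def group_for(status):
    """Return 'Active', 'Preliminary', or 'Inactive' for a status keyword."""
    for group, statuses in STATUS_GROUPS.items():
        if status in statuses:
            return group
    raise ValueError(f"unknown status: {status}")


def classify(lines):
    """Map each proposal number found in the index lines to (status, group)."""
    result = {}
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match:  # lines that are not by-number entries are skipped
            number, _title, status = match.groups()
            result[number] = (status, group_for(status))
    return result


sample = [
    "239  Consensus Hash Chaining [OPEN]",
    "294  TLS 1.3 Migration [DRAFT]",
    "101  Voting on the Tor Directory System [CLOSED]",
]
for number, (status, group) in sorted(classify(sample).items()):
    print(number, status, group)
# 101 CLOSED Inactive
# 239 OPEN Active
# 294 DRAFT Preliminary
```

Because the status keyword must be upper-case letters and hyphens, annotated by-status lines such as `212  Increase Acceptable Consensus Age [for 0.2.4.x+]` do not match and are ignored.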

Filename: 000-index.txt
Title: Index of Tor Proposals
Author: Nick Mathewson
Created: 26-Jan-2007
Status: Meta

Overview:

   This document provides an index to Tor proposals.

   This is an informational document.

   Everything in this document below the line of '=' signs is automatically
   generated by reindex.py; do not edit by hand.
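   The header fields at the top of each proposal (Filename, Title, Status,
   and so on) carry everything a "Proposals by number" entry shows. The
   following is a minimal Python sketch of that step, assuming a header is
   a run of "Key: Value" lines that ends at the first blank line. It is not
   the code of reindex.py; `parse_header` and `index_line` are hypothetical
   names.

```python
import re

# Hypothetical sketch, NOT the real reindex.py: assumes a proposal begins
# with "Key: Value" header lines that end at the first blank line.
HEADER_RE = re.compile(r"^([A-Za-z-]+):\s*(.*)$")


def parse_header(text):
    """Return the header fields of one proposal as a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        match = HEADER_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def index_line(fields):
    """Format a by-number entry: number, two spaces, title, [STATUS]."""
    number = fields["Filename"].split("-", 1)[0]  # "000-index.txt" -> "000"
    return f"{number}  {fields['Title']} [{fields['Status'].upper()}]"


example = """Filename: 000-index.txt
Title: Index of Tor Proposals
Author: Nick Mathewson
Created: 26-Jan-2007
Status: Meta

Overview:
"""

print(index_line(parse_header(example)))
# prints: 000  Index of Tor Proposals [META]
```

   Upper-casing the Status value reproduces the bracketed form used in the
   generated list (the header says "Meta"; the index says "[META]").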

============================================================
Proposals by number:

000  Index of Tor Proposals [META]
001  The Tor Proposal Process [META]
098  Proposals that should be written [OBSOLETE]
099  Miscellaneous proposals [OBSOLETE]
100  Tor Unreliable Datagram Extension Proposal [DEAD]
101  Voting on the Tor Directory System [CLOSED]
102  Dropping "opt" from the directory format [CLOSED]
103  Splitting identity key from regularly used signing key [CLOSED]
104  Long and Short Router Descriptors [CLOSED]
105  Version negotiation for the Tor protocol [CLOSED]
106  Checking fewer things during TLS handshakes [CLOSED]
107  Uptime Sanity Checking [CLOSED]
108  Base "Stable" Flag on Mean Time Between Failures [CLOSED]
109  No more than one server per IP address [CLOSED]
110  Avoiding infinite length circuits [CLOSED]
111  Prioritizing local traffic over relayed traffic [CLOSED]
112  Bring Back Pathlen Coin Weight [SUPERSEDED]
113  Simplifying directory authority administration [SUPERSEDED]
114  Distributed Storage for Tor Hidden Service Descriptors [CLOSED]
115  Two Hop Paths [DEAD]
116  Two hop paths from entry guards [DEAD]
117  IPv6 exits [CLOSED]
118  Advertising multiple ORPorts at once [SUPERSEDED]
119  New PROTOCOLINFO command for controllers [CLOSED]
120  Shutdown descriptors when Tor servers stop [DEAD]
121  Hidden Service Authentication [CLOSED]
122  Network status entries need a new Unnamed flag [CLOSED]
123  Naming authorities automatically create bindings [CLOSED]
124  Blocking resistant TLS certificate usage [SUPERSEDED]
125  Behavior for bridge users, bridge relays, and bridge authorities [CLOSED]
126  Getting GeoIP data and publishing usage summaries [CLOSED]
127  Relaying dirport requests to Tor download site / website [OBSOLETE]
128  Families of private bridges [DEAD]
129  Block Insecure Protocols by Default [CLOSED]
130  Version 2 Tor connection protocol [CLOSED]
131  Help users to verify they are using Tor [OBSOLETE]
132  A Tor Web Service For Verifying Correct Browser Configuration [OBSOLETE]
133  Incorporate Unreachable ORs into the Tor Network [RESERVE]
134  More robust consensus voting with diverse authority sets [REJECTED]
135  Simplify Configuration of Private Tor Networks [CLOSED]
136  Mass authority migration with legacy keys [CLOSED]
137  Keep controllers informed as Tor bootstraps [CLOSED]
138  Remove routers that are not Running from consensus documents [CLOSED]
139  Download consensus documents only when it will be trusted [CLOSED]
140  Provide diffs between consensuses [CLOSED]
141  Download server descriptors on demand [OBSOLETE]
142  Combine Introduction and Rendezvous Points [DEAD]
143  Improvements of Distributed Storage for Tor Hidden Service Descriptors [SUPERSEDED]
144  Increase the diversity of circuits by detecting nodes belonging the same provider [OBSOLETE]
145  Separate "suitable as a guard" from "suitable as a new guard" [SUPERSEDED]
146  Add new flag to reflect long-term stability [SUPERSEDED]
147  Eliminate the need for v2 directories in generating v3 directories [REJECTED]
148  Stream end reasons from the client side should be uniform [CLOSED]
149  Using data from NETINFO cells [SUPERSEDED]
150  Exclude Exit Nodes from a circuit [CLOSED]
151  Improving Tor Path Selection [CLOSED]
152  Optionally allow exit from single-hop circuits [CLOSED]
153  Automatic software update protocol [SUPERSEDED]
154  Automatic Software Update Protocol [SUPERSEDED]
155  Four Improvements of Hidden Service Performance [CLOSED]
156  Tracking blocked ports on the client side [SUPERSEDED]
157  Make certificate downloads specific [CLOSED]
158  Clients download consensus + microdescriptors [CLOSED]
159  Exit Scanning [INFORMATIONAL]
160  Authorities vote for bandwidth offsets in consensus [CLOSED]
161  Computing Bandwidth Adjustments [CLOSED]
162  Publish the consensus in multiple flavors [CLOSED]
163  Detecting whether a connection comes from a client [SUPERSEDED]
164  Reporting the status of server votes [OBSOLETE]
165  Easy migration for voting authority sets [REJECTED]
166  Including Network Statistics in Extra-Info Documents [CLOSED]
167  Vote on network parameters in consensus [CLOSED]
168  Reduce default circuit window [REJECTED]
169  Eliminate TLS renegotiation for the Tor connection handshake [SUPERSEDED]
170  Configuration options regarding circuit building [SUPERSEDED]
171  Separate streams across circuits by connection metadata [CLOSED]
172  GETINFO controller option for circuit information [RESERVE]
173  GETINFO Option Expansion [OBSOLETE]
174  Optimistic Data for Tor: Server Side [CLOSED]
175  Automatically promoting Tor clients to nodes [REJECTED]
176  Proposed version-3 link handshake for Tor [CLOSED]
177  Abstaining from votes on individual flags [RESERVE]
178  Require majority of authorities to vote for consensus parameters [CLOSED]
179  TLS certificate and parameter normalization [CLOSED]
180  Pluggable transports for circumvention [CLOSED]
181  Optimistic Data for Tor: Client Side [CLOSED]
182  Credit Bucket [OBSOLETE]
183  Refill Intervals [CLOSED]
184  Miscellaneous changes for a v3 Tor link protocol [CLOSED]
185  Directory caches without DirPort [SUPERSEDED]
186  Multiple addresses for one OR or bridge [CLOSED]
187  Reserve a cell type to allow client authorization [CLOSED]
188  Bridge Guards and other anti-enumeration defenses [RESERVE]
189  AUTHORIZE and AUTHORIZED cells [OBSOLETE]
190  Bridge Client Authorization Based on a Shared Secret [OBSOLETE]
191  Bridge Detection Resistance against MITM-capable Adversaries [OBSOLETE]
192  Automatically retrieve and store information about bridges [OBSOLETE]
193  Safe cookie authentication for Tor controllers [CLOSED]
194  Mnemonic .onion URLs [SUPERSEDED]
195  TLS certificate normalization for Tor 0.2.4.x [DEAD]
196  Extended ORPort and TransportControlPort [CLOSED]
197  Message-based Inter-Controller IPC Channel [REJECTED]
198  Restore semantics of TLS ClientHello [CLOSED]
199  Integration of BridgeFinder and BridgeFinderHelper [OBSOLETE]
200  Adding new, extensible CREATE, EXTEND, and related cells [CLOSED]
201  Make bridges report statistics on daily v3 network status requests [RESERVE]
202  Two improved relay encryption protocols for Tor cells [META]
203  Avoiding censorship by impersonating an HTTPS server [OBSOLETE]
204  Subdomain support for Hidden Service addresses [CLOSED]
205  Remove global client-side DNS caching [CLOSED]
206  Preconfigured directory sources for bootstrapping [CLOSED]
207  Directory guards [CLOSED]
208  IPv6 Exits Redux [CLOSED]
209  Tuning the Parameters for the Path Bias Defense [OBSOLETE]
210  Faster Headless Consensus Bootstrapping [SUPERSEDED]
211  Internal Mapaddress for Tor Configuration Testing [RESERVE]
212  Increase Acceptable Consensus Age [NEEDS-REVISION]
213  Remove stream-level sendmes from the design [DEAD]
214  Allow 4-byte circuit IDs in a new link protocol [CLOSED]
215  Let the minimum consensus method change with time [CLOSED]
216  Improved circuit-creation key exchange [CLOSED]
217  Tor Extended ORPort Authentication [CLOSED]
218  Controller events to better understand connection/circuit usage [CLOSED]
219  Support for full DNS and DNSSEC resolution in Tor [NEEDS-REVISION]
220  Migrate server identity keys to Ed25519 [CLOSED]
221  Stop using CREATE_FAST [CLOSED]
222  Stop sending client timestamps [CLOSED]
223  Ace: Improved circuit-creation key exchange [RESERVE]
224  Next-Generation Hidden Services in Tor [CLOSED]
225  Strawman proposal: commit-and-reveal shared rng [SUPERSEDED]
226  "Scalability and Stability Improvements to BridgeDB: Switching to a Distributed Database System and RDBMS" [RESERVE]
227  Include package fingerprints in consensus documents [CLOSED]
228  Cross-certifying identity keys with onion keys [CLOSED]
229  Further SOCKS5 extensions [REJECTED]
230  How to change RSA1024 relay identity keys [OBSOLETE]
231  Migrating authority RSA1024 identity keys [OBSOLETE]
232  Pluggable Transport through SOCKS proxy [CLOSED]
233  Making Tor2Web mode faster [REJECTED]
234  Adding remittance field to directory specification [REJECTED]
235  Stop assigning (and eventually supporting) the Named flag [CLOSED]
236  The move to a single guard node [CLOSED]
237  All relays are directory servers [CLOSED]
238  Better hidden service stats from Tor relays [CLOSED]
239  Consensus Hash Chaining [OPEN]
240  Early signing key revocation for directory authorities [OPEN]
241  Resisting guard-turnover attacks [REJECTED]
242  Better performance and usability for the MyFamily option [SUPERSEDED]
243  Give out HSDir flag only to relays with Stable flag [CLOSED]
244  Use RFC5705 Key Exporting in our AUTHENTICATE calls [CLOSED]
245  Deprecating and removing the TAP circuit extension protocol [SUPERSEDED]
246  Merging Hidden Service Directories and Introduction Points [REJECTED]
247  Defending Against Guard Discovery Attacks using Vanguards [SUPERSEDED]
248  Remove all RSA identity keys [NEEDS-REVISION]
249  Allow CREATE cells with >505 bytes of handshake data [SUPERSEDED]
250  Random Number Generation During Tor Voting [CLOSED]
251  Padding for netflow record resolution reduction [CLOSED]
252  Single Onion Services [SUPERSEDED]
253  Out of Band Circuit HMACs [DEAD]
254  Padding Negotiation [CLOSED]
255  Controller features to allow for load-balancing hidden services [RESERVE]
256  Key revocation for relays and authorities [RESERVE]
257  Refactoring authorities and making them more isolated from the net [META]
258  Denial-of-service resistance for directory authorities [DEAD]
259  New Guard Selection Behaviour [OBSOLETE]
260  Rendezvous Single Onion Services [FINISHED]
261  AEZ for relay cryptography [OBSOLETE]
262  Re-keying live circuits with new cryptographic material [RESERVE]
263  Request to change key exchange protocol for handshake v1.2 [OBSOLETE]
264  Putting version numbers on the Tor subprotocols [CLOSED]
265  Load Balancing with Overhead Parameters [OPEN]
266  Removing current obsolete clients from the Tor network [SUPERSEDED]
267  Tor Consensus Transparency [OPEN]
268  New Guard Selection Behaviour [OBSOLETE]
269  Transitionally secure hybrid handshakes [NEEDS-REVISION]
270  RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope [OBSOLETE]
271  Another algorithm for guard selection [CLOSED]
272  Listed routers should be Valid, Running, and treated as such [CLOSED]
273  Exit relay pinning for web services [RESERVE]
274  Rotate onion keys less frequently [CLOSED]
275  Stop including meaningful "published" time in microdescriptor consensus [CLOSED]
276  Report bandwidth with lower granularity in consensus documents [DEAD]
277  Detect multiple relay instances running with same ID [OPEN]
278  Directory Compression Scheme Negotiation [CLOSED]
279  A Name System API for Tor Onion Services [NEEDS-REVISION]
280  Privacy-Preserving Statistics with Privcount in Tor [SUPERSEDED]
281  Downloading microdescriptors in bulk [RESERVE]
282  Remove "Named" and "Unnamed" handling from consensus voting [ACCEPTED]
283  Move IPv6 ORPorts from microdescriptors to the microdesc consensus [CLOSED]
284  Hidden Service v3 Control Port [CLOSED]
285  Directory documents should be standardized as UTF-8 [ACCEPTED]
286  Controller APIs for hibernation access on mobile [REJECTED]
287  Reduce circuit lifetime without overloading the network [OPEN]
288  Privacy-Preserving Statistics with Privcount in Tor (Shamir version) [RESERVE]
289  Authenticating sendme cells to mitigate bandwidth attacks [CLOSED]
290  Continuously update consensus methods [META]
291  The move to two guard nodes [FINISHED]
292  Mesh-based vanguards [CLOSED]
293  Other ways for relays to know when to publish [CLOSED]
294  TLS 1.3 Migration [DRAFT]
295  Using ADL for relay cryptography (solving the crypto-tagging attack) [OPEN]
296  Have Directory Authorities expose raw bandwidth list files [CLOSED]
297  Relaxing the protover-based shutdown rules [CLOSED]
298  Putting family lines in canonical form [CLOSED]
299  Preferring IPv4 or IPv6 based on IP Version Failure Count [SUPERSEDED]
300  Walking Onions: Scaling and Saving Bandwidth [INFORMATIONAL]
301  Don't include package fingerprints in consensus documents [CLOSED]
302  Hiding onion service clients using padding [CLOSED]
303  When and how to remove support for protocol versions [OPEN]
304  Extending SOCKS5 Onion Service Error Codes [CLOSED]
305  ESTABLISH_INTRO Cell DoS Defense Extension [CLOSED]
306  A Tor Implementation of IPv6 Happy Eyeballs [OPEN]
307  Onion Balance Support for Onion Service v3 [RESERVE]
308  Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography [SUPERSEDED]
309  Optimistic SOCKS Data [OPEN]
310  Towards load-balancing in Prop 271 [CLOSED]
311  Tor Relay IPv6 Reachability [ACCEPTED]
312  Tor Relay Automatic IPv6 Address Discovery [ACCEPTED]
313  Tor Relay IPv6 Statistics [ACCEPTED]
314  Allow Markdown for proposal format [CLOSED]
315  Updating the list of fields required in directory documents [CLOSED]
316  FlashFlow: A Secure Speed Test for Tor (Parent Proposal) [DRAFT]
317  Improve security aspects of DNS name resolution [NEEDS-REVISION]
318  Limit protover values to 0-63 [CLOSED]
319  RELAY_FRAGMENT cells [OBSOLETE]
320  Removing TAP usage from v2 onion services [REJECTED]
321  Better performance and usability for the MyFamily option (v2) [ACCEPTED]
322  Extending link specifiers to include the directory port [OPEN]
323  Specification for Walking Onions [OPEN]
324  RTT-based Congestion Control for Tor [FINISHED]
325  Packed relay cells: saving space on small commands [OBSOLETE]
326  The "tor-relay" Well-Known Resource Identifier [OPEN]
327  A First Take at PoW Over Introduction Circuits [CLOSED]
328  Make Relays Report When They Are Overloaded [CLOSED]
329  Overcoming Tor's Bottlenecks with Traffic Splitting [FINISHED]
330  Modernizing authority contact entries [OPEN]
331  Res tokens: Anonymous Credentials for Onion Service DoS Resilience [DRAFT]
332  Ntor protocol with extra data, version 3 [CLOSED]
333  Vanguards lite [CLOSED]
334  A Directory Authority Flag To Mark Relays As Middle-only [SUPERSEDED]
335  An authority-only design for MiddleOnly [CLOSED]
336  Randomized schedule for guard retries [CLOSED]
337  A simpler way to decide, "Is this guard usable?" [CLOSED]
338  Use an 8-byte timestamp in NETINFO cells [ACCEPTED]
339  UDP traffic over Tor [ACCEPTED]
340  Packed and fragmented relay messages [OPEN]
341  A better algorithm for out-of-sockets eviction [OPEN]
342  Decoupling hs_interval and SRV lifetime [DRAFT]
343  CAA Extensions for the Tor Rendezvous Specification [OPEN]
344  Prioritizing Protocol Information Leaks in Tor [OPEN]
345  Migrating the tor specifications to mdbook [CLOSED]
346  Clarifying and extending the use of protocol versioning [OPEN]
347  Domain separation for certificate signing keys [OPEN]
348  UDP Application Support in Tor [OPEN]
349  Client-Side Command Acceptance Validation [DRAFT]
350  A phased plan to remove TAP onion keys [ACCEPTED]
351  Making SOCKS5 authentication extensions extensible [CLOSED]
352  Handling Complex DNS Traffic for VPN usage in Tor [DRAFT]
353  Requiring secure relay identities in EXTEND2 [DRAFT]


Proposals by status:

 DRAFT:
   294  TLS 1.3 Migration
   316  FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
   331  Res tokens: Anonymous Credentials for Onion Service DoS Resilience
   342  Decoupling hs_interval and SRV lifetime
   349  Client-Side Command Acceptance Validation
   352  Handling Complex DNS Traffic for VPN usage in Tor
   353  Requiring secure relay identities in EXTEND2
 NEEDS-REVISION:
   212  Increase Acceptable Consensus Age [for 0.2.4.x+]
   219  Support for full DNS and DNSSEC resolution in Tor [for 0.2.5.x]
   248  Remove all RSA identity keys
   269  Transitionally secure hybrid handshakes
   279  A Name System API for Tor Onion Services
   317  Improve security aspects of DNS name resolution
 OPEN:
   239  Consensus Hash Chaining
   240  Early signing key revocation for directory authorities
   265  Load Balancing with Overhead Parameters [for arti-dirauth]
   267  Tor Consensus Transparency
   277  Detect multiple relay instances running with same ID [for 0.3.??]
   287  Reduce circuit lifetime without overloading the network
   295  Using ADL for relay cryptography (solving the crypto-tagging attack)
   303  When and how to remove support for protocol versions
   306  A Tor Implementation of IPv6 Happy Eyeballs
   309  Optimistic SOCKS Data
   322  Extending link specifiers to include the directory port
   323  Specification for Walking Onions
   326  The "tor-relay" Well-Known Resource Identifier
   330  Modernizing authority contact entries
   340  Packed and fragmented relay messages
   341  A better algorithm for out-of-sockets eviction
   343  CAA Extensions for the Tor Rendezvous Specification
   344  Prioritizing Protocol Information Leaks in Tor
   346  Clarifying and extending the use of protocol versioning
   347  Domain separation for certificate signing keys
   348  UDP Application Support in Tor
 ACCEPTED:
   282  Remove "Named" and "Unnamed" handling from consensus voting [for arti-dirauth]
   285  Directory documents should be standardized as UTF-8 [for arti-dirauth]
   311  Tor Relay IPv6 Reachability
   312  Tor Relay Automatic IPv6 Address Discovery
   313  Tor Relay IPv6 Statistics
   321  Better performance and usability for the MyFamily option (v2)
   338  Use an 8-byte timestamp in NETINFO cells
   339  UDP traffic over Tor
   350  A phased plan to remove TAP onion keys
 META:
   000  Index of Tor Proposals
   001  The Tor Proposal Process
   202  Two improved relay encryption protocols for Tor cells
   257  Refactoring authorities and making them more isolated from the net
   290  Continuously update consensus methods
 FINISHED:
   260  Rendezvous Single Onion Services [in 0.2.9.3-alpha]
   291  The move to two guard nodes
   324  RTT-based Congestion Control for Tor
   329  Overcoming Tor's Bottlenecks with Traffic Splitting
 CLOSED:
   101  Voting on the Tor Directory System [in 0.2.0.x]
   102  Dropping "opt" from the directory format [in 0.2.0.x]
   103  Splitting identity key from regularly used signing key [in 0.2.0.x]
   104  Long and Short Router Descriptors [in 0.2.0.x]
   105  Version negotiation for the Tor protocol [in 0.2.0.x]
   106  Checking fewer things during TLS handshakes [in 0.2.0.x]
   107  Uptime Sanity Checking [in 0.2.0.x]
   108  Base "Stable" Flag on Mean Time Between Failures [in 0.2.0.x]
   109  No more than one server per IP address [in 0.2.0.x]
   110  Avoiding infinite length circuits [for 0.2.3.x] [in 0.2.1.3-alpha, 0.2.3.11-alpha]
   111  Prioritizing local traffic over relayed traffic [in 0.2.0.x]
   114  Distributed Storage for Tor Hidden Service Descriptors [in 0.2.0.x]
   117  IPv6 exits [for 0.2.4.x] [in 0.2.4.7-alpha]
   119  New PROTOCOLINFO command for controllers [in 0.2.0.x]
   121  Hidden Service Authentication [in 0.2.1.x]
   122  Network status entries need a new Unnamed flag [in 0.2.0.x]
   123  Naming authorities automatically create bindings [in 0.2.0.x]
   125  Behavior for bridge users, bridge relays, and bridge authorities [in 0.2.0.x]
   126  Getting GeoIP data and publishing usage summaries [in 0.2.0.x]
   129  Block Insecure Protocols by Default [in 0.2.0.x]
   130  Version 2 Tor connection protocol [in 0.2.0.x]
   135  Simplify Configuration of Private Tor Networks [for 0.2.1.x] [in 0.2.1.2-alpha]
   136  Mass authority migration with legacy keys [in 0.2.0.x]
   137  Keep controllers informed as Tor bootstraps [in 0.2.1.x]
   138  Remove routers that are not Running from consensus documents [in 0.2.1.2-alpha]
   139  Download consensus documents only when it will be trusted [in 0.2.1.x]
   140  Provide diffs between consensuses [in 0.3.1.1-alpha]
   148  Stream end reasons from the client side should be uniform [in 0.2.1.9-alpha]
   150  Exclude Exit Nodes from a circuit [in 0.2.1.3-alpha]
   151  Improving Tor Path Selection [in 0.2.2.2-alpha]
   152  Optionally allow exit from single-hop circuits [in 0.2.1.6-alpha]
   155  Four Improvements of Hidden Service Performance [in 0.2.1.x]
   157  Make certificate downloads specific [for 0.2.4.x]
   158  Clients download consensus + microdescriptors [in 0.2.3.1-alpha]
   160  Authorities vote for bandwidth offsets in consensus [for 0.2.1.x]
   161  Computing Bandwidth Adjustments [for 0.2.1.x]
   162  Publish the consensus in multiple flavors [in 0.2.3.1-alpha]
   166  Including Network Statistics in Extra-Info Documents [for 0.2.2]
   167  Vote on network parameters in consensus [in 0.2.2]
   171  Separate streams across circuits by connection metadata [in 0.2.3.3-alpha]
   174  Optimistic Data for Tor: Server Side [in 0.2.3.1-alpha]
   176  Proposed version-3 link handshake for Tor [for 0.2.3]
   178  Require majority of authorities to vote for consensus parameters [in 0.2.3.9-alpha]
   179  TLS certificate and parameter normalization [for 0.2.3.x]
   180  Pluggable transports for circumvention [in 0.2.3.x]
   181  Optimistic Data for Tor: Client Side [in 0.2.3.3-alpha]
   183  Refill Intervals [in 0.2.3.5-alpha]
   184  Miscellaneous changes for a v3 Tor link protocol [for 0.2.3.x]
   186  Multiple addresses for one OR or bridge [for 0.2.4.x+]
   187  Reserve a cell type to allow client authorization [for 0.2.3.x]
   193  Safe cookie authentication for Tor controllers
   196  Extended ORPort and TransportControlPort [in 0.2.5.2-alpha]
   198  Restore semantics of TLS ClientHello [for 0.2.4.x]
   200  Adding new, extensible CREATE, EXTEND, and related cells [in 0.2.4.8-alpha]
   204  Subdomain support for Hidden Service addresses
   205  Remove global client-side DNS caching [in 0.2.4.7-alpha.]
   206  Preconfigured directory sources for bootstrapping [in 0.2.4.7-alpha]
   207  Directory guards [for 0.2.4.x]
   208  IPv6 Exits Redux [for 0.2.4.x] [in 0.2.4.7-alpha]
   214  Allow 4-byte circuit IDs in a new link protocol [in 0.2.4.11-alpha]
   215  Let the minimum consensus method change with time [in 0.2.6.1-alpha]
   216  Improved circuit-creation key exchange [in 0.2.4.8-alpha]
   217  Tor Extended ORPort Authentication [for 0.2.5.x]
   218  Controller events to better understand connection/circuit usage [in 0.2.5.2-alpha]
   220  Migrate server identity keys to Ed25519 [in 0.3.0.1-alpha]
   221  Stop using CREATE_FAST [for 0.2.5.x]
   222  Stop sending client timestamps [in 0.2.4.18]
   224  Next-Generation Hidden Services in Tor [in 0.3.2.1-alpha]
   227  Include package fingerprints in consensus documents [in 0.2.6.3-alpha]
   228  Cross-certifying identity keys with onion keys
   232  Pluggable Transport through SOCKS proxy [in 0.2.6]
   235  Stop assigning (and eventually supporting) the Named flag [in 0.2.6, 0.2.7]
   236  The move to a single guard node
   237  All relays are directory servers [for 0.2.7.x]
   238  Better hidden service stats from Tor relays
   243  Give out HSDir flag only to relays with Stable flag
   244  Use RFC5705 Key Exporting in our AUTHENTICATE calls [in 0.3.0.1-alpha]
   250  Random Number Generation During Tor Voting
   251  Padding for netflow record resolution reduction [in 0.3.1.1-alpha]
   254  Padding Negotiation
   264  Putting version numbers on the Tor subprotocols [in 0.2.9.4-alpha]
   271  Another algorithm for guard selection [in 0.3.0.1-alpha]
   272  Listed routers should be Valid, Running, and treated as such [in 0.2.9.3-alpha, 0.2.9.4-alpha]
   274  Rotate onion keys less frequently [in 0.3.1.1-alpha]
   275  Stop including meaningful "published" time in microdescriptor consensus [for 0.3.1.x-alpha] [in 0.4.8.1-alpha]
   278  Directory Compression Scheme Negotiation [in 0.3.1.1-alpha]
   283  Move IPv6 ORPorts from microdescriptors to the microdesc consensus [for 0.3.3.x] [in 0.3.3.1-alpha]
   284  Hidden Service v3 Control Port
   289  Authenticating sendme cells to mitigate bandwidth attacks [in 0.4.1.1-alpha]
   292  Mesh-based vanguards
   293  Other ways for relays to know when to publish [for 0.3.5] [in 0.4.0.1-alpha]
   296  Have Directory Authorities expose raw bandwidth list files [in 0.4.0.1-alpha]
   297  Relaxing the protover-based shutdown rules [for 0.3.5.x] [in 0.4.0.x]
   298  Putting family lines in canonical form [for 0.3.6.x] [in 0.4.0.1-alpha]
   301  Don't include package fingerprints in consensus documents
   302  Hiding onion service clients using padding [in 0.4.1.1-alpha]
   304  Extending SOCKS5 Onion Service Error Codes
   305  ESTABLISH_INTRO Cell DoS Defense Extension
   310  Towards load-balancing in Prop 271
   314  Allow Markdown for proposal format
   315  Updating the list of fields required in directory documents [in 0.4.5.1-alpha]
   318  Limit protover values to 0-63 [in 0.4.5.1-alpha]
   327  A First Take at PoW Over Introduction Circuits
   328  Make Relays Report When They Are Overloaded
   332  Ntor protocol with extra data, version 3
   333  Vanguards lite [in 0.4.7.1-alpha]
   335  An authority-only design for MiddleOnly [in 0.4.7.2-alpha]
   336  Randomized schedule for guard retries
   337  A simpler way to decide, "Is this guard usable?"
   345  Migrating the tor specifications to mdbook
   351  Making SOCKS5 authentication extensions extensible [in Arti 1.2.8, Tor 0.4.9.1-alpha]
 SUPERSEDED:
   112  Bring Back Pathlen Coin Weight
   113  Simplifying directory authority administration
   118  Advertising multiple ORPorts at once
   124  Blocking resistant TLS certificate usage
   143  Improvements of Distributed Storage for Tor Hidden Service Descriptors
   145  Separate "suitable as a guard" from "suitable as a new guard"
   146  Add new flag to reflect long-term stability
   149  Using data from NETINFO cells
   153  Automatic software update protocol
   154  Automatic Software Update Protocol
   156  Tracking blocked ports on the client side
   163  Detecting whether a connection comes from a client
   169  Eliminate TLS renegotiation for the Tor connection handshake
   170  Configuration options regarding circuit building
   185  Directory caches without DirPort
   194  Mnemonic .onion URLs
   210  Faster Headless Consensus Bootstrapping
   225  Strawman proposal: commit-and-reveal shared rng
   242  Better performance and usability for the MyFamily option
   245  Deprecating and removing the TAP circuit extension protocol
   247  Defending Against Guard Discovery Attacks using Vanguards
   249  Allow CREATE cells with >505 bytes of handshake data
   252  Single Onion Services
   266  Removing current obsolete clients from the Tor network
   280  Privacy-Preserving Statistics with Privcount in Tor
   299  Preferring IPv4 or IPv6 based on IP Version Failure Count
   308  Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography
   334  A Directory Authority Flag To Mark Relays As Middle-only
 DEAD:
   100  Tor Unreliable Datagram Extension Proposal
   115  Two Hop Paths
   116  Two hop paths from entry guards
   120  Shutdown descriptors when Tor servers stop
   128  Families of private bridges
   142  Combine Introduction and Rendezvous Points
   195  TLS certificate normalization for Tor 0.2.4.x
   213  Remove stream-level sendmes from the design
   253  Out of Band Circuit HMACs
   258  Denial-of-service resistance for directory authorities
   276  Report bandwidth with lower granularity in consensus documents
 REJECTED:
   134  More robust consensus voting with diverse authority sets
   147  Eliminate the need for v2 directories in generating v3 directories [for 0.2.4.x]
   165  Easy migration for voting authority sets
   168  Reduce default circuit window
   175  Automatically promoting Tor clients to nodes
   197  Message-based Inter-Controller IPC Channel
   229  Further SOCKS5 extensions
   233  Making Tor2Web mode faster
   234  Adding remittance field to directory specification
   241  Resisting guard-turnover attacks
   246  Merging Hidden Service Directories and Introduction Points
   286  Controller APIs for hibernation access on mobile
   320  Removing TAP usage from v2 onion services
 OBSOLETE:
   098  Proposals that should be written
   099  Miscellaneous proposals
   127  Relaying dirport requests to Tor download site / website
   131  Help users to verify they are using Tor
   132  A Tor Web Service For Verifying Correct Browser Configuration
   141  Download server descriptors on demand
   144  Increase the diversity of circuits by detecting nodes belonging the same provider
   164  Reporting the status of server votes
   173  GETINFO Option Expansion
   182  Credit Bucket
   189  AUTHORIZE and AUTHORIZED cells
   190  Bridge Client Authorization Based on a Shared Secret
   191  Bridge Detection Resistance against MITM-capable Adversaries
   192  Automatically retrieve and store information about bridges [for 0.2.[45].x]
   199  Integration of BridgeFinder and BridgeFinderHelper
   203  Avoiding censorship by impersonating an HTTPS server
   209  Tuning the Parameters for the Path Bias Defense [for 0.2.4.x+]
   230  How to change RSA1024 relay identity keys [for 0.2.?]
   231  Migrating authority RSA1024 identity keys [for 0.2.?]
   259  New Guard Selection Behaviour
   261  AEZ for relay cryptography
   263  Request to change key exchange protocol for handshake v1.2
   268  New Guard Selection Behaviour
   270  RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope
   319  RELAY_FRAGMENT cells
   325  Packed relay cells: saving space on small commands
 RESERVE:
   133  Incorporate Unreachable ORs into the Tor Network
   172  GETINFO controller option for circuit information
   177  Abstaining from votes on individual flags [for 0.2.4.x]
   188  Bridge Guards and other anti-enumeration defenses
   201  Make bridges report statistics on daily v3 network status requests [for 0.2.4.x]
   211  Internal Mapaddress for Tor Configuration Testing [for 0.2.4.x+]
   223  Ace: Improved circuit-creation key exchange
   226  "Scalability and Stability Improvements to BridgeDB: Switching to a Distributed Database System and RDBMS"
   255  Controller features to allow for load-balancing hidden services
   256  Key revocation for relays and authorities
   262  Re-keying live circuits with new cryptographic material
   273  Exit relay pinning for web services [for n/a]
   281  Downloading microdescriptors in bulk
   288  Privacy-Preserving Statistics with Privcount in Tor (Shamir version)
   307  Onion Balance Support for Onion Service v3
 INFORMATIONAL:
   159  Exit Scanning
   300  Walking Onions: Scaling and Saving Bandwidth

Filename: 001-process.txt
Title: The Tor Proposal Process
Author: Nick Mathewson
Created: 30-Jan-2007
Status: Meta

Overview:

   This document describes how to change the Tor specifications, how Tor
   proposals work, and the relationship between Tor proposals and the
   specifications.

   This is an informational document.

Motivation:

   Previously, our process for updating the Tor specifications was maximally
   informal: we'd patch the specification (sometimes forking first, and
   sometimes not), then discuss the patches, reach consensus, and implement
   the changes.

   This had a few problems.

   First, even at its most efficient, the old process would often have the
   spec out of sync with the code.  The worst cases were those where
   implementation was deferred: the spec and code could stay out of sync for
   versions at a time.

   Second, it was hard to participate in discussion, since you had to know
   which portions of the spec were a proposal, and which were already
   implemented.

   Third, it littered the specifications with too many inline comments.
     [This was a real problem -NM]
       [Especially when it went to multiple levels! -NM]
         [XXXX especially when they weren't signed and talked about that
          thing that you can't remember after a year]

How to change the specs now:

   First, somebody writes a proposal document.  It should describe the change
   that should be made in detail, and give some idea of how to implement it.
   Once it's fleshed out enough, it becomes a proposal.

   Like an RFC, every proposal gets a number.  Unlike RFCs, proposals can
   change over time and keep the same number, until they are finally
   accepted or rejected.  The history for each proposal
   will be stored in the Tor repository.

   Once a proposal is in the repository, we should discuss and improve it
   until we've reached consensus that it's a good idea, and that it's
   detailed enough to implement.  When this happens, we implement the
   proposal and incorporate it into the specifications.  Thus, the specs
   remain the canonical documentation for the Tor protocol: no proposal is
   ever the canonical documentation for an implemented feature.

   (This process is pretty similar to the Python Enhancement Proposal (PEP)
   process, with the major exception that Tor proposals get re-integrated
   into the specs after implementation, whereas PEPs _become_ the new spec.)

   {It's still okay to make small changes directly to the spec if the code
   can be written more or less immediately, or cosmetic changes if no code
   change is required.  This document reflects the current developers'
   _intent_, not a permanent promise to always use this process in the
   future: we reserve the right to get really excited and run off and
   implement something in a caffeine-or-m&m-fueled all-night hacking
   session.}

How new proposals get added:

  Once an idea has been proposed on the development list, a properly formatted
  (see below) draft exists, and rough consensus within the active development
  community exists that this idea warrants consideration, the proposal editors
  will officially add the proposal.

  To get your proposal in, send it to the tor-dev@lists.torproject.org mailing
  list.

What should go in a proposal:

   Every proposal should have a header containing these fields:
     Filename, Title, Author, Created, Status.

   These fields are optional but recommended:
     Target, Implemented-In, Ticket**.

   The Target field should describe which version the proposal is hoped to be
   implemented in (if it's Open or Accepted).  The Implemented-In field
   should describe which version the proposal was implemented in (if it's
   Finished or Closed).  The Ticket field should be a ticket number referring
   to Tor's canonical bug tracker (e.g. "#7144" refers to
   https://bugs.torproject.org/7144) or to a publicly accessible URI where one
   may subscribe to updates and/or retrieve information on implementation
   status.

   ** Proposals with assigned numbers of prop#283 and higher are REQUIRED to
      have a Ticket field if the Status is OPEN, ACCEPTED, CLOSED, or FINISHED.
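   As an illustration only, the header rules above can be checked
   mechanically.  The helpers below (`parse_header`, `header_problems`) are
   hypothetical and not part of any Tor tooling; the sketch assumes a
   plain-text header of "Field: value" lines ending at the first blank line,
   and it does not handle Markdown headers wrapped in triple-backtick lines.

```python
REQUIRED_FIELDS = ("Filename", "Title", "Author", "Created", "Status")
# Statuses for which prop#283 and higher must carry a Ticket field.
TICKET_STATUSES = {"OPEN", "ACCEPTED", "CLOSED", "FINISHED"}


def parse_header(text: str) -> dict:
    """Collect 'Field: value' lines up to the first blank line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def header_problems(number: int, text: str) -> list:
    """Return a list of human-readable problems with a proposal header."""
    fields = parse_header(text)
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in fields]
    status = fields.get("Status", "").upper()
    if number >= 283 and status in TICKET_STATUSES and "Ticket" not in fields:
        problems.append("missing Ticket")
    return problems


header = """Filename: 001-process.txt
Title: The Tor Proposal Process
Author: Nick Mathewson
Created: 30-Jan-2007
Status: Meta
"""
print(header_problems(1, header))  # prints []
```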

   The body of the proposal should start with an Overview section explaining
   what the proposal's about, what it does, and what state it's in.

   After the Overview, the proposal becomes more free-form.  Depending on its
   length and complexity, the proposal can break into sections as
   appropriate, or follow a short discursive format.  Every proposal should
   contain at least the following information before it is "ACCEPTED",
   though the information does not need to be in sections with these names.

      Motivation: What problem is the proposal trying to solve?  Why does
        this problem matter?  If several approaches are possible, why take this
        one?

      Design: A high-level view of what the new or modified features are, how
        the new or modified features work, how they interoperate with each
        other, and how they interact with the rest of Tor.  This is the main
        body of the proposal.  Some proposals will start out with only a
        Motivation and a Design, and wait for a specification until the
        Design seems approximately right.

      Security implications: What effects the proposed changes might have on
        anonymity, how well understood these effects are, and so on.

      Specification: A detailed description of what needs to be added to the
        Tor specifications in order to implement the proposal.  This should
        be in about as much detail as the specifications will eventually
        contain: it should be possible for independent programmers to write
        mutually compatible implementations of the proposal based on its
        specifications.

      Compatibility: Will versions of Tor that follow the proposal be
        compatible with versions that do not?  If so, how will compatibility
        be achieved?  Generally, we try to not drop compatibility if at
        all possible; we haven't made a "flag day" change since May 2004,
        and we don't want to do another one.

      Implementation: If the proposal will be tricky to implement in Tor's
        current architecture, the document can contain some discussion of how
        to go about making it work.  Actual patches should go on public git
        branches, or be uploaded to trac.

      Performance and scalability notes: If the feature will have an effect
        on performance (in RAM, CPU, bandwidth) or scalability, there should
        be some analysis on how significant this effect will be, so that we
        can avoid really expensive performance regressions, and so we can
        avoid wasting time on insignificant gains.

How to format proposals:

   Proposals may be written in plain text (like this one), or in Markdown.
   If using Markdown, the header must be wrapped in triple-backtick ("```")
   lines.  Whenever possible, we prefer the CommonMark dialect of Markdown.

Proposal status:

   Open: A proposal under discussion.

   Accepted: The proposal is complete, and we intend to implement it.
      After this point, substantive changes to the proposal should be
      avoided, and regarded as a sign of the process having failed
      somewhere.

   Finished: The proposal has been accepted and implemented.  After this
      point, the proposal should not be changed.

   Closed: The proposal has been accepted, implemented, and merged into the
      main specification documents.  The proposal should not be changed after
      this point.

   Rejected: We're not going to implement the feature as described here,
      though we might do some other version.  See comments in the document
      for details.  The proposal should not be changed after this point;
      to bring up some other version of the idea, write a new proposal.

   Draft: This isn't a complete proposal yet; there are definite missing
      pieces.  (Despite the existence of this status, the proposal editors
      may decline to accept incomplete proposals: please consider asking for
      help if you aren't sure how to solve an open issue.)  Proposals that
      remain in the Draft status for too long are likely to be marked as Dead
      or Obsolete.

   Needs-Revision: The idea for the proposal is a good one, but the proposal
      as it stands has serious problems that keep it from being accepted.
      See comments in the document for details.

   Dead: The proposal hasn't been touched in a long time, and it doesn't look
      like anybody is going to complete it soon.  It can become "Open" again
      if it gets a new proponent.

   Needs-Research: There are research problems that need to be solved before
      it's clear whether the proposal is a good idea.

   Meta: This is not a proposal, but a document about proposals.

   Reserve: This proposal is not something we're currently planning to
      implement, but we might want to resurrect it some day if we decide to
      do something like what it proposes.

   Informational: This proposal is the last word on what it's doing.
      It isn't going to turn into a spec unless somebody copy-and-pastes
      it into a new spec for a new subsystem.

   Obsolete: This proposal was flawed and has been superseded by another
      proposal. See comments in the document for details.

   The editors maintain the correct status of proposals, based on rough
   consensus and their own discretion.

Proposal numbering:

   Numbers 000-099 are reserved for special and meta-proposals.  100 and up
   are used for actual proposals.  Numbers aren't recycled.

Filename: 098-todo.txt
Title: Proposals that should be written
Author: Nick Mathewson, Roger Dingledine
Created: 26-Jan-2007
Status: Obsolete

{Obsolete: This document has been replaced by the tor-spec issue tracker.}

Overview:

   This document lists ideas that various people have had for improving the
   Tor protocol.  These should be implemented and specified if they're
   trivial, or written up as proposals if they're not.

   This is an active document, to be edited as proposals are written and as
   we come up with new ideas for proposals.  We should take stuff out as it
   seems irrelevant.


For some later protocol version.

  - It would be great to get smarter about identity and linkability.
    It's not crazy to say, "Never use the same circuit for my SSH
    connections and my web browsing."  How far can/should we take this?
    See ideas/xxx-separate-streams-by-port.txt for a start.

  - Fix onionskin handshake scheme to be more mainstream, less nutty.
    Can we just do
        E(HMAC(g^x), g^x) rather than just E(g^x)?
    No, that has the same flaws as before. We should send
        E(g^x, C) with random C and expect g^y, HMAC_C(K=g^xy).
    Better ask Ian; probably Stephen too.

  - Add a length field to CREATE and related cells.

  - Versioning on circuits and create cells, so we have a clear path
    to improve the circuit protocol.

  - SHA1 is showing its age.  We should get a design for upgrading our
    hash once the AHS (SHA-3) competition is done, or even sooner.

  - Not being able to upgrade ciphersuites or increase key lengths is
    lame.

  - Paul has some ideas about circuit creation; read his PET paper once it's
    out.
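  The onionskin idea in the list above (send E(g^x, C) with random C, expect
  g^y and HMAC_C(K=g^xy)) can be sketched as a toy.  Every parameter here is
  an assumption for illustration: the group is the Mersenne prime 2^127 - 1
  with generator 3 (NOT a vetted Diffie-Hellman group), the HMAC uses
  SHA-256, and the public-key encryption E() is omitted entirely, so (g^x, C)
  is treated as already delivered confidentially.  The function names are
  hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy group: the Mersenne prime 2**127 - 1 with generator 3.
# NOT a vetted Diffie-Hellman group; for illustration only.
P = 2**127 - 1
G = 3
KEY_BYTES = 16  # values below 2**127 fit in 16 bytes


def _confirm(c: bytes, k: int) -> bytes:
    """HMAC_C(K): key the HMAC with C, authenticate the shared value K."""
    return hmac.new(c, k.to_bytes(KEY_BYTES, "big"), hashlib.sha256).digest()


def client_start():
    """Pick x and a random C; (g^x, C) would then be encrypted to the onion key."""
    x = secrets.randbelow(P - 2) + 1
    c = secrets.token_bytes(20)
    return x, c, pow(G, x, P)


def relay_respond(gx: int, c: bytes):
    """Relay side, after (hypothetically) decrypting E(g^x, C)."""
    y = secrets.randbelow(P - 2) + 1
    k = pow(gx, y, P)
    return pow(G, y, P), _confirm(c, k), k


def client_finish(x: int, c: bytes, gy: int, tag: bytes) -> int:
    """Check HMAC_C(g^xy) and return the shared value on success."""
    k = pow(gy, x, P)
    if not hmac.compare_digest(_confirm(c, k), tag):
        raise ValueError("handshake confirmation failed")
    return k
```

  In the full scheme C is readable only by the holder of the onion key, so a
  correct tag shows that the responder could decrypt E(g^x, C) and derived
  the same K.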

Any time:

  - Some ideas for revising the directory protocol:
    - Extend the "r" line in network-status to give a set of buckets (say,
      comma-separated) for that router.
      - Buckets are deterministic based on IP address.
      - Then clients can choose a bucket (or set of buckets) to
        download and use.
    - We need a way for the authorities to declare that nodes are in a
      family.  Also, it kinda sucks that family declarations use O(N^2) space
      in the descriptors.
  - REASON_CONNECTFAILED should include an IP.
  - The spec should incorporate some prose from tor-design to be more
    readable.
  - Specify which keys should be rotated, and when.
  - Specify how to publish descriptors less often.
  - Describe pros and cons of non-deterministic path lengths.

  - We should use a variable-length path length by default -- 3 +/- some
    distribution. Need to think harder about allowing values less than 3,
    and there's a tradeoff between having a wide variance and performance.

  - Clients currently use certs during TLS.  Is this wise?  It does make it
    easier for servers to tell which NATted client is which. We could use a
    separate set of certs for each guard, I suppose, but generating so many
    certs could get expensive.  Omitting them entirely would make OP->OR
    easier to tell from OR->OR.
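  The bucket idea in the directory-protocol item above could look roughly
  like this.  The sketch is hypothetical throughout: it assumes a fixed
  bucket count, hashes the packed IP address with SHA-256, and gives each
  router a single bucket, whereas the item allows a set of buckets per
  router.

```python
import hashlib
import ipaddress

NUM_BUCKETS = 16  # hypothetical; the item does not fix a bucket count


def bucket_for(ip: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map an IPv4 or IPv6 address to a bucket number."""
    packed = ipaddress.ip_address(ip).packed
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def routers_in_buckets(routers, wanted):
    """Keep the (nickname, ip) pairs whose bucket a client chose to download."""
    return [name for name, ip in routers if bucket_for(ip) in wanted]
```

  Because the mapping depends only on the address, every client and
  authority computes the same bucket without extra coordination.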

Things that should change...

B.1. ... but which will require backward-incompatible changes

  - Circuit IDs should be longer.
  . IPv6 everywhere.
  - Maybe, keys should be longer.
    - Maybe, key-length should be adjustable.  How to do this without
      making anonymity suck?
  - Drop backward compatibility.
  - We should use a 128-bit subgroup of our DH prime.
  - Handshake should use HMAC.
  - Multiple cell lengths.
  - Ability to split circuits across paths (If this is useful.)
  - SENDME windows should be dynamic.

  - Directory
     - Stop ever mentioning socks ports

B.2. ... and that will require no changes

   - Advertised outbound IP?
   - Migrate streams across circuits.
   - Fix bug 469 by limiting the number of simultaneous connections per IP.

B.3. ... and that we have no idea how to do.

   - UDP (as transport)
   - UDP (as content)
   - Use a better AES mode that has built-in integrity checking,
     doesn't grow with the number of hops, is not patented, and
     is implemented and maintained by smart people.

Let onion keys be not just RSA but maybe DH too, for Paul's reply onion
design.

Filename: 099-misc.txt
Title: Miscellaneous proposals
Author: Various
Created: 26-Jan-2007
Status: Obsolete

{This document is obsolete; we only used it once, and we have implemented
its only idea.}

Overview:

   This document is for small proposal ideas that are about one paragraph in
   length.  From here, ideas can be rejected outright, expanded into full
   proposals, or specified and implemented as-is.

Proposals

1. Directory compression.

  Gzip would be easier to work with than zlib; bzip2 would result in smaller
  data lengths.  [Concretely, we're looking at about 10-15% space savings at
  the expense of 3-5x longer compression time for using bzip2.]  Doing
  on-the-fly gzip requires zlib 1.2 or later; doing bzip2 requires bzlib.
  Pre-compressing status documents in multiple formats would force us to use
  more memory to hold them.

  Status: Open

  -- Nick Mathewson
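  The size trade-off in item 1 can be measured with Python's standard
  `zlib`, `gzip`, and `bz2` modules.  The sketch compresses a synthetic,
  status-like document (not real directory data), so its numbers will not
  reproduce the 10-15% figure quoted above.  Note that gzip and zlib both
  carry DEFLATE data and differ mainly in their header and trailer framing.

```python
import bz2
import gzip
import zlib


def compressed_sizes(doc: bytes) -> dict:
    """Compressed size of `doc` under each candidate format, at maximum level."""
    return {
        "zlib": len(zlib.compress(doc, 9)),
        "gzip": len(gzip.compress(doc, compresslevel=9)),
        "bzip2": len(bz2.compress(doc, compresslevel=9)),
    }


# Synthetic stand-in for a network-status document (not real data).
sample = b"".join(
    b"r relay%d 10.0.%d.%d 9001 0\ns Fast Running Valid\n" % (i, i // 256, i % 256)
    for i in range(1000)
)
for name, size in compressed_sizes(sample).items():
    print(f"{name:6} {size:6} bytes ({size / len(sample):.1%} of original)")
```

  Timing each call (for example with `time.perf_counter`) would show the
  compression-time side of the trade-off as well.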


Filename: 100-tor-spec-udp.txt
Title: Tor Unreliable Datagram Extension Proposal
Author: Marc Liberatore
Created: 23 Feb 2006
Status: Dead

Overview:

   This is a modified version of the Tor specification written by Marc
   Liberatore to add UDP support to Tor.  For each TLS link, it adds a
   corresponding DTLS link: control messages and TCP data flow over TLS, and
   UDP data flows over DTLS.

   This proposal is not likely to be accepted as-is; see comments at the end
   of the document.


Contents

0. Introduction

  Tor is a distributed overlay network designed to anonymize low-latency
  TCP-based applications.  The current tor specification supports only
  TCP-based traffic.  This limitation prevents the use of tor to anonymize
  other important applications, notably voice over IP software.  This document
  is a proposal to extend the tor specification to support UDP traffic.

  The basic design philosophy of this extension is to add support for
  tunneling unreliable datagrams through tor with as few modifications to the
  protocol as possible.  As currently specified, tor cannot directly support
  such tunneling, as connections between nodes are built using transport layer
  security (TLS) atop TCP.  The latency incurred by TCP is likely unacceptable
  to the operation of most UDP-based application level protocols.

  Thus, we propose the addition of links between nodes using datagram
  transport layer security (DTLS).  These links allow packets to traverse a
  route through tor quickly, but their unreliable nature requires minor
  changes to the tor protocol.  This proposal outlines the necessary
  additions and changes to the tor specification to support UDP traffic.

  We note that a separate set of DTLS links between nodes creates a second
  overlay, distinct from the one composed of TLS links.  This separation and
  resulting decrease in each anonymity set's size will make certain attacks
  easier.  However, it is our belief that VoIP support in tor will
  dramatically increase its appeal, and correspondingly, the size of its user
  base, number of deployed nodes, and total traffic relayed.  These increases
  should help offset the loss of anonymity that two distinct networks imply.

1. Overview of Tor-UDP and its complications

  As described above, this proposal extends the Tor specification to support
  UDP with as few changes as possible.  Tor's overlay network is managed
  through TLS based connections; we will re-use this control plane to set up
  and tear down circuits that relay UDP traffic.  These circuits will be built atop
  DTLS, in a fashion analogous to how Tor currently sends TCP traffic over
  TLS.

  The unreliability of DTLS circuits creates problems for Tor at two levels:

      1. Tor's encryption of the relay layer does not allow independent
      decryption of individual records. If record N is not received, then
      record N+1 will not decrypt correctly, as the counter for AES/CTR is
      maintained implicitly.

      2. Tor's end-to-end integrity checking works under the assumption that
      all RELAY cells are delivered.  This assumption is invalid when cells
      are sent over DTLS.

  The fix for the first problem is straightforward: add an explicit sequence
  number to each cell.  To fix the second problem, we introduce a
  system of nonces and hashes to RELAY packets.

  In the following sections, we mirror the layout of the Tor Protocol
  Specification, presenting the necessary modifications to the Tor protocol as
  a series of deltas.

2. Connections

  Tor-UDP uses DTLS for encryption of some links.  All DTLS links must have
  corresponding TLS links, as all control messages are sent over TLS.  All
  implementations MUST support the DTLS ciphersuite "[TODO]".

  DTLS connections are formed using the same protocol as TLS connections.
  This occurs upon request, following a CREATE_UDP or CREATE_FAST_UDP cell,
  as detailed in section 4.6.

  Once a paired TLS/DTLS connection is established, the two sides send cells
  to one another.  All but two types of cells are sent over TLS links.  RELAY
  cells containing the commands RELAY_DATA_UDP and RELAY_DROP_UDP, specified
  below, are sent over DTLS links.  [Should all cells still be 512 bytes long?
  Perhaps upon completion of a preliminary implementation, we should do a
  performance evaluation for some class of UDP traffic, such as VoIP. - ML]
  Cells may be sent embedded in TLS or DTLS records of any size or divided
  across such records.  The framing of these records MUST NOT leak any more
  information than the above differentiation on the basis of cell type.  [I am
  uncomfortable with this leakage, but don't see any simple, elegant way
  around it. -ML]

  As with TLS connections, DTLS connections are not permanent.
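  A minimal sketch of the link-selection rule above.  It assumes the sender
  already knows the relay command in plaintext, as the edge that builds the
  cell does; this section does not spell out how an intermediate OR, which
  sees only an encrypted relay payload, makes the same choice.  The command
  numbers come from the tables in sections 3 and 5.1, and `link_for` is a
  hypothetical name.

```python
from typing import Optional

CELL_RELAY = 3          # cell command, section 3
RELAY_DATA_UDP = 14     # relay commands, section 5.1
RELAY_DROP_UDP = 17
DTLS_RELAY_COMMANDS = frozenset({RELAY_DATA_UDP, RELAY_DROP_UDP})


def link_for(cell_command: int, relay_command: Optional[int] = None) -> str:
    """Return "dtls" for UDP data/drop RELAY cells and "tls" for everything else."""
    if cell_command == CELL_RELAY and relay_command in DTLS_RELAY_COMMANDS:
        return "dtls"
    return "tls"
```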

3. Cell format

  Each cell contains the following fields:

        CircID                                [2 bytes]
        Command                               [1 byte]
        Sequence Number                       [2 bytes]
        Payload (padded with 0 bytes)         [507 bytes]
                                         [Total size: 512 bytes]

  The 'Command' field holds one of the following values:
       0 -- PADDING          (Padding)                     (See Sec 6.2)
       1 -- CREATE           (Create a circuit)            (See Sec 4)
       2 -- CREATED          (Acknowledge create)          (See Sec 4)
       3 -- RELAY            (End-to-end data)             (See Sec 5)
       4 -- DESTROY          (Stop using a circuit)        (See Sec 4)
       5 -- CREATE_FAST      (Create a circuit, no PK)     (See Sec 4)
       6 -- CREATED_FAST     (Circuit created, no PK)      (See Sec 4)
       7 -- CREATE_UDP       (Create a UDP circuit)        (See Sec 4)
       8 -- CREATED_UDP      (Acknowledge UDP create)      (See Sec 4)
       9 -- CREATE_FAST_UDP  (Create a UDP circuit, no PK) (See Sec 4)
      10 -- CREATED_FAST_UDP (UDP circuit created, no PK)  (See Sec 4)
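  The 512-byte layout above can be packed and unpacked with Python's
  `struct` module.  This sketch assumes multi-byte fields are in network
  (big-endian) byte order, which this section does not state explicitly;
  `pack_cell` and `unpack_cell` are illustrative names.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 507
# CircID (2 bytes), Command (1 byte), Sequence Number (2 bytes), big-endian.
HEADER = struct.Struct("!HBH")
assert HEADER.size + PAYLOAD_LEN == CELL_LEN


def pack_cell(circ_id: int, command: int, seq: int, payload: bytes) -> bytes:
    """Build a fixed-size cell, padding the payload with zero bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload longer than 507 bytes")
    return HEADER.pack(circ_id, command, seq) + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell: bytes):
    """Split a cell into (circ_id, command, seq, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells are exactly 512 bytes")
    circ_id, command, seq = HEADER.unpack_from(cell)
    return circ_id, command, seq, cell[HEADER.size:]


cell = pack_cell(0x1234, 7, 0xBEEF, b"onion skin")  # 7 = CREATE_UDP
print(len(cell))  # prints 512
```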

  The sequence number allows for AES/CTR decryption of RELAY cells
  independently of one another; this functionality is required to support
  cells sent over DTLS.  The sequence number is described in more detail in
  section 4.5.

  [Should the sequence number only appear in RELAY packets?  The overhead is
  small, and I'm hesitant to force more code paths on the implementor. -ML]
  [There's already a separate relay header that has other material in it,
  so it wouldn't be the end of the world to move it there if it's
  appropriate. -RD]

  [Having separate commands for UDP circuits seems necessary, unless we can
  assume a flag day event for a large number of tor nodes. -ML]

4. Circuit management

4.2. Setting circuit keys

  Keys are set up for UDP circuits in the same fashion as for TCP circuits.
  Each UDP circuit shares keys with its corresponding TCP circuit.

  [If the keys are used for both TCP and UDP connections, how does it
  work to mix sequence-number-less cells with sequenced-numbered cells --
  how do you know you have the encryption order right? -RD]

4.3. Creating circuits

  UDP circuits are created as TCP circuits, using the *_UDP cells as
  appropriate.

4.4. Tearing down circuits

  UDP circuits are torn down as TCP circuits, using the *_UDP cells as
  appropriate.

4.5. Routing relay cells

  When an OR receives a RELAY cell, it checks the cell's circID and
  determines whether it has a corresponding circuit along that
  connection.  If not, the OR drops the RELAY cell.

  Otherwise, if the OR is not at the OP edge of the circuit (that is,
  either an 'exit node' or a non-edge node), it de/encrypts the payload
  with AES/CTR, as follows:
       'Forward' relay cell (same direction as CREATE):
           Use Kf as key; decrypt, using sequence number to synchronize
           ciphertext and keystream.
       'Back' relay cell (opposite direction from CREATE):
           Use Kb as key; encrypt, using sequence number to synchronize
           ciphertext and keystream.
  Note that in counter mode, decrypt and encrypt are the same operation.
  [Since the sequence number is only 2 bytes, what do you do when it
  rolls over? -RD]

  Each stream encrypted by a Kf or Kb has a corresponding unique state,
  captured by a sequence number; the originator of each such stream chooses
  the initial sequence number randomly, and increments it only with RELAY
  cells.  [This counts cells; unlike, say, TCP, tor uses fixed-size cells, so
  there's no need for counting bytes directly.  Right? - ML]
  [I believe this is true. You'll find out for sure when you try to
  build it. ;) -RD]
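  The following sketch shows why an explicit sequence number lets each cell
  be decrypted on its own.  Python's standard library has no AES, so it
  substitutes HMAC-SHA256 over (sequence number, block index) as the
  keystream generator; the design itself uses AES/CTR, and the exact mapping
  from sequence number to counter value is an assumption here.  Because the
  counter is only 16 bits, it wraps after 65,536 cells, at which point this
  construction would reuse keystream under the same key, which is the
  concern raised in the rollover comment above.

```python
import hashlib
import hmac

SEQ_MOD = 1 << 16  # the Sequence Number field is 2 bytes


def keystream(key: bytes, seq: int, length: int) -> bytes:
    """Stand-in for AES/CTR: keystream for one cell, derived only from (key, seq)."""
    out = bytearray()
    block = 0
    while len(out) < length:
        counter = seq.to_bytes(2, "big") + block.to_bytes(4, "big")
        out += hmac.new(key, counter, hashlib.sha256).digest()
        block += 1
    return bytes(out[:length])


def crypt(key: bytes, seq: int, data: bytes) -> bytes:
    """Encrypt or decrypt; in counter mode these are the same XOR operation."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, seq, len(data))))


def next_seq(seq: int) -> int:
    """Advance the per-stream sequence number, wrapping at 2**16."""
    return (seq + 1) % SEQ_MOD


# Three cells from a stream whose random initial sequence number is 65535.
key = bytes(16)
seqs = [65535, next_seq(65535), next_seq(next_seq(65535))]  # [65535, 0, 1]
cells = [crypt(key, s, m) for s, m in zip(seqs, [b"one", b"two", b"three"])]
# Even if the second cell is lost, the third still decrypts.
print(crypt(key, seqs[2], cells[2]))  # prints b'three'
```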

  The OR then decides whether it recognizes the relay cell, by
  inspecting the payload as described in section 5.1 below.  If the OR
  recognizes the cell, it processes the contents of the relay cell.
  Otherwise, it passes the decrypted relay cell along the circuit if
  the circuit continues.  If the OR at the end of the circuit
  encounters an unrecognized relay cell, an error has occurred: the OR
  sends a DESTROY cell to tear down the circuit.

  When a relay cell arrives at an OP, the OP decrypts the payload
  with AES/CTR as follows:
        OP receives data cell:
           For I=1...N,
               Decrypt with Kb_I, using the sequence number as above.  If the
               payload is recognized (see section 5.1), then stop and process
               the payload.

  For more information, see section 5 below.
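  The OP-side loop can be sketched as follows.  It uses a stand-in keystream
  (HMAC-SHA256, not AES), assumes the same cell sequence number is used for
  every layer, and treats a payload as recognized when its 'recognized'
  field is zero and its digest field matches the per-cell UDP digest from
  section 5.1 (taken here as the first four bytes of SHA-1, an assumption
  based on the 4-byte field).  Layers are removed in path order starting
  with Kb_1, because the first hop adds the outermost layer on the way back.
  `build_udp_payload` and `wrap_backward` are hypothetical helpers that
  stand in for the exit and the intermediate relays.

```python
import hashlib
import hmac

PAYLOAD_LEN = 507  # cell payload size in this proposal
# Relay payload offsets: command[0], recognized[1:3], streamID[3:5],
# digest[5:9], length[9:11], data[11:].


def crypt(key: bytes, seq: int, data: bytes) -> bytes:
    """Stand-in counter-mode layer (HMAC-SHA256 keystream, not AES)."""
    ks = bytearray()
    block = 0
    while len(ks) < len(data):
        ks += hmac.new(key, seq.to_bytes(2, "big") + block.to_bytes(4, "big"),
                       hashlib.sha256).digest()
        block += 1
    return bytes(d ^ k for d, k in zip(data, ks))


def udp_digest(payload: bytes) -> bytes:
    """First four bytes of SHA-1 over the payload with its digest field zeroed."""
    return hashlib.sha1(payload[:5] + bytes(4) + payload[9:]).digest()[:4]


def build_udp_payload(command: int, stream_id: int, data: bytes) -> bytes:
    """Unencrypted relay payload with recognized = 0 and a per-cell digest."""
    if len(data) > PAYLOAD_LEN - 11:
        raise ValueError("relay data too long")
    body = (bytes([command]) + bytes(2) + stream_id.to_bytes(2, "big") + bytes(4)
            + len(data).to_bytes(2, "big") + data).ljust(PAYLOAD_LEN, b"\x00")
    return body[:5] + udp_digest(body) + body[9:]


def recognized(payload: bytes) -> bool:
    return payload[1:3] == bytes(2) and payload[5:9] == udp_digest(payload)


def wrap_backward(kb, seq, payload):
    """Simulate relays: the originating hop encrypts first, hop 1 last."""
    for key in reversed(kb):
        payload = crypt(key, seq, payload)
    return payload


def op_receive(kb, seq, payload):
    """Peel layers with Kb_1 ... Kb_N; return (hop, plaintext) or (None, payload)."""
    for hop, key in enumerate(kb, start=1):
        payload = crypt(key, seq, payload)
        if recognized(payload):
            return hop, payload
    return None, payload
```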

4.6. CREATE_UDP and CREATED_UDP cells

  Users set up UDP circuits incrementally.  The procedure is similar to that
  for TCP circuits, as described in section 4.1.  In addition to the TLS
  connection to the first node, the OP also attempts to open a DTLS
  connection.  If this succeeds, the OP sends a CREATE_UDP cell, with a
  payload in the same format as a CREATE cell.  To extend a UDP circuit past
  the first hop, the OP sends an EXTEND_UDP relay cell (see section 5) which
  instructs the last node in the circuit to send a CREATE_UDP cell to extend
  the circuit.

  The relay payload for an EXTEND_UDP relay cell consists of:
         Address                       [4 bytes]
         TCP port                      [2 bytes]
         UDP port                      [2 bytes]
         Onion skin                    [186 bytes]
         Identity fingerprint          [20 bytes]

  The address field and ports denote the IPV4 address and ports of the next OR
  in the circuit.
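  The EXTEND_UDP layout above totals 214 bytes, well within a relay cell's
  data area, and can be packed with `struct`.  The sketch assumes network
  byte order for the ports; the helper names are hypothetical.

```python
import ipaddress
import struct

# Address, TCP port, UDP port, onion skin, identity fingerprint.
EXTEND_UDP = struct.Struct("!4sHH186s20s")


def pack_extend_udp(addr: str, tcp_port: int, udp_port: int,
                    onion_skin: bytes, identity: bytes) -> bytes:
    """Build the relay payload of an EXTEND_UDP cell."""
    if len(onion_skin) != 186 or len(identity) != 20:
        raise ValueError("onion skin must be 186 bytes, fingerprint 20 bytes")
    packed = ipaddress.IPv4Address(addr).packed
    return EXTEND_UDP.pack(packed, tcp_port, udp_port, onion_skin, identity)


def unpack_extend_udp(body: bytes):
    """Return (address, tcp_port, udp_port, onion_skin, identity)."""
    packed, tcp_port, udp_port, onion_skin, identity = EXTEND_UDP.unpack_from(body)
    return str(ipaddress.IPv4Address(packed)), tcp_port, udp_port, onion_skin, identity
```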

  The payload for a CREATED_UDP cell or the relay payload for a
  RELAY_EXTENDED_UDP cell is identical to that of the corresponding CREATED or
  RELAY_EXTENDED cell.  Both circuits are established using the same key.

  Note that the existence of a UDP circuit implies the
  existence of a corresponding TCP circuit, sharing keys, sequence numbers,
  and any other relevant state.

4.6.1 CREATE_FAST_UDP/CREATED_FAST_UDP cells

  As above, the OP must successfully connect using DTLS before attempting to
  send a CREATE_FAST_UDP cell.  Otherwise, the procedure is the same as in
  section 4.1.1.

5. Application connections and stream management

5.1. Relay cells

  Within a circuit, the OP and the exit node use the contents of RELAY cells
  to tunnel end-to-end commands, TCP connections ("Streams"), and UDP packets
  across circuits.  End-to-end commands and UDP packets can be initiated by
  either edge; streams are initiated by the OP.

  The payload of each unencrypted RELAY cell consists of:
        Relay command           [1 byte]
        'Recognized'            [2 bytes]
        StreamID                [2 bytes]
        Digest                  [4 bytes]
        Length                  [2 bytes]
        Data                    [496 bytes]

  The relay commands are:
        1 -- RELAY_BEGIN        [forward]
        2 -- RELAY_DATA         [forward or backward]
        3 -- RELAY_END          [forward or backward]
        4 -- RELAY_CONNECTED    [backward]
        5 -- RELAY_SENDME       [forward or backward]
        6 -- RELAY_EXTEND       [forward]
        7 -- RELAY_EXTENDED     [backward]
        8 -- RELAY_TRUNCATE     [forward]
        9 -- RELAY_TRUNCATED    [backward]
       10 -- RELAY_DROP         [forward or backward]
       11 -- RELAY_RESOLVE      [forward]
       12 -- RELAY_RESOLVED     [backward]
       13 -- RELAY_BEGIN_UDP    [forward]
       14 -- RELAY_DATA_UDP     [forward or backward]
       15 -- RELAY_EXTEND_UDP   [forward]
       16 -- RELAY_EXTENDED_UDP [backward]
       17 -- RELAY_DROP_UDP     [forward or backward]

  Commands labelled as "forward" must only be sent by the originator
  of the circuit. Commands labelled as "backward" must only be sent by
  other nodes in the circuit back to the originator. Commands marked
  as either can be sent either by the originator or other nodes.
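  The direction rules can be expressed as a lookup table.  The sketch
  transcribes the command list above; `may_send` is a hypothetical helper,
  and it raises on unknown commands rather than applying the drop-and-ignore
  rule that a receiver uses.

```python
FORWARD, BACKWARD, EITHER = "forward", "backward", "either"

RELAY_COMMANDS = {
    1: ("RELAY_BEGIN", FORWARD),
    2: ("RELAY_DATA", EITHER),
    3: ("RELAY_END", EITHER),
    4: ("RELAY_CONNECTED", BACKWARD),
    5: ("RELAY_SENDME", EITHER),
    6: ("RELAY_EXTEND", FORWARD),
    7: ("RELAY_EXTENDED", BACKWARD),
    8: ("RELAY_TRUNCATE", FORWARD),
    9: ("RELAY_TRUNCATED", BACKWARD),
    10: ("RELAY_DROP", EITHER),
    11: ("RELAY_RESOLVE", FORWARD),
    12: ("RELAY_RESOLVED", BACKWARD),
    13: ("RELAY_BEGIN_UDP", FORWARD),
    14: ("RELAY_DATA_UDP", EITHER),
    15: ("RELAY_EXTEND_UDP", FORWARD),
    16: ("RELAY_EXTENDED_UDP", BACKWARD),
    17: ("RELAY_DROP_UDP", EITHER),
}


def may_send(command: int, sender_is_originator: bool) -> bool:
    """True if this party is allowed to send `command` on the circuit."""
    if command not in RELAY_COMMANDS:
        raise ValueError(f"unknown relay command {command}")
    direction = RELAY_COMMANDS[command][1]
    if direction == EITHER:
        return True
    # Forward commands come only from the originator, backward ones only from others.
    return (direction == FORWARD) == sender_is_originator
```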

  The 'recognized' field in any unencrypted relay payload is always set to
  zero. 

  The 'digest' field can have two meanings.  For all cells sent over TLS
  connections (that is, all commands and all non-UDP RELAY data), it is
  computed as the first four bytes of the running SHA-1 digest of all the
  bytes that have been sent reliably and have been destined for this hop of
  the circuit or originated from this hop of the circuit, seeded from Df or Db
  respectively (obtained in section 4.2 above), and including this RELAY
  cell's entire payload (taken with the digest field set to zero).  Cells sent
  over DTLS connections do not affect this running digest.  Each cell sent
  over DTLS (that is, RELAY_DATA_UDP and RELAY_DROP_UDP) has the digest field
  set to the first four bytes of the SHA-1 digest of the current RELAY cell's
  entire payload, with the digest field set to zero.  Coupled with a
  randomly-chosen streamID, this
  provides per-cell integrity checking on UDP cells.
  [If you drop malformed UDP relay cells but don't close the circuit,
  then this 8 bytes of digest is not as strong as what we get in the
  TCP-circuit side. Is this a problem? -RD]

  When the 'recognized' field of a RELAY cell is zero, and the digest
  is correct, the cell is considered "recognized" for the purposes of
  decryption (see section 4.5 above).

  (The digest does not include any bytes from relay cells that do
  not start or end at this hop of the circuit. That is, it does not
  include forwarded data. Therefore if 'recognized' is zero but the
  digest does not match, the running digest at that node should
  not be updated, and the cell should be forwarded on.)
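  The two digest modes, and the rule that a non-matching cell must not
  update the running digest, can be sketched as follows.  `HopDigests` is a
  hypothetical class: it seeds SHA-1 from Df (or Db) as the text above
  says, uses the digest field at bytes 5 to 8 of the relay payload, and
  keeps UDP stamping stateless so that DTLS cells never touch the running
  digest.

```python
import hashlib


def _zeroed(payload: bytes) -> bytes:
    """The payload with its 4-byte digest field set to zero."""
    return payload[:5] + bytes(4) + payload[9:]


class HopDigests:
    """Hypothetical running-digest state for one hop and one direction."""

    def __init__(self, seed: bytes):
        self.running = hashlib.sha1(seed)  # seeded from Df or Db

    def stamp_tcp(self, payload: bytes) -> bytes:
        """Sender, TLS cell: fold the cell into the running digest and stamp it."""
        zeroed = _zeroed(payload)
        self.running.update(zeroed)
        return zeroed[:5] + self.running.digest()[:4] + zeroed[9:]

    def check_tcp(self, payload: bytes) -> bool:
        """Receiver, TLS cell: commit the running digest only if the cell is ours."""
        if payload[1:3] != bytes(2):
            return False
        candidate = self.running.copy()
        candidate.update(_zeroed(payload))
        if candidate.digest()[:4] != payload[5:9]:
            return False  # not for this hop: leave the running digest untouched
        self.running = candidate
        return True

    @staticmethod
    def stamp_udp(payload: bytes) -> bytes:
        """DTLS cell: per-cell digest; no running state is read or updated."""
        zeroed = _zeroed(payload)
        return zeroed[:5] + hashlib.sha1(zeroed).digest()[:4] + zeroed[9:]
```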

  All RELAY cells pertaining to the same tunneled TCP stream have the
  same streamID.  Such streamIDs are chosen arbitrarily by the OP.  RELAY
  cells that affect the entire circuit rather than a particular
  stream use a streamID of zero.

  All RELAY cells pertaining to the same UDP tunnel have the same streamID.
  This streamID is chosen randomly by the OP, but cannot be zero.
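
  Here is a minimal sketch of choosing such a streamID. It assumes the
  2-byte streamID field from tor-spec.txt. It uses Python's secrets module
  so the value is unpredictable, because the random streamID also
  contributes to UDP cell integrity. The function name is hypothetical.

```python
import secrets

MAX_STREAM_ID = 0xFFFF  # assumes the 2-byte streamID field from tor-spec.txt


def choose_udp_stream_id(in_use: set) -> int:
    """Pick a random, nonzero streamID not already used on this circuit."""
    taken = {s for s in in_use if 1 <= s <= MAX_STREAM_ID}
    if len(taken) >= MAX_STREAM_ID:
        raise RuntimeError("no free streamIDs left on this circuit")
    while True:
        candidate = secrets.randbelow(MAX_STREAM_ID) + 1  # 1..65535, never 0
        if candidate not in taken:
            return candidate
```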

  The 'Length' field of a relay cell contains the number of bytes in
  the relay payload which contain real payload data. The remainder of
  the payload is padded with NUL bytes.
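
  This sketch shows how the Length field and NUL padding fit together. It
  assumes the 509-byte payload and 11-byte relay header described in
  tor-spec.txt. The function names are hypothetical.

```python
import struct

PAYLOAD_LEN = 509                     # assumed relay payload size
HEADER = struct.Struct("!BHH4sH")     # command, recognized, streamID, digest, length
MAX_DATA = PAYLOAD_LEN - HEADER.size  # 498 bytes of real data per cell


def pack_relay_payload(command: int, stream_id: int, data: bytes) -> bytes:
    """Build a relay payload with 'recognized' and digest zeroed, NUL-padded."""
    if len(data) > MAX_DATA:
        raise ValueError("data does not fit in one relay cell")
    header = HEADER.pack(command, 0, stream_id, b"\x00" * 4, len(data))
    return (header + data).ljust(PAYLOAD_LEN, b"\x00")


def unpack_relay_payload(payload: bytes):
    """Return (command, recognized, stream_id, digest, data) from a payload."""
    command, recognized, stream_id, digest, length = HEADER.unpack_from(payload)
    return command, recognized, stream_id, digest, payload[HEADER.size:HEADER.size + length]
```

  This proposal does not specify what to do with a UDP datagram larger
  than MAX_DATA (see the comments at the end of this proposal), so the
  sketch simply rejects it.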

  If the RELAY cell is recognized but the relay command is not
  understood, the cell must be dropped and ignored. Its contents
  still count with respect to the digests, though. [Before
  0.1.1.10, Tor closed circuits when it received an unknown relay
  command. Perhaps this will be more forward-compatible. -RD]

5.2.1.  Opening UDP tunnels and transferring data

  To open a new anonymized UDP connection, the OP chooses an open
  circuit to an exit that may be able to connect to the destination
  address, selects a random streamID not yet used on that circuit,
  and constructs a RELAY_BEGIN_UDP cell with a payload encoding the address
  and port of the destination host.  The payload format is:

        ADDRESS | ':' | PORT | [00]

  where ADDRESS can be a DNS hostname, or an IPv4 address in
  dotted-quad format, or an IPv6 address surrounded by square brackets;
  and where PORT is encoded in decimal.

  [What is the [00] for? -NM]
  [It's so the payload is easy to parse out with string funcs -RD]
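
  The following sketch encodes and decodes the RELAY_BEGIN_UDP payload. The
  trailing [00] lets the receiver treat the payload as a NUL-terminated
  string, which is the point of the note above. Any string that does not
  parse as an IP address is assumed to be a hostname. The function names
  are hypothetical.

```python
import ipaddress


def encode_begin_udp(address: str, port: int) -> bytes:
    """Build ADDRESS ':' PORT [00]; IPv6 literals go in square brackets."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        ip = None  # not an IP literal: treat it as a DNS hostname
    if isinstance(ip, ipaddress.IPv6Address):
        address = "[" + ip.compressed + "]"
    return f"{address}:{port}".encode("ascii") + b"\x00"


def decode_begin_udp(payload: bytes):
    """Parse the payload back into (host, port), ignoring padding after NUL."""
    text = payload.split(b"\x00", 1)[0].decode("ascii")
    host, _, port = text.rpartition(":")  # last colon, so IPv6 colons survive
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    return host, int(port)
```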

  Upon receiving this cell, the exit node resolves the address as necessary.
  If the address cannot be resolved, the exit node replies with a RELAY_END
  cell.  (See 5.4 below.)  Otherwise, the exit node replies with a
  RELAY_CONNECTED cell, whose payload is in one of the following formats:
      The IPv4 address to which the connection was made [4 octets]
      A number of seconds (TTL) for which the address may be cached [4 octets]
   or
      Four zero-valued octets [4 octets]
      An address type (6)     [1 octet]
      The IPv6 address to which the connection was made [16 octets]
      A number of seconds (TTL) for which the address may be cached [4 octets]
  [XXXX Versions of Tor before 0.1.1.6 ignore and do not generate the TTL
  field.  No version of Tor currently generates the IPv6 format.]
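
  Here is a minimal sketch of parsing the two RELAY_CONNECTED formats. It
  takes only the first Length bytes of the payload. It returns None for the
  TTL when the field is absent, as it is from versions of Tor before
  0.1.1.6. The function name is hypothetical.

```python
import ipaddress
import struct


def parse_connected(body: bytes):
    """Return (address, ttl) from a RELAY_CONNECTED body; ttl is None if absent."""
    if len(body) >= 4 and body[:4] != b"\x00\x00\x00\x00":
        # IPv4 form: address[4] followed by an optional TTL[4].
        addr = ipaddress.IPv4Address(body[:4])
        ttl = struct.unpack("!I", body[4:8])[0] if len(body) >= 8 else None
        return addr, ttl
    if len(body) >= 21 and body[4] == 6:
        # IPv6 form: zeros[4], type 6, address[16], optional TTL[4].
        addr = ipaddress.IPv6Address(body[5:21])
        ttl = struct.unpack("!I", body[21:25])[0] if len(body) >= 25 else None
        return addr, ttl
    raise ValueError("unrecognized RELAY_CONNECTED format")
```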

  The OP waits for a RELAY_CONNECTED cell before sending any data.
  Once a connection has been established, the OP and exit node
  package UDP data in RELAY_DATA_UDP cells, and upon receiving such
  cells, echo their contents to the corresponding socket.
  RELAY_DATA_UDP cells sent to unrecognized streams are dropped.

  RELAY_DROP_UDP cells are long-range dummies; upon receiving such a
  cell, the OR or OP must drop it.

5.3. Closing streams

  UDP tunnels are closed in a fashion corresponding to TCP connections.

6. Flow Control

  UDP streams are not subject to flow control.

7.2. Router descriptor format.

The items' formats are as follows:
   "router" nickname address ORPort SocksPort DirPort UDPPort

      Indicates the beginning of a router descriptor.  "address" must be
      an IPv4 address in dotted-quad format. The last four numbers
      indicate the ports at which this OR exposes functionality; the
      first three are TCP ports and UDPPort is a UDP port. ORPort is a
      port at which this OR accepts TLS
      connections for the main OR protocol; SocksPort is deprecated and
      should always be 0; DirPort is the port at which this OR accepts
      directory-related HTTP connections; and UDPPort is a port at which
      this OR accepts DTLS connections for UDP data.  If any port is not
      supported, the value 0 is given instead of a port number.
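
  This sketch parses the extended "router" line. A client could use it to
  decide whether an OR offers UDP, treating a UDPPort of 0 as "not
  supported". The type and function names are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class RouterLine(NamedTuple):
    nickname: str
    address: str
    or_port: int
    socks_port: int
    dir_port: int
    udp_port: int


def parse_router_line(line: str) -> RouterLine:
    keyword, nickname, address, *ports = line.split()
    if keyword != "router" or len(ports) != 4:
        raise ValueError("expected: router nickname address ORPort SocksPort DirPort UDPPort")
    ipaddress.IPv4Address(address)  # must be dotted-quad IPv4; raises ValueError otherwise
    values = [int(p) for p in ports]
    if any(not 0 <= p <= 65535 for p in values):
        raise ValueError("port out of range")
    return RouterLine(nickname, address, *values)


def supports_udp(router: RouterLine) -> bool:
    """A UDPPort of 0 means the OR does not accept DTLS connections."""
    return router.udp_port != 0
```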

Other sections:

What changes need to happen to each node's exit policy to support this? -RD

Switching to UDP means managing the queues of incoming packets better,
so we don't miss packets. How does this interact with doing large public
key operations (handshakes) in the same thread? -RD

========================================================================
COMMENTS
========================================================================

[16 May 2006]

I don't favor this approach; it makes packet traffic partitioned from
stream traffic end-to-end.  The architecture I'd like to see is:

  A *All* Tor-to-Tor traffic is UDP/DTLS, unless we need to fall back on
    TCP/TLS for firewall penetration or something.  (This also gives us an
    upgrade path for routing through legacy servers.)

  B Stream traffic is handled with end-to-end per-stream acks/naks and
    retries.  On failure, the data is retransmitted in a new RELAY_DATA cell;
    a cell isn't retransmitted.

We'll need to do A anyway, to fix our behavior on packet-loss.  Once we've
done so, B is more or less inevitable, and we can support end-to-end UDP
traffic "for free".

(Also, there are some details that this draft spec doesn't address.  For
example, what happens when a UDP packet doesn't fit in a single cell?)

-NM
Filename: 101-dir-voting.txt
Title: Voting on the Tor Directory System
Author: Nick Mathewson
Created: Nov 2006
Status: Closed
Implemented-In: 0.2.0.x

Overview

  This document describes a consensus voting scheme for Tor directories;
  instead of publishing different network statuses, directories would vote on
  and publish a single "consensus" network status document.

  This is an open proposal.

Proposal:

0. Scope and preliminaries

  This document describes a consensus voting scheme for Tor directories.
  Once it's accepted, it should be merged with dir-spec.txt.  Some
  preliminaries for authority and caching support should be done during
  the 0.1.2.x series; the main deployment should come during the 0.2.0.x
  series.

0.1. Goals and motivation: voting.

  The current directory system relies on clients downloading separate
  network status statements from the caches signed by each directory.
  Clients download a new statement every 30 minutes or so, choosing to
  replace the oldest statement they currently have.

  This creates a partitioning problem: different clients have different
  "most recent" networkstatus sources, and different versions of each
  (since authorities change their statements often).

  It also creates a scaling problem: most of the downloaded networkstatus
  documents are probably quite similar, and the redundancy grows as we add
  more authorities.

  So if we have clients only download a single multiply signed consensus
  network status statement, we can:
       - Save bandwidth
       - Reduce client partitioning
       - Reduce client-side and cache-side storage
       - Simplify client-side voting code (by moving voting away from the
         client)

  We should try to do this without:
       - Assuming that client-side or cache-side clocks are more correct
         than we assume now.
       - Assuming that authority clocks are perfectly correct.
       - Degrading badly if a few authorities die or are offline for a bit.

  We do not have to perform well if:
      - No clique of more than half the authorities can agree about who
        the authorities are.

1. The idea.

  Instead of publishing a network status whenever something changes,
  each authority instead publishes a fresh network status only once per
  "period" (say, 60 minutes).  Authorities either upload this network
  status (or "vote") to every other authority, or download every other
  authority's "vote" (see 3.1 below for discussion on push vs pull).

  After an authority has (or has become convinced that it won't be able to
  get) every other authority's vote, it deterministically computes a
  consensus networkstatus, and signs it.  Authorities download one
  another's signatures (or have them uploaded to them; see 3.1), and form a
  multiply signed consensus.  This multiply-signed consensus is what caches
  cache and what clients download.

  If an authority is down, authorities vote based on what they *can*
  download/get uploaded.

  If an authority is "a little" down and only some authorities can reach
  it, authorities try to get its info from other authorities.

  If an authority computes the vote wrong, its signature isn't included on
  the consensus.

  Clients use a consensus if it is "trusted": signed by more than half the
  authorities they recognize. If clients can't find any such consensus,
  they use the most recent trusted consensus they have. If they don't
  have any trusted consensus, they warn the user and refuse to operate
  (and if DirServers is not the default, beg the user to adapt the list
  of authorities).

2. Details.

2.0. Versioning

  All documents generated here have version "3" given in their
  network-status-version entries.

2.1. Vote specifications

  Votes in v3 are similar to v2 network status documents.  We add these
  fields to the preamble:

     "vote-status" -- the word "vote".

     "valid-until" -- the time when this authority expects to publish its
        next vote.

     "known-flags" -- a space-separated list of flags that will sometimes
        be included on "s" lines later in the vote.

     "dir-source" -- as before, except the "hostname" part MUST be the
        authority's nickname, which MUST be unique among authorities, and
        MUST match the nickname in the "directory-signature" entry.

  Authorities SHOULD cache their most recently generated votes so they
  can persist them across restarts.  Authorities SHOULD NOT generate
  another document until valid-until has passed.

  Router entries in the vote MUST be sorted in ascending order by router
  identity digest.  The flags in "s" lines MUST appear in alphabetical
  order.
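
  The ordering rules can be sketched as follows. This assumes identity
  digests are raw bytes, which compare in the same order as their
  upper-case hex forms. It also assumes "alphabetical" means plain ASCII
  ordering, which is unambiguous for capitalized flag names. The function
  names are hypothetical.

```python
def canonical_vote_order(entries):
    """entries: list of (identity_digest: bytes, flags: iterable of str).

    Returns the entries sorted by identity digest, each with sorted flags.
    """
    return [(ident, sorted(flags)) for ident, flags in sorted(entries, key=lambda e: e[0])]


def format_s_line(flags) -> str:
    """Render an "s" line with its flags in alphabetical order."""
    return "s " + " ".join(sorted(flags))
```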

  Votes SHOULD be synchronized to half-hour publication intervals (one
  hour? XXX say more; be more precise.)

  XXXX some way to request older networkstatus docs?

2.2. Consensus directory specifications

  Consensuses are like v3 votes, except for the following fields:

     "vote-status" -- the word "consensus".

     "published" is the latest of all the published times on the votes.

     "valid-until" is the earliest of all the valid-until times on the
       votes.

     "dir-source" and "fingerprint" and "dir-signing-key" and "contact"
       are included for each authority that contributed to the vote.

     "vote-digest" for each authority that contributed to the vote,
       calculated as for the digest in the signature on the vote. [XXX
       re-English this sentence]

     "client-versions" and "server-versions" are sorted in ascending
       order based on version-spec.txt.

     "dir-options" and "known-flags" are not included.
[XXX really? why not list the ones that are used in the consensus?
For example, right now BadExit is in use, but no servers would be
labelled BadExit, and it's still worth knowing that it was considered
by the authorities. -RD]

  The fields MUST occur in the following order:
     "network-status-version"
     "vote-status"
     "published"
     "valid-until"
     For each authority, sorted in ascending order of nickname, case-
     insensitively:
         "dir-source", "fingerprint", "contact", "dir-signing-key",
         "vote-digest".
     "client-versions"
     "server-versions"

  The signatures at the end of the document appear as multiple instances
  of directory-signature, sorted in ascending order by nickname,
  case-insensitively.
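
  Here is a minimal sketch of deriving the consensus preamble times and the
  authority ordering from a set of votes. The vote representation (a dict
  with 'nickname', 'published' and 'valid-until' keys) is a hypothetical
  stand-in for parsed vote documents.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Vote = Dict[str, object]  # hypothetical parsed-vote representation


def consensus_header(votes: List[Vote]) -> Tuple[datetime, datetime, List[str]]:
    """Return (published, valid_until, authority nicknames in consensus order)."""
    published = max(v["published"] for v in votes)      # latest published time
    valid_until = min(v["valid-until"] for v in votes)  # earliest valid-until
    ordered = sorted(votes, key=lambda v: str(v["nickname"]).lower())  # case-insensitive
    return published, valid_until, [str(v["nickname"]) for v in ordered]
```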

  A router entry should be included in the result if it is included by more
  than half of the authorities (total authorities, not just those whose votes
  we have).  A router entry has a flag set if it is included by more than
  half of the authorities who care about that flag.  [XXXX this creates an
  incentive for attackers to DOS authorities whose votes they don't like.
  Can we remember what flags people set the last time we saw them? -NM]
  [Which 'we' are we talking here? The end-users never learn which
  authority sets which flags. So you're thinking the authorities
  should record the last vote they saw from each authority and if it's
  within a week or so, count all the flags that it advertised as 'no'
  votes? Plausible. -RD]
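
  The inclusion and flag rules can be sketched as below. The text leaves
  open exactly which authorities "care about" a flag. This sketch assumes
  it means authorities whose known-flags include the flag and which also
  listed the router. Another reading would also count authorities that
  know the flag but did not list the router. The data layout and function
  name are hypothetical.

```python
from collections import Counter


def compute_consensus_entries(votes, total_authorities):
    """Apply the majority rules above to the votes actually received.

    votes: list of (known_flags, routers) pairs, where known_flags is a set
        of flag names and routers maps an identity digest to the set of
        flags that authority assigned it.
    total_authorities: number of authorities overall, not just voters.
    Returns a dict mapping each included identity to its consensus flags.
    """
    listed = Counter(ident for _, routers in votes for ident in routers)
    all_flags = set().union(*(known for known, _ in votes))
    result = {}
    for ident, count in sorted(listed.items()):
        if 2 * count <= total_authorities:  # needs a strict majority of ALL authorities
            continue
        flags = set()
        for flag in sorted(all_flags):
            # Assumption: authorities "who care" = know the flag and list this router.
            opinions = [routers[ident] for known, routers in votes
                        if flag in known and ident in routers]
            yes = sum(1 for assigned in opinions if flag in assigned)
            if opinions and 2 * yes > len(opinions):
                flags.add(flag)
        result[ident] = flags
    return result
```

  Using the total number of authorities as the denominator for inclusion
  means missing votes count against a router. This is the behavior that
  makes the DoS concern in the note above relevant.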

  The signature hash covers from the "network-status-version" line through
  the characters "directory-signature" in the first "directory-signature"
  line.
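
  To make the signed range concrete, this sketch extracts it and hashes
  it. It assumes SHA-1 as the hash, in line with the other digests in
  this period of the protocol. The RSA signature step is not shown, and
  the function name is hypothetical.

```python
import hashlib


def consensus_signed_digest(document: str) -> bytes:
    """Hash from "network-status-version" through the keyword
    "directory-signature" on the first directory-signature line."""
    start = document.index("network-status-version")
    sig_line = document.index("\ndirectory-signature", start) + 1
    end = sig_line + len("directory-signature")
    return hashlib.sha1(document[start:end].encode("utf-8")).digest()
```

  Ending the range at the keyword rather than at the end of the document
  lets every authority sign the same bytes. Later signatures can then be
  appended without invalidating earlier ones.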

  Consensus directories SHOULD be rejected if they are not signed by more
  than half of the known authorities.

2.2.1. Detached signatures

  Assuming full connectivity, every authority should compute and sign the
  same consensus directory in each period.  Therefore, it isn't necessary to
  download the consensus computed by each authority; instead, the authorities
  only push/fetch each other's signatures.  A "detached signature" document
  contains a single "consensus-digest" entry and one or more
  directory-signature entries. [XXXX specify more.]

2.3. URLs and timelines

2.3.1. URLs and timeline used for agreement

  An authority SHOULD publish its vote immediately at the start of each voting
  period.  It does this by making it available at
     http://<hostname>/tor/status-vote/current/authority.z
  and sending it in an HTTP POST request to each other authority at the URL
     http://<hostname>/tor/post/vote

  If, N minutes after the voting period has begun, an authority does not have
  a current statement from another authority, the first authority retrieves
  the other's statement.
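
  Here is a minimal sketch of the pull fallback. It assumes the missing
  vote is fetched from the other authority's own authority.z URL listed
  above. It also assumes the caller's own fingerprint is already in
  'received'. The function name and data layout are hypothetical, and N is
  left as a parameter because the text does not fix it.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def votes_to_fetch(period_start: datetime, now: datetime, n_minutes: int,
                   authorities: Dict[str, str], received: Set[str]) -> List[str]:
    """authorities maps fingerprint -> hostname; returns URLs to GET."""
    if now < period_start + timedelta(minutes=n_minutes):
        return []  # still waiting for pushed votes
    return [f"http://{host}/tor/status-vote/current/authority.z"
            for fp, host in sorted(authorities.items()) if fp not in received]
```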

  Once an authority has a vote from another authority, it makes it available
  at
      http://<hostname>/tor/status-vote/current/<fp>.z
  where <fp> is the fingerprint of the other authority's identity key.

  The consensus network status, along with as many signatures as the server
  currently knows, should be available at
      http://<hostname>/tor/status-vote/current/consensus.z
  All of the detached signatures it knows for the consensus status should be
  available at:
      http://<hostname>/tor/status-vote/current/consensus-signatures.z

  Once an authority has computed and signed a consensus network status, it
  should send its detached signature to each other authority in an HTTP POST
  request to the URL:
      http://<hostname>/tor/post/consensus-signature

  [XXXX Store votes to disk.]

2.3.2. Serving a consensus directory

  Once the authority is done getting signatures on the consensus directory,
  it should serve it from:
      http://<hostname>/tor/status/consensus.z

  Caches SHOULD download consensus directories from an authority and serve
  them from the same URL.

2.3.3. Timeline and synchronization

  [XXXX]

2.4. Distributing routerdescs between authorities

  Consensus will be more meaningful if authorities take steps to make sure
  that they all have the same set of descriptors _before_ the voting
  starts.  This is safe, since all descriptors are self-certified and
  timestamped: it's always okay to replace a signed descriptor with a more
  recent one signed by the same identity.

  In the long run, we might want some kind of sophisticated process here.
  For now, since authorities already download one another's networkstatus
  documents and use them to determine what descriptors to download from one
  another, we can rely on this existing mechanism to keep authorities up to
  date.

  [We should do a thorough read-through of dir-spec again to make sure
  that the authorities converge on which descriptor to "prefer" for
  each router. Right now the decision happens at the client, which is
  no longer the right place for it. -RD]

3. Questions and concerns

3.1. Push or pull?

  The URLs above define a push mechanism for publishing votes and consensus
  signatures via HTTP POST requests, and a pull mechanism for downloading
  these documents via HTTP GET requests.  As specified, every authority will
  post to every other.  The "download if no copy has been received" mechanism
  exists only as a fallback.

4. Migration

     * It would be cool if caches could get ready to download consensus
       status docs, verify enough signatures, and serve them now.  That way
       once stuff works all we need to do is upgrade the authorities.  Caches
       don't need to verify the correctness of the format so long as it's
       signed (or maybe multisigned?).  We need to make sure that caches back
       off very quickly from downloading consensus docs until they're
       actually implemented.

Filename: 102-drop-opt.txt
Title: Dropping "opt" from the directory format
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document proposes a change in the format used to transmit router and
  directory information.

  This proposal has been accepted, implemented, and merged into dir-spec.txt.

Proposal:

  The "opt" keyword in Tor's directory formats was originally intended to
  mean, "it is okay to ignore this entry if you don't understand it"; the
  default behavior has been "discard a routerdesc if it contains entries you
  don't recognize."

  But so far, every new flag we have added has been marked 'opt'.  It would
  probably make sense to change the default behavior to "ignore unrecognized
  fields", and add the statement that clients SHOULD ignore fields they don't
  recognize.  As a meta-principle, we should say that clients and servers
  MUST NOT have to understand new fields in order to use directory documents
  correctly.

  Of course, this will make it impossible to say, "The format has changed a
  lot; discard this quietly if you don't understand it." We could do that by
  adding a version field.

Status:

     * We stopped requiring it as of 0.1.2.5-alpha.  We'll stop generating it
       once earlier formats are obsolete.


Filename: 103-multilevel-keys.txt
Title: Splitting identity key from regularly used signing key
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document proposes a change in the way identity keys are used, so that
  highly sensitive keys can be password-protected and seldom loaded into RAM.

  It presents options; it is not yet a complete proposal.

Proposal:

  Replacing a directory authority's identity key in the event of a compromise
  would be tremendously annoying.  We'd need to tell every client to switch
  their configuration, or update to a new version with an updated list.  So
  long as some weren't upgraded, they'd be at risk from whoever had
  compromised the key.

  With this in mind, it's a shame that our current protocol forces us to
  store identity keys unencrypted in RAM.  We need some kind of signing key
  stored unencrypted, since we need to generate new descriptors/directories
  and rotate link and onion keys regularly.  (And since, of course, we can't
  ask server operators to be on-hand to enter a passphrase every time we
  want to rotate keys or sign a descriptor.)

  The obvious solution seems to be to have a signing-only key that lives
  indefinitely (months or longer) and signs descriptors and link keys, and a
  separate identity key that's used to sign the signing key.  Tor servers
  could run in one of several modes:
    1. Identity key stored encrypted.  You need to pick a passphrase when
       you enable this mode, and re-enter this passphrase every time you
       rotate the signing key.
    1'. Identity key stored separately.  You save your identity key to a
       floppy, and use the floppy when you need to rotate the signing key.
    2. All keys stored unencrypted.  In this case, we might not want to even
       *have* a separate signing key.  (We'll need to support no-separate-
       signing-key mode anyway to keep old servers working.)
    3. All keys stored encrypted. You need to enter a passphrase to start
       Tor.
  (Of course, we might not want to implement all of these.)

  Case 1 is probably most usable and secure, if we assume that people don't
  forget their passphrases or lose their floppies.  We could mitigate this a
  bit by encouraging people to PGP-encrypt their passphrases to themselves,
  or keep a cleartext copy of their secret key secret-split into a few
  pieces, or something like that.

  Migration presents another difficulty, especially with the authorities.  If
  we use the current set of identity keys as the new identity keys, we're in
  the position of having sensitive keys that have been stored on
  media-of-dubious-encryption up to now.  Also, we need to keep old clients
  (who will expect descriptors to be signed by the identity keys they know
  and love, and who will not understand signing keys) happy.

A possible solution:

  One thing to consider is that router identity keys are not very sensitive:
  if an OR disappears and reappears with a new key, the network treats it as
  though an old router had disappeared and a new one had joined the network.
  The Tor network continues unharmed; this isn't a disaster.

  Thus, the ideas above are mostly relevant for authorities.

  The most straightforward solution for the authorities is probably to take
  advantage of the protocol transition that will come with proposal 101, and
  introduce a new set of signing _and_ identity keys used only to sign votes
  and consensus network-status documents.  Signing and identity keys could be
  delivered to users in a separate, rarely changing "keys" document, so that
  the consensus network-status documents wouldn't need to include N signing
  keys, N identity keys, and N certifications.

  Note also that there is no reason that the identity/signing keys used by
  directory authorities would necessarily have to be the same as the identity
  keys those authorities use in their capacity as routers.  Decoupling these
  keys would give directory authorities the following set of keys:

       Directory authority identity:
           Highly confidential; stored encrypted and/or offline.  Used to
           identify directory authorities.  Shipped with clients.  Used to
           sign directory authority signing keys.

       Directory authority signing key:
           Stored online, accessible to regular Tor process.  Used to sign
           votes and consensus directories.  Downloaded as part of a "keys"
           document.

           [Administrators SHOULD rotate their signing keys every month or
           two, just to keep in practice and keep from forgetting the
           password to the authority identity.]

       V1-V2 directory authority identity:
           Stored online, never changed.  Used to sign legacy network-status
           and directory documents.

       Router identity:
           Stored online, seldom changed.  Used to sign server descriptors
           for this authority in its role as a router.  Implicitly certified
           by being listed in network-status documents.

       Onion key, link key:
           As in tor-spec.txt


Extensions to Proposal 101.

  Define a new document type, "Key certificate".  It contains the
  following fields, in order:

    "dir-key-certificate-version": As network-status-version.  Must be
         "3".
    "fingerprint": Hex fingerprint, with spaces, based on the directory
         authority's identity key.
    "dir-identity-key": The long-term identity key for this authority.
    "dir-key-published": The time when this directory's signing key was
         last changed.
    "dir-key-expires": A time after which this key is no longer valid.
    "dir-signing-key": As in proposal 101.
    "dir-key-certification": A signature of the above fields, in order.
         The signed material extends from the beginning of
         "dir-key-certificate-version" through the newline after
         "dir-key-certification".  The identity key is used to generate
         this signature.
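
  This sketch extracts the signed range of a key certificate. The range
  includes the newline after the "dir-key-certification" keyword but not
  the signature object that follows. Hashing and RSA verification are left
  out, and the function name is hypothetical.

```python
def key_cert_signed_portion(cert: str) -> str:
    """Text from "dir-key-certificate-version" through the newline that
    ends the "dir-key-certification" line."""
    start = cert.index("dir-key-certificate-version")
    cert_line = cert.index("\ndir-key-certification", start) + 1
    end = cert.index("\n", cert_line) + 1  # include that line's newline
    return cert[start:end]
```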

      These elements together constitute a "key certificate".  These are
      generated offline when starting a v3 authority.  Private identity
      keys SHOULD be stored offline, encrypted, or both.  A running
      authority only needs access to the signing key.

      Unlike other keys currently used by Tor, the authority identity
      keys and directory signing keys MAY be longer than 1024 bits.
      (They SHOULD be 2048 bits or longer; they MUST NOT be shorter than
      1024.)

  Vote documents change as follows:

      A key certificate MUST be included in-line in every vote document.  With
      the exception of "fingerprint", its elements MUST NOT appear in consensus
      documents.

  Consensus network statuses change as follows:

      Remove dir-signing-key.

      Change "directory-signature" to take a fingerprint of the authority's
      identity key and a fingerprint of the authority's current signing key
      rather than the authority's nickname.

      Change "dir-source" to take a fingerprint of the authority's
      identity key rather than the authority's nickname or hostname.

  Add a new document type:

      A "keys" document contains all currently known key certificates.
      All authorities serve it at

          http://<hostname>/tor/status/keys.z

      Caches and clients download the keys document whenever they receive a
      consensus vote that uses a key they do not recognize.  Caches download
      from authorities; clients download from caches.

  Processing votes:

      When receiving a vote, authorities check to see if the key
      certificate for the voter is different from the one they have.  If
      the key certificate _is_ different, and its dir-key-published is
      more recent than the most recently known one, and it is
      well-formed and correctly signed with the correct identity key,
      then authorities remember it as the new canonical key certificate
      for that voter.

  A key certificate is invalid if any of the following hold:
      * The version is unrecognized.
      * The fingerprint does not match the identity key.
      * The identity key or the signing key is ill-formed.
      * The published date is very far in the past or future.
      * The signature is not a valid signature of the key certificate
        generated with the identity key.
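
  The validity checks can be sketched as a function that reports every
  failing condition. Several things here are assumptions. The fingerprint
  is taken to be the SHA-1 of the DER-encoded identity key. The 30-day
  MAX_SKEW stands in for "very far", which the text does not quantify.
  The ill-formed-key check is only a non-empty placeholder. The RSA
  signature check is supplied by the caller.

```python
import hashlib
from datetime import datetime, timedelta

MAX_SKEW = timedelta(days=30)  # hypothetical bound for "very far"


def key_cert_problems(cert: dict, now: datetime, verify_signature) -> list:
    """Return the reasons a key certificate is invalid (empty list if valid).

    cert keys (hypothetical): 'version', 'fingerprint', 'identity_key_der',
    'signing_key_der', 'published'. verify_signature(cert) -> bool.
    """
    problems = []
    if cert["version"] != "3":
        problems.append("unrecognized version")
    fingerprint = cert["fingerprint"].replace(" ", "").upper()
    if fingerprint != hashlib.sha1(cert["identity_key_der"]).hexdigest().upper():
        problems.append("fingerprint does not match identity key")
    if not cert["identity_key_der"] or not cert["signing_key_der"]:
        problems.append("ill-formed key")  # placeholder for real key parsing
    if abs(now - cert["published"]) > MAX_SKEW:
        problems.append("published time too far from now")
    if not verify_signature(cert):
        problems.append("bad certification signature")
    return problems
```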

  When processing the signatures on consensus, clients and caches act as
  follows:

      1. Only consider the directory-signature entries whose identity
         key hashes match trusted authorities.

      2. If any such entries have signing key hashes that match unknown
         signing keys, download a new keys document.

      3. For every entry with a known (identity key,signing key) pair,
         check the signature on the document.

      4. If the document has been signed by more than half of the
         authorities the client recognizes, treat the consensus as
         correctly signed.

         If not, but the number of entries with known identity keys but
         unknown signing keys might be enough to make the consensus
         correctly signed, do not use the consensus, but do not discard
         it until we have a new keys document.
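
  Steps 1-4 can be sketched as a single decision function. It counts
  distinct authorities, so a repeated signature from one authority cannot
  count twice. The caller supplies the RSA check as 'verify'. The function
  name, argument layout and return strings are hypothetical.

```python
def evaluate_consensus(signatures, trusted_ids, known_keys, verify):
    """signatures: list of (identity_fp, signing_fp, sig).
    trusted_ids: identity fingerprints of the authorities the client knows.
    known_keys: set of (identity_fp, signing_fp) pairs with certificates.
    Returns "accept", "fetch-keys-and-wait", or "reject".
    """
    threshold = len(trusted_ids) // 2 + 1  # strictly more than half
    good, unknown = set(), set()
    for ident, signing, sig in signatures:
        if ident not in trusted_ids:          # step 1: ignore untrusted identities
            continue
        if (ident, signing) not in known_keys:
            unknown.add(ident)                # step 2: needs a new keys document
        elif verify(ident, signing, sig):     # step 3: check the signature
            good.add(ident)
    if len(good) >= threshold:                # step 4: enough valid signatures
        return "accept"
    if len(good | unknown) >= threshold:
        return "fetch-keys-and-wait"          # keep it until keys arrive
    return "reject"
```
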
Filename: 104-short-descriptors.txt
Title: Long and Short Router Descriptors
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document proposes moving unused-by-clients information from regular
  router descriptors into a new "extra info" router descriptor.

Proposal:

  Some of the costliest fields in the current directory protocol are ones
  that no client actually uses.  In particular, the "read-history" and
  "write-history" fields are used only by the authorities for monitoring the
  status of the network.  If we took them out, the size of a compressed list
  of all the routers would fall by about 60%.  (No other disposable field
  would save much more than 2%.)

  We propose to remove these fields from descriptors, and have them
  uploaded to the authorities as part of a separate signed "extra info"
  document.  A hash of this document will be included in the regular
  descriptors.

  (We considered another design, where routers would generate and upload a
  short-form and a long-form descriptor.  Only the short-form descriptor would
  ever be used by anybody for routing.  The long-form descriptor would be
  used only for analytics and other tools.   We decided against this because
  well-behaved tools would need to download short-form descriptors too (as
  these would be the only ones indexed), and hence get redundant info. Badly
  behaved tools would download only long-form descriptors, and expose
  themselves to partitioning attacks.)

Other disposable fields:

  Clients don't need these fields, but removing them doesn't help bandwidth
  enough to be worthwhile.
    contact (save about 1%)
    fingerprint (save about 3%)

  We could represent these fields more succinctly, but removing them would
  only save 1%.  (!)
    reject
    accept
  (Apparently, exit policies are highly compressible.)

  [Does size-on-disk matter to anybody? Some clients and servers don't
   have much disk, or have really slow disk (e.g. USB). And we don't
   store caches compressed right now. -RD]

Specification:

  1. Extra Info Format.

    An "extra info" descriptor contains the following fields:

    "extra-info" Nickname Fingerprint
        Identifies what router this is an extra info descriptor for.
        Fingerprint is encoded in hex (using upper-case letters), with
        no spaces.

    "published" As currently documented in dir-spec.txt.  It MUST match the
        "published" field of the descriptor published at the same time.

    "read-history"
    "write-history"
        As currently documented in dir-spec.txt.  Optional.

    "router-signature" NL Signature NL

        A signature of the PKCS1-padded hash of the entire extra info
        document, taken from the beginning of the "extra-info" line, through
        the newline after the "router-signature" line.  An extra info
        document is not valid unless the signature is performed with the
        identity key whose digest matches FINGERPRINT.

    The "extra-info" field is required and MUST appear first.  The
    router-signature field is required and MUST appear last.  All others are
    optional.  As for other documents, unrecognized fields must be ignored.

  2. Existing formats

     Implementations that use "read-history" and "write-history" SHOULD
     continue accepting router descriptors that contain them.  (Prior to
     0.2.0.x, this information was encoded in ordinary router descriptors;
     in any case they have always been listed as opt, so they should be
     accepted anyway.)

     Add these fields to router descriptors:

       "extra-info-digest" Digest
          "Digest" is a hex-encoded digest (using upper-case characters)
          of the router's extra-info document, as signed in the router's
          extra-info.  (If this field is absent, no extra-info-digest
          exists.)

       "caches-extra-info"
          Present if this router is a directory cache that provides
          extra-info documents, or an authority that handles extra-info
          documents.
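
  Here is a minimal sketch of computing the value for "extra-info-digest".
  It assumes "as signed" means the digest covers the same range as the
  router-signature, from the "extra-info" line through the newline after
  "router-signature". It also assumes SHA-1. The function name is
  hypothetical.

```python
import hashlib


def extra_info_digest(doc: str) -> str:
    """Upper-case hex SHA-1 of the signed portion of an extra-info document."""
    start = doc.index("extra-info ")
    sig_line = doc.index("\nrouter-signature", start) + 1
    end = doc.index("\n", sig_line) + 1  # include the newline after the keyword
    return hashlib.sha1(doc[start:end].encode("ascii")).hexdigest().upper()
```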

     (Since implementations before 0.1.2.5-alpha required that the "opt"
     keyword precede any unrecognized entry, these keys MUST be preceded
     with "opt" until 0.1.2.5-alpha is obsolete.)

  3. New communications rules

     Servers SHOULD generate and upload one extra-info document after each
     descriptor they generate and upload; no more, no less.  Servers MUST
     upload the new descriptor before they upload the new extra-info.

     Authorities receiving an extra-info document SHOULD verify all of the
     following:
       * They have a router descriptor for some server with a matching
         nickname and identity fingerprint.
       * That server's identity key has been used to sign the extra-info
         document.
       * The extra-info-digest field in the router descriptor matches
         the digest of the extra-info document.
       * The published fields in the two documents match.

     Authorities SHOULD drop extra-info documents that do not meet these
     criteria.
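
  The authority-side checks can be sketched as one predicate. The
  document and descriptor fields are hypothetical dictionaries. Digest
  computation and signature verification are passed in as callables
  rather than implemented here.

```python
def check_extra_info(extra, descriptors, digest_of, verify_sig) -> bool:
    """Return True if an authority should keep this extra-info document.

    extra: dict with 'nickname', 'fingerprint', 'published', 'text'.
    descriptors: fingerprint -> dict with 'nickname', 'published',
        'extra_info_digest', 'identity_key'.
    digest_of(text) -> str; verify_sig(text, identity_key) -> bool.
    """
    desc = descriptors.get(extra["fingerprint"])
    if desc is None or desc["nickname"] != extra["nickname"]:
        return False  # no descriptor with matching nickname and fingerprint
    if not verify_sig(extra["text"], desc["identity_key"]):
        return False  # not signed by that server's identity key
    if desc.get("extra_info_digest") != digest_of(extra["text"]):
        return False  # descriptor does not point at this document
    return desc["published"] == extra["published"]
```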

     Extra-info documents MAY be uploaded as part of the same HTTP post as
     the router descriptor, or separately.  Authorities MUST accept both
     methods.

     Authorities SHOULD try to fetch extra-info documents from one another if
     they do not have one matching the digest declared in a router
     descriptor.

     Caches that are running locally with a tool that needs to use extra-info
     documents MAY download and store extra-info documents.  They should do
     so when they notice that the recommended descriptor has an
     extra-info-digest not matching any extra-info document they currently
     have.  (Caches not running on a host that needs to use extra-info
     documents SHOULD NOT download or cache them.)

  4. New URLs

     http://<hostname>/tor/extra/d/...
     http://<hostname>/tor/extra/fp/...
     http://<hostname>/tor/extra/all[.z]
        (As for /tor/server/ URLs: supports fetching extra-info documents
        by their digest, by the fingerprint of their servers, or all
        at once. When serving by fingerprint, we serve the extra-info
        that corresponds to the descriptor we would serve by that
        fingerprint. Only directory authorities are guaranteed to support
        these URLs.)

     http://<hostname>/tor/extra/authority[.z]
        (The extra-info document for this router.)

     Extra-info documents are uploaded to the same URLs as regular
     router descriptors.

Migration:

  For extra info approach:
     * First:
       * Authorities should accept extra info, and support serving it.
       * Routers should upload extra info once authorities accept it.
       * Caches should support an option to download and cache it, once
         authorities serve it.
       * Tools should be updated to use locally cached information.
         These tools include:
           lefkada's exit.py script.
           tor26's noreply script and general directory cache.
           https://nighteffect.us/tns/ for its graphs
           and check with or-talk for the rest, once it's time.

     * Set a cutoff time for including bandwidth in router descriptors, so
       that tools that use bandwidth info know that they will need to fetch
       extra info documents.

     * Once tools that want bandwidth info support fetching extra info:
       * Have routers stop including bandwidth info in their router
         descriptors.
Filename: 105-handshake-revision.txt
Title: Version negotiation for the Tor protocol.
Author: Nick Mathewson, Roger Dingledine
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document was extracted from a modified version of tor-spec.txt that we
  had written before the proposal system went into place.  It adds two new
  cell types to the Tor link connection setup handshake: one used for
  version negotiation, and another to prevent MITM attacks.

  This proposal is partially implemented, and partially superseded by
  proposal 130.

Motivation: Tor versions

   Our *current* approach to versioning the Tor protocol(s) has been as
   follows:
     - All changes must be backward compatible.
     - It's okay to add new cell types, if they would be ignored by previous
       versions of Tor.
     - It's okay to add new data elements to cells, if they would be
       ignored by previous versions of Tor.
     - For forward compatibility, Tor must ignore cell types it doesn't
       recognize, and ignore data in those cells it doesn't expect.
     - Clients can inspect the version of Tor declared in the platform line
       of a router's descriptor, and use that to learn whether a server
       supports a given feature.  Servers, however, aren't assumed to all
       know about each other, and so don't know the version of who they're
       talking to.

   This system has these problems:
     - It's very hard to change fundamental aspects of the protocol, like the
       cell format, the link protocol, any of the various encryption schemes,
       and so on.
     - The router-to-router link protocol has remained more-or-less frozen
       for a long time, since we can't easily have an OR use new features
       unless it knows the other OR will understand them.

   We need to resolve these problems because:
     - Our cipher suite is showing its age: SHA1/AES128/RSA1024/DH1024 will
       not seem like the best idea for all time.
     - There are many ideas circulating for multiple cell sizes; while it's
       not obvious whether these are safe, we can't do them at all without a
       mechanism to permit them.
     - There are many ideas circulating for alternative circuit building and
       cell relay rules: they don't work unless they can coexist in the
       current network.
     - If our protocol changes a lot, it's hard to describe any coherent
       version of it: we need to say "the version that Tor versions W through
       X use when talking to versions Y through Z".  This makes analysis
       harder.

Motivation: Preventing MITM attacks

   TLS prevents a man-in-the-middle attacker from reading or changing the
   contents of a communication.  It does not, however, prevent such an
   attacker from observing timing information.  Since timing attacks are some
   of the most effective against low-latency anonymity nets like Tor, we
   should take more care to make sure that we're not only talking to who
   we think we're talking to, but that we're using the network path we
   believe we're using.

Motivation: Signed clock information

   It's very useful for Tor instances to know how skewed they are relative
   to one another.  Currently, the only way to find out is to download
   directory information and check the Date header, but this is not
   authenticated, and hence subject to modification on the wire.  Using
   BEGIN_DIR to create an authenticated directory stream through an existing
   circuit is better, but that's an extra step and it might be nicer to
   learn the information in the course of the regular protocol.

Proposal:

1.0. Version numbers

   The node-to-node TLS-based "OR connection" protocol and the multi-hop
   "circuit" protocol are versioned quasi-independently.

   Of course, some dependencies will continue to exist: Certain versions
   of the circuit protocol may require a minimum version of the connection
   protocol to be used.  The connection protocol affects:
     - Initial connection setup, link encryption, transport guarantees,
       etc.
     - The allowable set of cell commands
     - Allowable formats for cells.

   The circuit protocol determines:
     - How circuits are established and maintained
     - How cells are decrypted and relayed
     - How streams are established and maintained.

   Version numbers are incremented for backward-incompatible protocol changes
   only.  Backward-compatible changes are generally implemented by adding
   additional fields to existing structures; implementations MUST ignore
   fields they do not expect.  Unused portions of cells MUST be set to zero.

   Though versioning the protocol will make it easier to maintain backward
   compatibility with older versions of Tor, we will nevertheless continue to
   periodically drop support for older protocols,
      - to keep the implementation from growing without bound,
      - to limit the maintenance burden of patching bugs in obsolete Tors,
      - to limit the testing burden of verifying that many old protocol
        versions continue to be implemented properly, and
      - to limit the exposure of the network to protocol versions that are
        expensive to support.

   The Tor protocol as implemented through the 0.1.2.x Tor series will be
   called "version 1" in its link protocol and "version 1" in its relay
   protocol.  Versions of the Tor protocol so old as to be incompatible with
   Tor 0.1.2.x can be considered to be version 0 of each, and are not
   supported.

2.1. VERSIONS cells

   When a Tor connection is established, both parties normally send a
   VERSIONS cell before sending any other cells.  (But see below.)

         VersionsLen          [2 bytes]
         Versions             [VersionsLen bytes]

   "Versions" is a sequence of VersionsLen bytes.  Each value between 1 and
   127 inclusive represents a single version; current implementations MUST
   ignore other bytes.  Parties should list all of the versions which they
   are able and willing to support.  Parties can only communicate if they
   have some connection protocol version in common.

   Versions 0.2.0.x-alpha and earlier don't understand VERSIONS cells,
   and therefore don't support version negotiation.  Thus, waiting until
   the other side has sent a VERSIONS cell won't work for these servers:
   if the other side sends no cells back, it is impossible to tell
   whether it has sent a VERSIONS cell that has been stalled, or whether
   it has dropped our own VERSIONS cell as unrecognized.  Therefore, we'll
   change the TLS negotiation parameters so that old parties can still
   negotiate, but new parties can recognize each other.  Immediately
   after a TLS connection has been established, the parties check
   whether the other side negotiated the connection in an "old" way or a
   "new" way.  If either party negotiated in the "old" way, we assume a
   v1 connection.  Otherwise, both parties send VERSIONS cells listing
   all their supported versions.  Upon receiving the other party's
   VERSIONS cell, the implementation begins using the highest-valued
   version common to both cells.  If the first cell from the other party
   has a recognized command, and is _not_ a VERSIONS cell, we assume a
   v1 protocol.
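   The negotiation rules above can be sketched in Python roughly as
   follows.  The function names, the representation of incoming cells as
   (command, payload) pairs, and the use of network (big-endian) byte order
   for VersionsLen are assumptions for illustration.  The sketch returns
   None while no recognized cell has arrived yet, and raises if the two
   sides share no version.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple

CELL_VERSIONS = 7  # command number assigned to VERSIONS cells


def parse_versions_payload(payload: bytes) -> List[int]:
    """Return the versions listed in a VERSIONS payload (values 1..127)."""
    if len(payload) < 2:
        raise ValueError("payload too short for VersionsLen")
    n = int.from_bytes(payload[:2], "big")  # assumed network byte order
    body = payload[2:2 + n]
    if len(body) != n:
        raise ValueError("truncated Versions field")
    # Bytes outside 1..127 MUST be ignored by current implementations.
    return [b for b in body if 1 <= b <= 127]


def negotiate_link_version(old_style_tls: bool,
                           incoming: Iterable[Tuple[int, bytes]],
                           recognized: Set[int],
                           ours: Sequence[int]) -> Optional[int]:
    """Pick a link protocol version; None means 'still waiting'."""
    if old_style_tls:
        return 1  # either side negotiated TLS the "old" way
    for command, payload in incoming:
        if command not in recognized:
            continue  # unrecognized cell types are ignored
        if command != CELL_VERSIONS:
            return 1  # first recognized cell is not VERSIONS: assume v1
        common = set(ours) & set(parse_versions_payload(payload))
        if not common:
            raise ValueError("no common link protocol version")
        return max(common)  # highest version common to both cells
    return None
```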

   (For more detail on the TLS protocol change, see forthcoming draft
   proposals from Steven Murdoch.)

   Implementations MUST discard VERSIONS cells that are not the first
   recognized cells sent on a connection.

   The VERSIONS cell must be sent as a v1 cell (2 bytes of circuitID, 1
   byte of command, 509 bytes of payload).

   [NOTE: The VERSIONS cell is assigned the command number 7.]
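   A sketch of encoding a VERSIONS cell in the fixed v1 layout described
   above.  Using circuit ID 0 and big-endian integers are assumptions of
   this sketch; unused payload bytes are zero-filled, as the proposal
   requires for unused portions of cells.

```python
from typing import Sequence

CELL_VERSIONS = 7
V1_PAYLOAD_LEN = 509  # v1 cell: 2-byte circID + 1-byte command + 509 bytes


def build_versions_cell(versions: Sequence[int], circ_id: int = 0) -> bytes:
    """Encode a VERSIONS cell in the fixed-size v1 cell format."""
    if any(not 1 <= v <= 127 for v in versions):
        raise ValueError("versions must be in 1..127")
    body = bytes(versions)
    payload = len(body).to_bytes(2, "big") + body  # VersionsLen, Versions
    if len(payload) > V1_PAYLOAD_LEN:
        raise ValueError("too many versions for one cell")
    payload = payload.ljust(V1_PAYLOAD_LEN, b"\x00")  # unused bytes are zero
    return circ_id.to_bytes(2, "big") + bytes([CELL_VERSIONS]) + payload
```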

2.2. MITM-prevention and time checking

   If we negotiate a v2 connection or higher, the second cell we send SHOULD
   be a NETINFO cell.  Implementations SHOULD NOT send NETINFO cells at other
   times.

   A NETINFO cell contains:
         Timestamp              [4 bytes]
         Other OR's address     [variable]
         Number of addresses    [1 byte]
         This OR's addresses    [variable]

   Timestamp is the OR's current Unix time, in seconds since the epoch.  If
   an implementation receives time values from many ORs that
   indicate that its clock is skewed, it SHOULD try to warn the
   administrator. (We leave the definition of 'many' intentionally vague
   for now.)

   Before believing the timestamp in a NETINFO cell, implementations
   SHOULD compare the time at which they received the cell to the time
   when they sent their VERSIONS cell.  If the difference is very large,
   it is likely that the cell was delayed long enough that its
   contents are out of date.
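   A minimal sketch of the timestamp handling, assuming hypothetical
   thresholds: the proposal deliberately leaves both "very large" and
   "many" unspecified, so MAX_HANDSHAKE_DELAY, SKEW_WARN_THRESHOLD and
   MIN_SKEWED_REPORTS below are placeholders, not values from any spec.

```python
from typing import Iterable, Optional

# Hypothetical thresholds; the proposal does not fix these values.
MAX_HANDSHAKE_DELAY = 60    # seconds between our VERSIONS and their NETINFO
SKEW_WARN_THRESHOLD = 3600  # seconds of skew worth warning about
MIN_SKEWED_REPORTS = 3      # how many ORs must agree before we warn


def netinfo_skew(sent_versions_at: float, netinfo_received_at: float,
                 peer_timestamp: int) -> Optional[float]:
    """Estimate peer_clock - our_clock, or None if the cell is too stale."""
    if netinfo_received_at - sent_versions_at > MAX_HANDSHAKE_DELAY:
        return None  # delayed too long; don't believe the timestamp
    return peer_timestamp - netinfo_received_at


def should_warn_admin(skews: Iterable[Optional[float]]) -> bool:
    """Warn only if enough ORs independently report a large skew."""
    large = [s for s in skews if s is not None and abs(s) > SKEW_WARN_THRESHOLD]
    return len(large) >= MIN_SKEWED_REPORTS
```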

   Each address contains Type/Length/Value as used in Section 6.4 of
   tor-spec.txt.  The first address is the one that the party sending
   the NETINFO cell believes the other has; it can be used to learn
   what your IP address is if you have no other hints.  The rest of the
   addresses are the advertised addresses of the party sending the
   NETINFO cell.  We include them to block a man-in-the-middle attack on
   TLS that lets an attacker bounce traffic through his own computers to
   enable timing and packet-counting attacks.
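   The NETINFO layout can be sketched as below.  The address type codes
   0x04 (IPv4) and 0x06 (IPv6) are assumed from the Type/Length/Value
   scheme in tor-spec.txt section 6.4, and the big-endian timestamp is
   likewise an assumption of this sketch.

```python
import ipaddress
from typing import Sequence

ADDR_TYPE_IPV4 = 0x04  # assumed type codes (tor-spec.txt section 6.4)
ADDR_TYPE_IPV6 = 0x06


def encode_address(addr: str) -> bytes:
    """Type/Length/Value encoding of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    atype = ADDR_TYPE_IPV4 if ip.version == 4 else ADDR_TYPE_IPV6
    return bytes([atype, len(ip.packed)]) + ip.packed


def build_netinfo_payload(now: int, other_addr: str,
                          my_addrs: Sequence[str]) -> bytes:
    """Timestamp, the peer's address as we see it, then our own addresses."""
    if len(my_addrs) > 255:
        raise ValueError("address count must fit in one byte")
    out = now.to_bytes(4, "big") + encode_address(other_addr)
    out += bytes([len(my_addrs)])
    for a in my_addrs:
        out += encode_address(a)
    return out
```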

   A Tor instance should use the other Tor's reported address
   information as part of logic to decide whether to treat a given
   connection as suitable for extending circuits to a given address/ID
   combination.  When we get an extend request, we use an
   existing OR connection if the ID matches, and ANY of the following
   conditions hold:
       - The IP matches the requested IP.
       - We know that the IP we're using is canonical because it was
         listed in the NETINFO cell.
       - We know that the IP we're using is canonical because it was
         listed in the server descriptor.
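   The connection-reuse rule reduces to a small predicate.  Here is a
   hypothetical sketch in which each OR connection records the addresses
   learned from the peer's NETINFO cell and from its server descriptor.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OrConn:
    identity_digest: str
    remote_ip: str                                           # IP we connected to
    netinfo_addrs: Set[str] = field(default_factory=set)     # from peer's NETINFO
    descriptor_addrs: Set[str] = field(default_factory=set)  # from its descriptor


def usable_for_extend(conn: OrConn, want_id: str, want_ip: str) -> bool:
    """Can this existing connection carry an extend to (want_id, want_ip)?"""
    if conn.identity_digest != want_id:
        return False  # the ID must always match
    return (conn.remote_ip == want_ip                    # exact IP match
            or conn.remote_ip in conn.netinfo_addrs      # canonical per NETINFO
            or conn.remote_ip in conn.descriptor_addrs)  # canonical per descriptor
```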

   [NOTE: The NETINFO cell is assigned the command number 8.]

Discussion: Versions versus feature lists

   Many protocols negotiate lists of available features instead of (or in
   addition to) protocol versions.  While it's possible that some amount of
   feature negotiation could be supported in a later Tor, we should prefer to
   use protocol versions whenever possible, for reasons discussed in
   the "Anonymity Loves Company" paper.

Discussion: Bytes per version, versions per cell

   This document provides for a one-byte count of how many versions a Tor
   supports, and allows one byte per version.  Thus, it can only support
   254 more versions of the protocol beyond the unallocated v0 and the
   current v1.  If we ever need to split the protocol into 255 incompatible
   versions, we've probably screwed up badly somewhere.

   Nevertheless, here are two ways we could support more versions:
     - Change the version count to a two-byte field that counts the number of
       _bytes_ used, and use a UTF8-style encoding: versions 0 through 127
       take one byte to encode, versions 128 through 2047 take two bytes to
       encode, and so on.  We wouldn't need to parse any version higher than
       127 right now, since all bytes used to encode higher versions would
       have their high bit set.

       We'd still have a limit of 380 simultaneous versions that could be
       declared in any version.  This is probably okay.

     - Decide that if we need to support more versions, we can add a
       MOREVERSIONS cell that gets sent before the VERSIONS cell.  The spec
       above requires Tors to ignore unrecognized cell types that they get
       before the first VERSIONS cell, and still allows version negotiation
       to succeed.
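   To make the first option concrete: Python's own UTF-8 codec produces
   exactly the 1- and 2-byte forms described, so a sketch handling only
   versions below 2048 can lean on it.  The helper legacy_view is
   hypothetical and shows why current parsers would be unaffected: every
   byte of a multi-byte version has its high bit set, so it falls outside
   1..127 and is ignored.

```python
from typing import List


def encode_version(v: int) -> bytes:
    """UTF-8-style encoding: 0..127 -> 1 byte, 128..2047 -> 2 bytes."""
    if not 0 <= v <= 0x7FF:
        raise ValueError("this sketch only handles the 1- and 2-byte forms")
    return chr(v).encode("utf-8")


def decode_versions(data: bytes) -> List[int]:
    """Decode a run of UTF-8-style version numbers."""
    return [ord(ch) for ch in data.decode("utf-8")]


def legacy_view(data: bytes) -> List[int]:
    """What a current parser sees: it ignores bytes outside 1..127."""
    return [b for b in data if 1 <= b <= 127]
```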

   [Resolution: Reserve the high bit and the v0 value for later use.  If
    we ever have more live versions than we can fit in a cell, we've made a
    bad design decision somewhere along the line.]

Discussion: Reducing round-trips

   It might be appealing to see if we can cram more information in the
   initial VERSIONS cell.  For example, the contents of NETINFO will pretty
   soon be sent by everybody before any more information is exchanged, but
   decoupling them from the version exchange increases round-trips.

   Instead, we could speculatively include handshaking information at
   the end of a VERSIONS cell, wrapped in a marker to indicate, "if we wind
   up speaking VERSION 2, here's the NETINFO I'll send.  Otherwise, ignore
   this."  This could be extended to opportunistically reduce round trips
   when possible for future versions when we guess the versions right.

   Of course, we'd need to be careful about using a feature like this:
     - We don't want to include things that are expensive to compute,
       like PK signatures or proof-of-work.
     - We don't want to speculate as a mobile client: it may leak our
       experience with the server in question.

Discussion: Advertising versions in routerdescs and networkstatuses.

   In network-statuses:

     The networkstatus "v" line now has the format:
        "v" IMPLEMENTATION IMPL-VERSION "Link" LINK-VERSION-LIST
            "Circuit" CIRCUIT-VERSION-LIST NL

     LINK-VERSION-LIST and CIRCUIT-VERSION-LIST are comma-separated lists of
     supported version numbers.  IMPLEMENTATION is the name of the
     implementation of the Tor protocol (e.g., "Tor"), and IMPL-VERSION is the
     version of the implementation.

     Examples:
        v Tor 0.2.5.1-alpha Link 1,2,3 Circuit 2,5

        v OtherOR 2000+ Link 3 Circuit 5

     Implementations that release independently of the Tor codebase SHOULD NOT
     use "Tor" as the value of their IMPLEMENTATION.

     Additional fields on the "v" line MUST be ignored.
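     A parser for the "v" line could be sketched as below, assuming the
     keywords appear in exactly the positions shown in the format and that
     any trailing fields are simply dropped, as required.

```python
from typing import List, Tuple


def parse_v_line(line: str) -> Tuple[str, str, List[int], List[int]]:
    """Parse: v IMPLEMENTATION IMPL-VERSION Link LIST Circuit LIST [...]"""
    f = line.split()
    if len(f) < 7 or f[0] != "v" or f[3] != "Link" or f[5] != "Circuit":
        raise ValueError("malformed v line: %r" % line)
    link = [int(x) for x in f[4].split(",")]
    circuit = [int(x) for x in f[6].split(",")]
    return f[1], f[2], link, circuit  # f[7:] are additional fields: ignored
```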

   In router descriptors:

     The router descriptor should contain a line of the form,
       "protocols" "Link" LINK-VERSION-LIST "Circuit" CIRCUIT-VERSION-LIST

     Additional fields on the "protocols" line MUST be ignored.

     [Versions of Tor before 0.1.2.5-alpha rejected router descriptors with
     unrecognized items; the protocols line should be preceded with an "opt"
     until these Tors are obsolete.]

Security issues:

   Client partitioning is the big danger when we introduce new versions; if a
   client supports some very unusual set of protocol versions, it will stand
   out from others no matter where it goes.  If a server supports an unusual
   version, it will get a disproportionate amount of traffic from clients who
   prefer that version.  We can mitigate this somewhat as follows:

     - Do not have clients prefer any protocol version by default until that
       version is widespread.  (First introduce the new version to servers,
       and have clients admit to using it only when configured to do so for
       testing.  Then, once many servers are running the new protocol
       version, enable its use by default.)

     - Do not multiply protocol versions needlessly.

     - Encourage protocol implementors to implement the same protocol version
       sets as some popular version of Tor.

     - Disrecommend very old/unpopular versions of Tor via the directory
       authorities' RecommendedVersions mechanism, even if it is still
       technically possible to use them.

Filename: 106-less-tls-constraint.txt
Title: Checking fewer things during TLS handshakes
Author: Nick Mathewson
Created: 9-Feb-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

    This document proposes that we relax our requirements on the contents of
    X.509 certificates during initial TLS handshakes.

Motivation:

    Later, we want to try harder to avoid protocol fingerprinting attacks.
    This means that we'll need to make our connection handshake look closer
    to a regular HTTPS connection: one certificate on the server side and
    zero certificates on the client side.  For now, about the best we
    can do is to stop requiring things during handshake that we don't
    actually use.

What we check now, and where we check it:

 tor_tls_check_lifetime:
    peer has certificate
    notBefore <= now <= notAfter

 tor_tls_verify:
    peer has at least one certificate
    There is at least one certificate in the chain
    At least one of the certificates in the chain is not the one used to
        negotiate the connection.  (The "identity cert".)
    The certificate _not_ used to negotiate the connection has signed the
        link cert

 tor_tls_get_peer_cert_nickname:
    peer has a certificate.
    certificate has a subjectName.
    subjectName has a commonName.
    commonName consists only of characters in LEGAL_NICKNAME_CHARACTERS. [2]

 tor_tls_peer_has_cert:
    peer has a certificate.

 connection_or_check_valid_handshake:
    tor_tls_peer_has_cert [1]
    tor_tls_get_peer_cert_nickname [1]
    tor_tls_verify [1]
    If nickname in cert is a known, named router, then its identity digest
        must be as expected.
    If we initiated the connection, then we got the identity digest we
        expected.

 USEFUL THINGS WE COULD DO:

 [1] We could just not force clients to have any certificate at all, let alone
     an identity certificate.  Internally to the code, we could assign the
     identity_digest field of these or_connections to a random number, or even
     not add them to the identity_digest->or_conn map.
 [so if somebody connects with no certs, we let them. and mark them as
 a client and don't treat them as a server. great. -rd]

 [2] Instead of using a restricted nickname character set that makes our
     commonName structure look unlike typical SSL certificates, we could treat
     the nickname as extending from the start of the commonName up to but not
     including the first non-nickname character.

     Alternatively, we could stop checking commonNames entirely.  We don't
     actually _do_ anything based on the nickname in the certificate, so
     there's really no harm in letting every router have any commonName it
     wants.
 [this is the better choice -rd]
 [agreed. -nm]
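 The prefix rule in option [2] is simple to state in code.  This sketch
 assumes LEGAL_NICKNAME_CHARACTERS is ASCII letters and digits; note that
 the reviewers above preferred dropping the commonName check entirely.

```python
import string

# Assumption: LEGAL_NICKNAME_CHARACTERS is the ASCII letters and digits.
LEGAL_NICKNAME_CHARACTERS = set(string.ascii_letters + string.digits)


def nickname_from_common_name(cn: str) -> str:
    """Take the longest prefix of cn made of legal nickname characters."""
    end = 0
    while end < len(cn) and cn[end] in LEGAL_NICKNAME_CHARACTERS:
        end += 1
    return cn[:end]
```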

REMAINING WAYS TO RECOGNIZE CLIENT->SERVER CONNECTIONS:

 Assuming that we removed the above requirements, and that we then (in a
 later release) had clients stop sending certificates and made our DNs a
 little less formulaic, client->server OR connections would still be
 recognizable by:
    having a two-certificate chain sent by the server
    using a particular set of ciphersuites
    traffic patterns
    probing the server later

OTHER IMPLICATIONS:

 If we stop verifying the above requirements:

    It will be slightly (but only slightly) more common to connect to a non-Tor
    server running TLS, and believe that you're talking to a Tor server (until
    you send the first cell).

    It will be far easier for non-Tor SSL clients to accidentally connect to
    Tor servers and speak HTTPS or whatever to them.

 If, in a later release, we have clients not send certificates, and we make
 DNs less recognizable:

    If clients don't send certs, servers don't need to verify them: win!

    If we remove these restrictions, it will be easier for people to write
    clients to fuzz our protocol: sorta win!

    If clients don't send certs, they look slightly less like servers.

OTHER SPEC CHANGES:

 When a client doesn't give us an identity, we should never extend any
 circuits to it (duh), and we should allow it to set circuit ID however it
 wants.
Filename: 107-uptime-sanity-checking.txt
Title: Uptime Sanity Checking
Author: Kevin Bauer & Damon McCoy
Created: 8-March-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

   This document describes how to cap the uptime that is used when computing
   which routers are marked as stable such that highly stable routers cannot
   be displaced by malicious routers that report extremely high uptime
   values.

   This is similar to how bandwidth is capped at 1.5MB/s.

Motivation:

   It has been pointed out that an attacker can displace all stable nodes and
   entry guard nodes by reporting high uptimes. This is an easy fix that will
   prevent highly stable nodes from being displaced.

Security implications:

   It should decrease the effectiveness of routing attacks that report high
   uptimes while not impacting the normal routing algorithms.

Specification:

  So we could patch Section 3.1 of dir-spec.txt to say:

   "Stable" -- A router is 'Stable' if it is running, valid, not
   hibernating, and either its uptime is at least the median uptime for
   known running, valid, non-hibernating routers, or its uptime is at
   least 30 days. Routers are never called stable if they are running
   a version of Tor known to drop circuits stupidly.  (0.1.1.10-alpha
   through 0.1.1.16-rc are stupid this way.)
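   The capped rule can be sketched as follows.  The parameter names are
   hypothetical, and whether a router runs one of the circuit-dropping
   versions is passed in as a boolean rather than computed from a version
   string.  Because "at least the median, or at least 30 days" is the same
   as "at least min(median, 30 days)", inflated uptimes reported by other
   routers can no longer push the bar above 30 days.

```python
import statistics
from typing import Sequence

UPTIME_CAP_DAYS = 30  # uptime that always suffices for Stable


def is_stable(running: bool, valid: bool, hibernating: bool,
              drops_circuits: bool, uptime_days: float,
              all_uptimes_days: Sequence[float]) -> bool:
    """Capped Stable rule; all_uptimes_days covers the known running,
    valid, non-hibernating routers."""
    if not running or not valid or hibernating or drops_circuits:
        return False
    median = statistics.median(all_uptimes_days)
    # "at least the median, or at least 30 days" == at least min(median, 30)
    return uptime_days >= min(median, UPTIME_CAP_DAYS)
```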

Compatibility:

   There should be no compatibility issues due to uptime capping.

Implementation:

   Implemented and merged into dir-spec in 0.2.0.0-alpha-dev (r9788).

Discussion:

   Initially, this proposal set the maximum at 60 days, not 30; the 30 day
   limit and spec wording was suggested by Roger in an or-dev post on 9 March
   2007.

   This proposal also led to 108-mtbf-based-stability.txt

Filename: 108-mtbf-based-stability.txt
Title: Base "Stable" Flag on Mean Time Between Failures
Author: Nick Mathewson
Created: 10-Mar-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

   This document proposes that we change how directory authorities set the
   stability flag from inspection of a router's declared Uptime to the
   authorities' perceived mean time between failures for the router.

Motivation:

   Clients prefer nodes that the authorities call Stable.  This flag is (as
   of 0.2.0.0-alpha-dev) set entirely based on the node's declared value for
   uptime.  This creates an opportunity for malicious nodes to declare
   falsely high uptimes in order to get more traffic.

Spec changes:

   Replace the current rule for setting the Stable flag with:

   "Stable" -- A router is 'Stable' if it is active and its observed Stability
   for the past month is at or above the median Stability for active routers.
   Routers are never called stable if they are running a version of Tor
   known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc
   are stupid this way.)

   Stability shall be defined as the weighted mean length of the runs
   observed by a given directory authority.  A run begins when an authority
   decides that the server is Running, and ends when the authority decides
   that the server is not Running.  In-progress runs are counted when
   measuring Stability.  When calculating the mean, runs are weighted by
   $\alpha ^ t$, where $t$ is time elapsed since the end of the run, and
   $0 < \alpha < 1$.  Periods when an authority is down do not count toward
   the length of the run.
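   A direct (non-incremental) sketch of this definition, assuming runs are
   given as (start, end) pairs in a common time unit, with end set to None
   for an in-progress run, which then gets weight $\alpha^0 = 1$.  For
   simplicity it does not subtract periods when the authority itself was
   down.

```python
from typing import Optional, Sequence, Tuple


def stability(runs: Sequence[Tuple[float, Optional[float]]], now: float,
              alpha: float) -> float:
    """Weighted mean run length; runs are (start, end), end None if ongoing.

    Each run is weighted by alpha ** t, with t the time since the run
    ended (0 for an in-progress run).  All times share one unit.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    num = den = 0.0
    for start, end in runs:
        stop = now if end is None else end
        weight = alpha ** (now - stop)
        num += weight * (stop - start)
        den += weight
    return num / den if den else 0.0
```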

Rejected Alternative:

   "A router's Stability shall be defined as the sum of $\alpha ^ d$ for every
   $d$ such that the router was considered reachable for the entire day
   $d$ days ago."

   This allows a simpler implementation: every day, we multiply
   yesterday's Stability by alpha, and if the router was observed to be
   available every time we looked today, we add 1.

   Instead of "day", we could pick an arbitrary time unit.  We should
   pick alpha to be high enough that long-term stability counts, but low
   enough that the distant past is eventually forgotten.  Something
   between .8 and .95 seems right.

   (By requiring that routers be up for an entire day to get their
   stability increased, instead of counting fractions of a day, we
   capture the notion that stability is more like "probability of
   staying up for the next hour" than it is like "probability of being
   up at some randomly chosen time over the next hour."  The former
   notion of stability is far more relevant for long-lived circuits.)
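   The rejected daily update can be sketched in a few lines.  The helper
   names are hypothetical, and alpha is left as a parameter since the text
   only suggests a range of roughly .8 to .95.

```python
from typing import Iterable


def next_day_stability(yesterday: float, up_all_day: bool,
                       alpha: float = 0.9) -> float:
    """Decay yesterday's value by alpha, then add 1 for a fully-up day."""
    return yesterday * alpha + (1.0 if up_all_day else 0.0)


def stability_after(days_up: Iterable[bool], alpha: float = 0.9) -> float:
    """Fold a day-by-day reachability history, oldest day first."""
    s = 0.0
    for up in days_up:
        s = next_day_stability(s, up, alpha)
    return s
```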

Limitations:

   Authorities can have false positives and false negatives when trying to
   tell whether a router is up or down.  So long as these aren't terribly
   wrong, and so long as they aren't significantly biased, we should be able
   to use them to estimate stability pretty well.

   Probing approaches like the above could miss short incidents of
   downtime.  If we use the router's declared uptime, we could detect
   these: but doing so would penalize routers who reported their uptime
   accurately.

Implementation:

   For now, the easiest way to store this information at authorities
   would probably be in some kind of periodically flushed flat file.
   Later, we could move to Berkeley db or something if we really had to.

   For each router, an authority will need to store:
     The router ID.
     Whether the router is up.
     The time when the current run started, if the router is up.
     The weighted sum length of all previous runs.
     The time at which the weighted sum length was last weighted down.

   Authorities should probe at random intervals to test whether servers are
   running.
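   One way to keep this state incrementally is sketched below.  Besides the
   fields listed above, it assumes an extra running total of weights
   (weighted_run_count), which a weighted mean needs; times are in an
   arbitrary common unit, and authority downtime is again not excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterHistory:
    router_id: str
    is_up: bool = False
    run_start: Optional[float] = None  # start of the current run, if up
    weighted_run_sum: float = 0.0      # weighted sum of past run lengths
    weighted_run_count: float = 0.0    # assumed extra field: sum of weights
    last_weighted: float = 0.0         # when the sums were last decayed

    def _decay(self, now: float, alpha: float) -> None:
        factor = alpha ** (now - self.last_weighted)
        self.weighted_run_sum *= factor
        self.weighted_run_count *= factor
        self.last_weighted = now

    def mark_up(self, now: float) -> None:
        if not self.is_up:
            self.is_up, self.run_start = True, now

    def mark_down(self, now: float, alpha: float) -> None:
        if self.is_up:
            self._decay(now, alpha)
            self.weighted_run_sum += now - self.run_start  # weight 1 at end
            self.weighted_run_count += 1.0
            self.is_up, self.run_start = False, None

    def stability(self, now: float, alpha: float) -> float:
        self._decay(now, alpha)
        total, weight = self.weighted_run_sum, self.weighted_run_count
        if self.is_up:  # the in-progress run counts, with weight 1
            total += now - self.run_start
            weight += 1.0
        return total / weight if weight else 0.0
```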
Filename: 109-no-sharing-ips.txt
Title: No more than one server per IP address
Author: Kevin Bauer & Damon McCoy
Created: 9-March-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:
  This document describes a solution to a Sybil attack vulnerability in the
  directory servers. Currently, it is possible for a single IP address to
  host an arbitrarily high number of Tor routers. We propose that the
  directory servers limit the number of Tor routers that may be registered at
  a particular IP address to some small (fixed) number, perhaps just one Tor
  router per IP address.

  While Tor never uses more than one server from a given /16 in the same
  circuit, an attacker with multiple servers in the same place is still
  dangerous because he can get around the per-server bandwidth cap that is
  designed to prevent a single server from attracting too much of the overall
  traffic.

Motivation:
  Since it is possible for an attacker to register an arbitrarily large
  number of Tor routers, it is possible for malicious parties to do this
  as part of a traffic analysis attack.

Security implications:
  This countermeasure will increase the number of IP addresses that an
  attacker must control in order to carry out traffic analysis.

Specification:

  For each IP address, each directory authority tracks the number of routers
  using that IP address, along with their total observed bandwidth.  If there
  are more than MAX_SERVERS_PER_IP servers at some IP, the authority should
  "disable" all but MAX_SERVERS_PER_IP servers.  When choosing which servers
  to disable, the authority should first disable non-Running servers in
  increasing order of observed bandwidth, and then should disable Running
  servers in increasing order of bandwidth.

  [[  We don't actually do this part here. -NM

  If the total observed
  bandwidth of the remaining non-"disabled" servers exceeds MAX_BW_PER_IP,
  the authority should "disable" some of the remaining servers until only one
  server remains, or until the remaining observed bandwidth of non-"disabled"
  servers is under MAX_BW_PER_IP.
  ]]

  Servers that are "disabled" MUST be marked as non-Valid and non-Running.

  MAX_SERVERS_PER_IP is 3.

  MAX_BW_PER_IP is 8 MB per s.
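  The disabling order above amounts to a sort per IP address.  In this
  sketch the Server record and function names are hypothetical, and only
  the MAX_SERVERS_PER_IP rule is implemented (the bracketed MAX_BW_PER_IP
  step is not).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

MAX_SERVERS_PER_IP = 3


class Server(NamedTuple):
    nickname: str
    ip: str
    running: bool
    bandwidth: int  # observed bandwidth, bytes/s


def servers_to_disable(servers: List[Server],
                       max_per_ip: int = MAX_SERVERS_PER_IP) -> List[Server]:
    """Pick the servers an authority marks non-Valid and non-Running."""
    by_ip: Dict[str, List[Server]] = defaultdict(list)
    for s in servers:
        by_ip[s.ip].append(s)
    disabled: List[Server] = []
    for group in by_ip.values():
        excess = len(group) - max_per_ip
        if excess <= 0:
            continue
        # False sorts before True: non-Running servers first, then Running,
        # each in increasing order of observed bandwidth.
        group.sort(key=lambda s: (s.running, s.bandwidth))
        disabled.extend(group[:excess])
    return disabled
```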

Compatibility:

  Upon inspection of a directory server, we found that the following IP
  addresses have more than one Tor router:

  Nickname    IP address      Hostname                                Port
  Scruples    68.5.113.81     ip68-5-113-81.oc.oc.cox.net             443
  WiseUp      68.5.113.81     ip68-5-113-81.oc.oc.cox.net             9001
  Unnamed     62.1.196.71     pc01-megabyte-net-arkadiou.megabyte.gr  9001
  Unnamed     62.1.196.71     pc01-megabyte-net-arkadiou.megabyte.gr  9001
  Unnamed     62.1.196.71     pc01-megabyte-net-arkadiou.megabyte.gr  9001
  aurel       85.180.62.138   e180062138.adsl.alicedsl.de             9001
  sokrates    85.180.62.138   e180062138.adsl.alicedsl.de             9001
  moria1      18.244.0.188    moria.mit.edu                           9001
  peacetime   18.244.0.188    moria.mit.edu                           9100

  There may exist compatibility issues with this proposed fix.  Reasons why
  more than one server would share an IP address include:

  * Testing. moria1, moria2, peacetime, and other morias all run on one
    computer at MIT, because that way we get testing. Moria1 and moria2 are
    run by Roger, and peacetime is run by Nick.
  * NAT. If there are several servers but they port-forward through the same
    IP address, ... we can hope that the operators coordinate with each
    other. Also, we should recognize that while they help the network in
    terms of increased capacity, they don't help as much as they could in
    terms of location diversity. But our approach so far has been to take
    what we can get.
  * People who have more than 1.5MB/s and want to help out more. For
    example, for a while Tonga was offering 10MB/s and its Tor server
    would only make use of a bit of it. So Roger suggested that he run
    two Tor servers, to use more.

[Note Roger's tweak to this behavior, in
http://archives.seul.org/or/cvs/Oct-2007/msg00118.html]

Filename: 110-avoid-infinite-circuits.txt
Title: Avoiding infinite length circuits
Author: Roger Dingledine
Created: 13-Mar-2007
Status: Closed
Target: 0.2.3.x
Implemented-In: 0.2.1.3-alpha, 0.2.3.11-alpha

History:

  Revised 28 July 2008 by nickm: set K.
  Revised 3 July 2008 by nickm: rename from relay_extend to
     relay_early.  Revise to current migration plan.  Allow K cells
     over circuit lifetime, not just at start.

Overview:

  Right now, an attacker can add load to the Tor network by extending a
  circuit an arbitrary number of times. Every cell that goes down the
  circuit then adds N times that amount of load in overall bandwidth
  use. This vulnerability arises because servers don't know their position
  on the path, so they can't tell how many nodes there are before them
  on the path.

  We propose a new set of relay cells that are distinguishable by
  intermediate hops as permitting extend cells. This approach will allow
  us to put an upper bound on circuit length relative to the number of
  colluding adversary nodes; but there are some downsides too.

Motivation:

  The above attack can be used to generally increase load all across the
  network, or it can be used to target specific servers: by building a
  circuit back and forth between two victim servers, even a low-bandwidth
  attacker can soak up all the bandwidth offered by the fastest Tor
  servers.

  The general attacks could be used as a demonstration that Tor isn't
  perfect (leading to yet more media articles about "breaking" Tor), and
  the targeted attacks will come into play once we have a reputation
  system -- it will be trivial to DoS a server so it can't pass its
  reputation checks, in turn impacting security.

Design:

  We should split RELAY cells into two types: RELAY and RELAY_EARLY.

  Only K (say, 10) relay_early cells can be sent across a circuit, and
  only relay_early cells are allowed to contain extend requests. We
  still support obscuring the length of the circuit (if more research
  shows us what to do), because Alice can choose how many of the K to
  mark as relay_early. Note that relay_early cells *can* contain any
  sort of data cell; so in effect it's actually the plain RELAY cells
  that are restricted. By default, she would just send the first K
  data cells on the circuit as relay_early cells, regardless of their
  actual type.

  (Note that a circuit that is out of relay_early cells MUST NOT be
  cannibalized later, since it can't extend.  Note also that it's always okay
  to use regular RELAY cells when sending non-EXTEND commands targeted at
  the first hop of a circuit, since there is no intermediate hop to try to
  learn the relay command type.)

  Each intermediate server would pass on the same type of cell that it
  received (either relay or relay_early), and the cell's destination
  will be able to learn whether it's allowed to contain an Extend request.

  If an intermediate server receives more than K relay_early cells, or
  if it sees a relay cell that contains an extend request, then it
  tears down the circuit (protocol violation).
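  The per-circuit check an intermediate server performs can be sketched
  as follows. This is a minimal Python illustration, not Tor's code: the
  class IntermediateHop, the string cell types, and ProtocolViolation are
  hypothetical, and K uses the value 8 chosen under Parameters.

```python
K = 8  # maximum relay_early cells per circuit, per the Parameters section


class ProtocolViolation(Exception):
    """Raised when a cell breaks the relay_early rules; the circuit is torn down."""


class IntermediateHop:
    """Hypothetical per-circuit state kept by one relay on the path."""

    def __init__(self, k=K):
        self.k = k
        self.early_seen = 0

    def handle(self, cell_type, extend_for_us=False):
        """Check one cell and return the cell type to pass on.

        Every hop can see the outer type (RELAY or RELAY_EARLY).
        extend_for_us is True only when the cell is addressed to this hop
        and decrypts to an EXTEND request; middle hops cannot see that.
        """
        if cell_type == "RELAY_EARLY":
            self.early_seen += 1
            if self.early_seen > self.k:
                raise ProtocolViolation("more than K relay_early cells")
        elif extend_for_us:
            raise ProtocolViolation("EXTEND request in a plain RELAY cell")
        # Pass on the same type of cell that was received.
        return cell_type
```

  Counting only the outer cell type keeps the check cheap for middle hops,
  which never learn the relay command inside the cell.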

Security implications:

  The upside is that this limits the bandwidth amplification factor to
  K: for an individual circuit to become arbitrary-length, the attacker
  would need an adversary-controlled node every K hops, and at that
  point the attack is no worse than if the attacker creates N/K separate
  K-hop circuits.

  On the other hand, we want to pick a large enough value of K that we
  don't mind the cap.

  If we ever want to take steps to hide the number of hops in the circuit
  or a node's position in the circuit, this design probably makes that
  more complex.

Migration:

  In 0.2.0, servers speaking v2 or later of the link protocol accept
  RELAY_EARLY cells, and pass them on.  If the next OR in the circuit
  is not speaking the v2 link protocol, the server relays the cell as
  a RELAY cell.

  In 0.2.1.3-alpha, clients begin using RELAY_EARLY cells on v2
  connections.  This functionality can be safely backported to
  0.2.0.x.  Clients should pick a random number between (say) K-2 and
  K to send.
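  On the client side, the default behavior from the Design section (mark
  the first cells sent on a circuit as relay_early, up to a randomly
  chosen budget between K-2 and K) might look like this. The class
  ClientCircuit and its methods are hypothetical names for illustration.

```python
import random

K = 8


class ClientCircuit:
    """Hypothetical client-side state for marking outgoing relay cells."""

    def __init__(self, k=K, rng=random):
        # Pick how many relay_early cells to spend, from K-2 to K inclusive.
        self.early_budget = rng.randint(k - 2, k)

    def next_cell_type(self):
        """Outer type for the next relay cell, regardless of its relay command."""
        if self.early_budget > 0:
            self.early_budget -= 1
            return "RELAY_EARLY"
        return "RELAY"

    def can_extend(self):
        """A circuit with no relay_early cells left cannot extend or be cannibalized."""
        return self.early_budget > 0
```

  Randomizing the budget means the last relay_early cell does not reveal
  exactly how long Alice intended the circuit to be.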

  In 0.2.1.3-alpha, servers close any circuit in which more than K
  relay_early cells are sent.

  Once all versions that do not send RELAY_EARLY cells are obsolete,
  servers can begin to reject any EXTEND requests not sent in a
  RELAY_EARLY cell.

Parameters:

  Let K = 8, for no terribly good reason.

Spec:

  [We can formalize this part once we think the design is a good one.]

Acknowledgements:

  This design has been kicking around since Christian Grothoff and I came
  up with it at PET 2004. (Nathan Evans, Christian Grothoff's student,
  is working on implementing a fix based on this design in the summer
  2007 timeframe.)

Filename: 111-local-traffic-priority.txt
Title: Prioritizing local traffic over relayed traffic
Author: Roger Dingledine
Created: 14-Mar-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  We describe some ways to let Tor users operate as a relay and enforce
  rate limiting for relayed traffic without impacting their locally
  initiated traffic.

Motivation:

  Right now we encourage people who use Tor as a client to configure it
  as a relay too ("just click the button in Vidalia"). Most of these users
  are on asymmetric links, meaning they have a lot more download capacity
  than upload capacity. But if they enable rate limiting too, suddenly
  they're limited to the same download capacity as upload capacity. And
  they have to enable rate limiting, or their upstream pipe gets filled
  up, starts dropping packets, and now their net connection doesn't work
  even for non-Tor stuff. So they end up turning off the relaying part
  so they can use Tor (and other applications) again.

  So far this hasn't mattered that much: most of our fast relays are
  being operated only in relay mode, so the rate limiting makes sense
  for them. But if we want to be able to attract many more relays in
  the future, we need to let ordinary users act as relays too.

  Further, as we begin to deploy the blocking-resistance design and we
  rely on ordinary users to click the "Tor for Freedom" button, this
  limitation will become a serious stumbling block to getting volunteers
  to act as bridges.

The problem:

  Tor implements its rate limiting on the 'read' side by only reading
  a certain number of bytes from the network in each second. If it has
  emptied its token bucket, it doesn't read any more from the network;
  eventually TCP notices and stalls until we resume reading. But if we
  want to have two classes of service, we can't know what class a given
  incoming cell will be until we look at it, at which point we've already
  read it.

Some options:

  Option 1: read when our token bucket is full enough, and if it turns
  out that what we read was local traffic, then add the tokens back into
  the token bucket. This will work when local traffic load alternates
  with relayed traffic load; but it's a poor option in general, because
  when we're receiving both local and relayed traffic, there are plenty
  of cases where we'll end up with an empty token bucket, and then we're
  back where we were before.
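  Option 1 can be modelled as a read-side token bucket that refunds
  tokens for local traffic. This is a hypothetical Python sketch: the
  class and method names are invented, and bytes stand in for cells.

```python
class RefundingBucket:
    """Hypothetical read-side token bucket for Option 1 (not Tor's code)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (bytes) added per second
        self.burst = burst    # bucket capacity
        self.tokens = burst

    def refill(self, seconds=1):
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def can_read(self):
        # We only read from the network while tokens remain.
        return self.tokens > 0

    def on_read(self, nbytes, is_local):
        """Charge for bytes read; give them back if they were local traffic."""
        self.tokens -= nbytes
        if is_local:
            self.tokens += nbytes
```

  In this model, a relayed read that drains the bucket makes can_read()
  return False, so an incoming local cell still waits for the next
  refill: the mixed-traffic case the paragraph above calls a poor fit.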

  More generally, notice that our problem is easy when a given TCP
  connection either has entirely local circuits or entirely relayed
  circuits. In fact, even if they are both present, if one class is
  entirely idle (none of its circuits have sent or received in the past
  N seconds), we can ignore that class until it wakes up again. So it
  only gets complex when a single connection contains active circuits
  of both classes.

  Next, notice that local traffic uses only the entry guards, whereas
  relayed traffic likely doesn't. So if we're a bridge handling just
  a few users, the expected number of overlapping connections would be
  almost zero, and even if we're a full relay the number of overlapping
  connections will be quite small.

  Option 2: build separate TCP connections for local traffic and for
  relayed traffic. In practice this will actually only require a few
  extra TCP connections: we would only need redundant TCP connections
  to at most the number of entry guards in use.

  However, this approach has some drawbacks. First, if the remote side
  wants to extend a circuit to you, how does it know which TCP connection
  to send it on? We would need some extra scheme to label some connections
  "client-only" during construction. Perhaps we could do this by seeing
  whether any circuit was made via CREATE_FAST; but this still opens
  up a race condition where the other side sends a create request
  immediately. The only ways I can imagine to avoid the race entirely
  are to specify our preference in the VERSIONS cell, or to add some
  sort of "nope, not this connection, why don't you try another rather
  than failing" response to create cells, or to forbid create cells on
  connections that you didn't initiate and on which you haven't seen
  any circuit creation requests yet -- this last one would lead to a bit
  more connection bloat but doesn't seem so bad. And we already accept
  this race for the case where directory authorities establish new TCP
  connections periodically to check reachability, and then hope to hang
  up on them soon after. (In any case this issue is moot for bridges,
  since each destination will be one-way with respect to extend requests:
  either receiving extend requests from bridge users or sending extend
  requests to the Tor server, never both.)

  The second problem with option 2 is that using two TCP connections
  reveals that there are two classes of traffic (and probably quickly
  reveals which is which, based on throughput). Now, it's unclear whether
  this information is already available to the other relay -- he would
  easily be able to tell that some circuits are fast and some are rate
  limited, after all -- but it would be nice to not add even more ways to
  leak that information. Also, it's less clear that an external observer
  already has this information if the circuits are all bundled together,
  and for this case it's worth trying to protect it.

  Option 3: tell the other side about our rate limiting rules. When we
  establish the TCP connection, specify the different policy classes we
  have configured. Each time we extend a circuit, specify which policy
  class that circuit should be part of. Then hope the other side obeys
  our wishes. (If he doesn't, hang up on him.) Besides the design and
  coordination hassles involved in this approach, there's a big problem:
  our rate limiting classes apply to all our connections, not just
  pairwise connections. How does one server we're connected to know how
  much of our bucket has already been spent by another? I could imagine
  a complex and inefficient "ok, now you can send me those two more cells
  that you've got queued" protocol. I'm not sure how else we could do it.

  (Gosh. How could UDP designs possibly be compatible with rate limiting
  with multiple bucket sizes?)

  Option 4: put both classes of circuits over a single connection, and
  keep track of the last time we read or wrote a high-priority cell. If
  it's been less than N seconds, give the whole connection high priority,
  else give the whole connection low priority.

  Option 5: put both classes of circuits over a single connection, and
  play a complex juggling game by periodically telling the remote side
  what rate limits to set for that connection, so you end up giving
  priority to the right connections but still stick to roughly your
  intended BandwidthRate and RelayBandwidthRate.

  Option 6: ?

Prognosis:

  Nick really didn't like option 2 because of the partitioning questions.

  I've put option 4 into place as of Tor 0.2.0.3-alpha.

  In terms of implementation, it will be easy: just add a time_t to
  or_connection_t that specifies client_used (used by the initiator
  of the connection to rate limit it differently depending on how
  recently the time_t was reset). We currently update client_used
  in three places:
    - command_process_relay_cell() when we receive a relay cell for
      an origin circuit.
    - relay_send_command_from_edge() when we send a relay cell for
      an origin circuit.
    - circuit_deliver_create_cell() when we send a create cell.
  We could probably remove the third case and it would still work,
  but hey.
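  The Option 4 rule, driven by a client_used timestamp, can be sketched
  in Python as below. OrConnection is a hypothetical stand-in for
  or_connection_t, and the 30-second window is an assumed value for the
  unspecified "N seconds".

```python
import time

N_SECONDS = 30  # hypothetical value for the "N seconds" window in Option 4


class OrConnection:
    """Hypothetical stand-in for or_connection_t with a client_used timestamp."""

    def __init__(self):
        self.client_used = 0.0  # last time a cell for our own circuits was seen

    def note_client_activity(self, now=None):
        """Update client_used at the three places listed above."""
        self.client_used = time.time() if now is None else now

    def high_priority(self, now=None):
        """High priority for the whole connection if local traffic was seen
        within the last N seconds; otherwise low priority."""
        now = time.time() if now is None else now
        return now - self.client_used < N_SECONDS
```

  Because priority is per connection rather than per circuit, relayed
  circuits that share a connection with recent local traffic also get
  the high-priority treatment for that window.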

Filename: 112-bring-back-pathlencoinweight.txt
Title: Bring Back Pathlen Coin Weight
Author: Mike Perry
Created:
Status: Superseded
Superseded-By: 115


Overview:

  The idea is that users should be able to choose a weight which
  probabilistically chooses their path lengths to be 2 or 3 hops. This
  weight will essentially be a biased coin that indicates an
  additional hop (beyond 2) with probability P. The user should be
  allowed to choose 0 for this weight to always get 2 hops and 1 to
  always get 3.

  This value should be modifiable from the controller, and should be
  available from Vidalia.


Motivation:

  The Tor network is slow and overloaded. Increasingly often I hear
  stories about friends and friends of friends who are behind firewalls,
  annoying censorware, or under surveillance that interferes with their
  productivity and Internet usage, or chills their speech. These people
  know about Tor, but they choose to put up with the censorship because
  Tor is too slow to be usable for them. In fact, to download a fresh,
  complete copy of levine-timing.pdf for the Anonymity Implications
  section of this proposal over Tor took me 3 tries.

  There are many ways to improve the speed problem, and of course we
  should and will implement as many as we can. Johannes's GSoC project
  and my reputation system are longer term, higher-effort things that
  will still provide benefit independent of this proposal.

  However, reducing the path length to 2 for those who do not need the
  (questionable) extra anonymity 3 hops provide not only improves
  their Tor experience but also reduces their load on the Tor network by
  33%, and can be done in less than 10 lines of code. That's not just
  Win-Win, it's Win-Win-Win.

  Furthermore, when blocking resistance measures insert an extra relay
  hop into the equation, 4 hops will certainly be completely unusable
  for these users, especially since it will be considerably more
  difficult to balance the load across a dark relay net than balancing
  the load on Tor itself (which today is still not without its flaws).


Anonymity Implications:

  It has long been established that timing attacks against mixed
  networks are extremely effective, and that regardless of path
  length, if the adversary has compromised your first and last
  hop of your path, you can assume they have compromised your
  identity for that connection.

  In [1], it is demonstrated that for all but the slowest, lossiest
  networks, error rates for false positives and false negatives were
  very near zero. Only for constant streams of traffic over slow and
  (more importantly) extremely lossy network links did the error rate
  hit 20%. For loss rates typical of the Internet, even the error rate
  for slow nodes with constant traffic streams was 13%.

  When you take into account that most Tor streams are not constant,
  but probably much more like their "HomeIP" dataset, which consists
  mostly of web traffic that exists over finite intervals at specific
  times, error rates drop to fractions of 1%, even for the "worst"
  network nodes.

  Therefore, the user has little benefit from the extra hop, assuming
  the adversary does timing correlation on their nodes. The real
  protection is the probability of getting both the first and last hop,
  and this is constant whether the client chooses 2 hops, 3 hops, or 42.
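  As a back-of-the-envelope form of this argument, assume (this is not
  stated in the proposal) that the entry and exit are chosen
  independently and the adversary controls a fraction f_e of entry
  selection weight and f_x of exit selection weight. The middle hops
  never enter the expression:

```latex
\Pr[\text{entry and exit both compromised}] = f_e \cdot f_x
\quad \text{for every path length } \ell \ge 2,
\qquad \text{e.g. } f_e = f_x = 0.1 \;\Rightarrow\; 0.01 .
```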

  Partitioning attacks form another concern. Since Tor uses telescoping
  to build circuits, the entry node can tell that a user is constructing
  only two-hop paths. It is questionable whether this data is
  actually worth anything though, especially if the majority of users
  have easy access to this option, and do actually choose their path
  lengths semi-randomly.

  Nick has postulated that exits may also be able to tell that you are
  using only 2 hops by the amount of time between sending their
  RELAY_CONNECTED cell and the first bit of RELAY_DATA traffic they
  see from the OP. I doubt that they will be able to make much use
  of this timing pattern, since it will likely vary widely depending
  upon the type of node selected for that first hop, and the user's
  connection rate to that first hop. It is also questionable if this
  data is worth anything, especially if many users are using this
  option (and I imagine many will).

  Perhaps most seriously, two hop paths do allow malicious guards
  to easily fail circuits if they do not extend to their colluding peers
  for the exit hop. Since guards can detect the number of hops in a
  path, they could always fail the 3 hop circuits and focus on
  selectively failing the two hop ones until a peer was chosen.

  I believe guards are currently rotated if circuits fail, which does
  provide some protection, but this could be changed so that an entry
  guard is completely abandoned after a certain ratio of extend or
  general circuit failures with respect to non-failed circuits. This
  could possibly be gamed to increase guard turnover, but such a game
  would be much more noticeable than an individual guard failing
  circuits, since it would affect all clients, not just those who chose
  a particular guard.


Why not fix Pathlen=2?:

  The main reason I am not advocating that we always use 2 hops is that
  in some situations, timing correlation evidence by itself may not be
  considered as solid and convincing as an actual, uninterrupted, fully
  traced path. Are these timing attacks as effective on a real network
  as they are in simulation? Would an extralegal adversary or authoritarian
  government even care? In the face of these situation-dependent unknowns,
  it should be up to the user to decide if this is a concern for them or not.

  It should probably also be noted that even a false positive
  rate of 1% for a 200k concurrent-user network could mean that for a
  given node, a given stream could be confused with something like 10
  users, assuming ~200 nodes carry most of the traffic (i.e., 1000 users
  each). Though of course to really know for sure, someone needs to do
  an attack on a real network, unfortunately.
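  The arithmetic behind the figure of roughly 10 users:

```latex
\frac{200{,}000\ \text{users}}{200\ \text{nodes}} = 1000\ \text{users per node},
\qquad 0.01 \times 1000 = 10\ \text{users}.
```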


Implementation:

  new_route_len() can be modified directly with a check of the
  PathlenCoinWeight option (converted to percent) and a call to
  crypto_rand_int(100), which returns a value from 0 to 99, for the
  weighted coin.
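  A minimal Python sketch of the weighted coin, assuming the option is
  given as a probability in [0, 1]. The function mirrors the idea of
  new_route_len() but is not Tor's C code; random.randrange(100) stands
  in for a uniform draw from 0 to 99.

```python
import random


def new_route_len(pathlen_coin_weight, rng=random):
    """Hypothetical sketch: return 2 or 3 hops using a biased coin.

    pathlen_coin_weight is the PathlenCoinWeight option in [0.0, 1.0],
    the probability P of adding a third hop: 0 always gives 2 hops,
    1 always gives 3.
    """
    percent = int(round(pathlen_coin_weight * 100))  # convert to percent
    # A uniform draw from 0..99 is below `percent` with probability P.
    return 3 if rng.randrange(100) < percent else 2
```

  Converting to an integer percent matches the proposal's description and
  limits the weight to 1% granularity, which seems ample for a user
  preference.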

  The entry_guard_t structure could have num_circ_failed and
  num_circ_succeeded members, so that a guard whose rate of failed
  extends to a second hop exceeds N% is removed from the entry list.
  N should be high enough to avoid churn from normal Tor circuit
  failures, as determined by TorFlow scans.
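  The guard bookkeeping could look roughly like this. EntryGuard stands
  in for entry_guard_t; the 50% threshold is a placeholder for N, and
  MIN_ATTEMPTS (a floor so that a guard is not judged on a handful of
  circuits) is an added assumption that the proposal does not mention.

```python
GUARD_FAILURE_THRESHOLD = 0.5  # hypothetical N%; the proposal leaves N to TorFlow data
MIN_ATTEMPTS = 20              # hypothetical: don't judge a guard on too few circuits


class EntryGuard:
    """Hypothetical stand-in for entry_guard_t with the two proposed counters."""

    def __init__(self, nickname):
        self.nickname = nickname
        self.num_circ_failed = 0
        self.num_circ_succeeded = 0

    def record_extend(self, succeeded):
        """Count one attempt to extend through this guard to a second hop."""
        if succeeded:
            self.num_circ_succeeded += 1
        else:
            self.num_circ_failed += 1

    def should_drop(self):
        total = self.num_circ_failed + self.num_circ_succeeded
        if total < MIN_ATTEMPTS:
            return False
        return self.num_circ_failed / total > GUARD_FAILURE_THRESHOLD


def prune_guards(guards):
    """Remove guards whose extend-failure rate exceeds the threshold."""
    return [g for g in guards if not g.should_drop()]
```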

  The Vidalia option should be presented as a boolean, to minimize confusion
  for the user. Something like a radio button with:

   * "I use Tor for Censorship Resistance, not Anonymity. Speed is more
      important to me than Anonymity."
   * "I use Tor for Anonymity. I need extra protection at the cost of speed."

  and then some explanation in the help for exactly what this means, and
  the risks involved with eliminating the adversary's need for timing attacks
  with respect to false positives, etc.

Migration:

  Phase one: Experiment with the proper ratio of circuit failures
  used to expire garbage or malicious guards via TorFlow.

  Phase two: Re-enable config and modify new_route_len() to add an
  extra hop if coin comes up "heads".

  Phase three: Add the radio button to Vidalia, along with a help entry
  that explains in layman's terms the risks involved.


[1] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf
Filename: 113-fast-authority-interface.txt
Title: Simplifying directory authority administration
Author: Nick Mathewson
Created:
Status: Superseded

Overview

The problem:

  Administering a directory authority is a pain: you need to go through
  emails and manually add new nodes as "named".  When bad things come up,
  you need to mark nodes (or whole regions) as invalid, badexit, etc.

  This means that mostly, authority admins don't: only two of the four current
  admins actually bind names or list bad exits, and those two have often
  complained about how annoying it is to do so.

  Worse, name binding is a common path, but it's a pain in the neck: nobody
  has done it for a couple of months.

Digression: who knows what?

  It's trivial for Tor to automatically keep track of all of the
  following information about a server:
    name, fingerprint, IP, last-seen time, first-seen time, declared
    contact.

  All we need to have the administrator set is:
    - Is this name/fingerprint pair bound?
    - Is this fingerprint/IP a bad exit?
    - Is this fingerprint/IP an invalid node?
    - Is this fingerprint/IP to be rejected?

  The workflow for authority admins has two parts:
    - Periodically, go through tor-ops and add new names.  This doesn't
      need to be done urgently.
    - Less often, mark badly behaved servers as badly behaved.  This is more
      urgent.

Possible solution #1: Web-interface for name binding.

  Deprecate use of the tor-ops mailing list; instead, have operators go to a
  webform and enter their server info.  This would put the information in a
  standardized format, thus allowing quick, nearly-automated approval and
  reply.

Possible solution #2: Self-binding names.

  Peter Palfrader has proposed that names be assigned automatically to nodes
  that have been up and running and valid for a while.

Possible solution #3: Self-maintaining approved-routers file

  Mixminion alpha has a neat feature where whenever a new server is seen,
  a stub line gets added to a configuration file.  For Tor, it could look
  something like this:

    ## First seen with this key on 2007-04-21 13:13:14
    ## Stayed up for at least 12 hours on IP 192.168.10.10
    #RouterName AAAABBBBCCCCDDDDEFEF

  (Note that the implementation needs to parse commented lines to make sure
  that it doesn't add duplicates, but that's not so hard.)
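  The duplicate check can be sketched in Python, assuming the stub
  format shown above: "##" lines are annotations, and a single "#"
  marks an entry an administrator has not uncommented yet. All function
  names here are hypothetical.

```python
def known_fingerprints(lines):
    """Collect fingerprints from both active and commented-out entries."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # blank line or annotation
        entry = line[1:] if line.startswith("#") else line
        parts = entry.split()
        if len(parts) == 2:  # "<name> <fingerprint>"
            seen.add(parts[1].upper())
    return seen


def stub_for(name, fingerprint, first_seen, hours_up, ip):
    """Build the three-line stub in the format shown above."""
    return [
        f"## First seen with this key on {first_seen}",
        f"## Stayed up for at least {hours_up} hours on IP {ip}",
        f"#{name} {fingerprint}",
    ]


def add_new_router(lines, name, fingerprint, first_seen, hours_up, ip):
    """Append a commented stub unless the fingerprint is already listed."""
    if fingerprint.upper() in known_fingerprints(lines):
        return lines
    return lines + stub_for(name, fingerprint, first_seen, hours_up, ip)
```

  Keying on the fingerprint rather than the name means a router that
  changes its nickname does not produce a second stub.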

  To add a router as named, administrators would only need to uncomment the
  entry.  This automatically maintained file could be kept separately from a
  manually maintained one.

  This could be combined with solution #2, such that Tor would do the hard
  work of uncommenting entries for routers that should get Named, but
  operators could override its decisions.

Possible solution #4: A separate mailing list for authority operators.

  Right now, the tor-ops list is very high volume.  There should be another
  list that's only for dealing with problems that need prompt action, like
  marking a router as !badexit.

Resolution:

  Solution #2 is described in "Proposal 123: Naming authorities
  automatically create bindings", and that approach is implemented.
  There are remaining issues in the problem statement above that need
  their own solutions.
Filename: 114-distributed-storage.txt
Title: Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing
Created: 13-May-2007
Status: Closed
Implemented-In: 0.2.0.x

Change history:

  13-May-2007  Initial proposal
  14-May-2007  Added changes suggested by Lasse Øverlier
  30-May-2007  Changed descriptor format, key length discussion, typos
  09-Jul-2007  Incorporated suggestions by Roger, added status of specification
               and implementation for upcoming GSoC mid-term evaluation
  11-Aug-2007  Updated implementation statuses, included non-consecutive
               replication to descriptor format
  20-Aug-2007  Renamed config option HSDir as HidServDirectoryV2
  02-Dec-2007  Closed proposal

Overview:

  The basic idea of this proposal is to distribute the tasks of storing and
  serving hidden service descriptors from currently three authoritative
  directory nodes among a large subset of all onion routers. The three
  reasons to do this are better robustness (availability), better
  scalability, and improved security properties. Further,
  this proposal suggests changes to the hidden service descriptor format to
  prevent new security threats coming from decentralization and to gain even
  better security properties.

Status:

  As of December 2007, the new hidden service descriptor format is implemented
  and usable. However, servers and clients do not yet make use of descriptor
  cookies, because there are open usability issues of this feature that might
  be resolved in proposal 121. Further, hidden service directories do not
  perform replication by themselves, because (unauthorized) replica fetch
  requests would allow any attacker to fetch all hidden service descriptors in
  the system. As neither issue is critical to the functioning of v2
  descriptors and their distribution, this proposal is considered Closed.
  
Motivation:

  The current design of hidden services exhibits the following performance and
  security problems:

  First, the three hidden service authoritative directories constitute a
  performance bottleneck in the system. The directory nodes are responsible for
  storing and serving all hidden service descriptors. As of May 2007 there are
  about 1000 descriptors at a time, but this number is assumed to increase in
  the future. Further, there is no replication protocol for descriptors between
  the three directory nodes, so that hidden services must ensure the
  availability of their descriptors by manually publishing them on all
  directory nodes. Whenever a fourth or fifth hidden service authoritative
  directory is added, hidden services will need to maintain an equally
  increasing number of replicas. These scalability issues have an impact on the
  current usage of hidden services and put an even higher burden on the
  development of new kinds of applications for hidden services that might
  require storing even more descriptors.

  Second, besides posing a limitation to scalability, storing all hidden
  service descriptors on three directory nodes also constitutes a security
  risk. The directory node operators could easily analyze the publish and fetch
  requests to derive information on service activity and usage and read the
  descriptor contents to determine which onion routers work as introduction
  points for a given hidden service and need to be attacked or threatened to
  shut it down. Furthermore, the contents of a hidden service descriptor offer
  only minimal security properties to the hidden service. Anyone who learns
  the service ID can easily find out whether the service is active at the
  moment and which introduction points it has. This applies to (former)
  clients, (former) introduction points, and of course to the directory nodes.
  All it takes is a request for the descriptor with the given service ID,
  which anyone can make anonymously.

  This proposal suggests two major changes to approach the described
  performance and security problems:

  The first change affects the storage location for hidden service descriptors.
  Descriptors are distributed among a large subset of all onion routers instead
  of three fixed directory nodes. Each storing node is responsible for a subset
  of descriptors for a limited time only. It is not able to choose which
  descriptors it stores at a certain time, because this is determined by its
  onion ID which is hard to change frequently and in time (only routers which
  are stable for a given time are accepted as storing nodes). In order to
  resist single node failures and untrustworthy nodes, descriptors are
  replicated among a certain number of storing nodes. A first replication
  protocol makes sure that descriptors don't get lost when the node population
  changes; therefore, a storing node periodically requests the descriptors from
  its siblings. A second replication protocol distributes descriptors among
  non-consecutive nodes of the ID ring to prevent a group of adversaries from
  generating new onion keys until they have consecutive IDs to create a 'black
  hole' in the ring and make random services unavailable. Connections to
  storing nodes are established by extending existing circuits by one hop to
  the storing node. This also ensures that contents are encrypted. The effect
  of this first change is that the probability that a single node operator
  learns about a certain hidden service is very small and that it is very hard
  to track a service over time, even when it collaborates with other node
  operators.
  
  The second change concerns the content of hidden service descriptors.
  Obviously, security problems cannot be solved only by decentralizing storage;
  in fact, they could also get worse if done without caution. First, a
  descriptor ID needs to change periodically in order to be stored on changing
  nodes over time. Next, the descriptor ID needs to be computable only for the
  service's clients, but should be unpredictable for all other nodes. Further,
  the storing node needs to be able to verify that the hidden service is the
  true originator of the descriptor with the given ID even though it is not a
  client. Finally, a storing node should learn as little information as
  necessary by storing a descriptor, because it might not be as trustworthy as
  a directory node; for example it does not need to know the list of
  introduction points. Therefore, a second key is applied that is only known to
  the hidden service provider and its clients and that is not included in the
  descriptor. It is used to calculate descriptor IDs and to encrypt the
  introduction points. This second key can either be given to all clients
  together with the hidden service ID, or to a group or a single client as
  an authentication token. In the future this second key could be the result of
  some key agreement protocol between the hidden service and one or more
  clients. A new text-based format is proposed for descriptors instead of an
  extension of the existing binary format for reasons of future extensibility.

Design:

  The proposed design is described by the required changes to the current
  design. These requirements are grouped by content, rather than by affected
  specification documents or code files, and numbered for reference below.

  Hidden service clients, servers, and directories:

  /1/ Create routing list

    All participants can filter the consensus status document received from the
    directory authorities to one routing list containing only those servers
    that store and serve hidden service descriptors and which are running for
    at least 24 hours. A participant only trusts its own routing list and never
    learns about routing information from other parties.
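    A simplified Python sketch of building the routing list. RouterStatus
    is a hypothetical record, and the uptime field is an assumption made
    for illustration: in practice the 24-hour condition is what the
    authorities' HSDir flag in /9/ conveys.

```python
from dataclasses import dataclass

MIN_UPTIME = 24 * 60 * 60  # "running for at least 24 hours", in seconds


@dataclass
class RouterStatus:
    """Hypothetical, simplified view of one router known to the participant."""
    identity: str             # hex identity digest: the position on the ID ring
    hidden_service_dir: bool  # advertises hidden service directory support
    uptime: int               # seconds


def routing_list(consensus):
    """Keep hidden service directories up for >= 24 hours, sorted by ID
    so the result can be treated as a ring."""
    return sorted(
        r.identity for r in consensus
        if r.hidden_service_dir and r.uptime >= MIN_UPTIME
    )
```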

  /2/ Determine responsible hidden service directory

    All participants can determine the hidden service directory that is
    responsible for storing and serving a given ID, as well as the hidden
    service directories that replicate its content. Every hidden service
    directory is responsible for the descriptor IDs in the interval from
    its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
    service directory holds replicas for its n predecessors, where n denotes
    the number of consecutive replicas. (requires /1/)
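    The lookup in /2/ can be sketched as a search on a sorted ring. This
    is a hypothetical Python illustration with small integers standing in
    for identity digests. Since each directory owns (predecessor, own ID]
    and also holds replicas for its n predecessors, a descriptor's
    replicas sit on the owner's n successors.

```python
import bisect


def responsible_directories(ring, desc_id, n):
    """Return [responsible directory] + the n directories holding replicas.

    ring: sorted list of directory IDs from the participant's routing list.
    The owner of desc_id is the first ID >= desc_id, wrapping around.
    """
    if not ring:
        return []
    start = bisect.bisect_left(ring, desc_id) % len(ring)
    count = min(n + 1, len(ring))  # never list the same directory twice
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

    Using bisect_left makes the interval's upper end inclusive: a
    descriptor ID equal to a directory's ID belongs to that directory.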

  [/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
   requests which have not been fulfilled in the course of the implementation
   of this proposal, but elsewhere.]

  Hidden service directory nodes:
    
  /5/ Advertise hidden service directory functionality

    Every onion router that has its directory port open can decide whether it
    wants to store and serve hidden service descriptors by setting a new config
    option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
    option being set includes the flag "hidden-service-dir" in its router
    descriptors that it sends to directory authorities.

  /6/ Accept v2 publish requests, parse and store v2 descriptors

    Hidden service directory nodes accept publish requests for hidden service
    descriptors and store them in their local memory. (It is not necessary to
    make descriptors persistent, because after disconnecting, the onion router
    would not be accepted as storing node anyway, because it has not been
    running for at least 24 hours.) All requests and replies are formatted as
    HTTP messages. Requests are directed to the router's directory port and are
    contained within BEGIN_DIR cells. A hidden service directory node stores a
    descriptor only when it thinks that it is responsible for storing that
    descriptor based on its own routing table. Every hidden service directory
    node is responsible for the descriptor IDs in the interval of its n-th
    predecessor in the ID circle up to its own ID (n denotes the number of
    consecutive replicas). (requires /1/)
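    The storage decision can be sketched as a half-open interval test on the
    ID circle. This assumes integer IDs, a sorted routing table that contains
    the node's own ID, and n = 3; the helper names are hypothetical.

```python
def in_interval(x, lo, hi):
    """True if x lies in the half-open ring interval (lo, hi]."""
    if lo < hi:
        return lo < x <= hi
    # The interval wraps past the top of the ID space (or, when lo == hi,
    # covers the whole ring).
    return x > lo or x <= hi

def should_store(ring, own_id, desc_id, n=3):
    """Decide from the local routing table whether own_id stores desc_id."""
    if len(ring) <= n:
        return True  # every directory covers the whole circle
    idx = ring.index(own_id)  # assumes own_id is in the routing table
    nth_pred = ring[(idx - n) % len(ring)]
    return in_interval(desc_id, nth_pred, own_id)

ring = [10, 40, 70, 100, 130]
print(should_store(ring, 100, 45))  # True:  45 is in (10, 100]
print(should_store(ring, 40, 45))   # False: 45 is not in (100, 40]
print(should_store(ring, 40, 140))  # True:  140 is in the wrapping (100, 40]
```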

  /7/ Accept v2 fetch requests

    Same as /6/, but with fetch requests for hidden service descriptors.
    (requires /2/)

  /8/ Replicate descriptors with neighbors

    A hidden service directory node replicates descriptors from its two
    predecessors by downloading them once an hour. Further, it checks its
    routing table periodically for changes. Whenever it realizes that a
    predecessor has left the network, it establishes a connection to the new
    n-th predecessor and requests its stored descriptors in the interval of its
    (n+1)-th predecessor and the requested n-th predecessor. Whenever it
    realizes that a new onion router has joined with an ID higher than its
    former n-th predecessor, it adds it to its predecessors and discards all
    descriptors in the interval of its (n+1)-th and its n-th predecessor.
    (requires /1/)

    [Dec 02: This function has not been implemented, because arbitrary nodes
     would have been able to download the entire set of v2 descriptors. An
     authorized replication request would be necessary. For the moment, the
     system runs without any directory-side replication. -KL]

  Authoritative directory nodes:

  /9/ Confirm a router's hidden service directory functionality

    Directory nodes include a new flag "HSDir" for routers that have decided
    to provide storage for hidden service descriptors and that have been
    running for at least 24 hours. The latter requirement prevents a node from frequently
    changing its onion key to become responsible for an identifier it wants to
    target.

  Hidden service provider:

  /10/ Configure v2 hidden service

    Each hidden service provider that has set the config option
    "PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
    descriptors and conform to the v2 connection establishment protocol. When
    configuring a hidden service, a hidden service provider checks if it has
    already created a random secret_cookie and a hostname2 file; if not, it
    creates both of them. (requires /2/)

  /11/ Establish introduction points with fresh key

    If configured to publish only v2 descriptors and no v0/v1 descriptors
    anymore, a hidden service provider that is setting up the hidden service at
    introduction points does not pass its own public key, but the public key
    of a freshly generated key pair. It also includes these fresh public keys
    in the hidden service descriptor together with the other introduction point
    information. The reason is that the introduction point does not need to,
    and therefore should not, know which hidden service it works for; this
    prevents it from tracking the hidden service's activity. (If a hidden
    service provider supports both v0/v1 and v2 descriptors, this new feature
    cannot be used, because v0/v1 clients rely on all introduction points
    accepting the same public key.)

  /12/ Encode v2 descriptors and send v2 publish requests

    If configured to publish v2 descriptors, a hidden service provider
    publishes a new descriptor whenever its content changes or a new
    publication period starts for this descriptor. If the current publication
    period has less than 60 minutes left (= 2 x 30 minutes to allow the server
    to be 30 minutes behind and the client 30 minutes ahead), the hidden
    service provider publishes both a current descriptor and one for the next
    period. Publication is performed by sending the descriptor to all hidden
    service directories that are responsible for keeping replicas for the
    descriptor ID: two non-consecutive replicas, each stored on 3 consecutive
    nodes. (requires /1/ and /2/)
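    A minimal sketch of the publication rule, assuming time is measured in
    seconds and the start of the current period is already known. The names
    are illustrative; the 24-hour period and the 60-minute overlap come from
    the text above.

```python
PERIOD = 24 * 60 * 60   # length of one publication period (24 hours), in seconds
OVERLAP = 60 * 60       # 2 x 30 minutes of tolerated clock skew

def periods_to_publish(now, period_start):
    """Return which periods need a descriptor: 0 = current, 1 = next."""
    remaining = period_start + PERIOD - now
    return [0, 1] if remaining < OVERLAP else [0]

def upload_count(periods, replicas=2, consecutive=3):
    """Number of directory uploads: every period x replica x consecutive node."""
    return len(periods) * replicas * consecutive

start = 0
print(periods_to_publish(start + 2 * 3600, start))      # [0]
print(periods_to_publish(start + 23.5 * 3600, start))   # [0, 1]
print(upload_count(periods_to_publish(start + 23.5 * 3600, start)))  # 12
```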

  Hidden service client:

  /13/ Send v2 fetch requests

    A hidden service client that has set the config option
    "FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
    addresses by requesting a v2 descriptor from a randomly chosen hidden
    service directory that is responsible for keeping a replica for the
    descriptor ID. In total there are six replicas, of which the first three
    and the last three are stored on consecutive nodes. The probabilities of
    picking each of the three consecutive replicas are 1/6, 2/6, and 3/6, to
    account for the fact that availability will be highest on the node with
    the next higher ID. A hidden service client relies on the hidden service
    provider to store two sets of descriptors to compensate for clock skew
    between service and client. (requires /1/ and /2/)
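    The weighted choice among three consecutive directories can be sketched
    with Python's random.choices. Which directory receives which weight is an
    assumption here: the sketch simply applies 1/6, 2/6, 3/6 in the order the
    directories are passed in. The directory names are placeholders.

```python
import random
from collections import Counter

WEIGHTS = (1, 2, 3)  # normalized by random.choices to 1/6, 2/6, 3/6

def pick_replica_dir(consecutive_dirs, rng=random):
    """Choose one of three consecutive directories for a fetch request."""
    return rng.choices(consecutive_dirs, weights=WEIGHTS, k=1)[0]

# Empirical check: frequencies should approach 1/6, 2/6 and 3/6.
rng = random.Random(2007)
counts = Counter(pick_replica_dir(["dirA", "dirB", "dirC"], rng)
                 for _ in range(60000))
print({d: round(c / 60000, 2) for d, c in sorted(counts.items())})
```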

  /14/ Process v2 fetch reply and parse v2 descriptors

    A hidden service client that has sent a request for a v2 descriptor can
    parse it and store it in the local cache of rendezvous service descriptors.

  /15/ Establish connection to v2 hidden service

    A hidden service client can establish a connection to a hidden service
    using a v2 descriptor. This includes using the secret cookie for decrypting
    the introduction points contained in the descriptor. When contacting an
    introduction point, the client does not use the public key of the hidden
    service provider, but the freshly-generated public key that is included in
    the hidden service descriptor. Whether or not a fresh key is used instead
    of the key of the hidden service depends on the available protocol versions
    that are included in the descriptor; by this, connection establishment is
    to a certain extent decoupled from fetching the descriptor.

  Hidden service descriptor:

  (Requirements concerning the descriptor format are contained in /6/ and /7/.)
  
    The new v2 hidden service descriptor format looks like this:

      onion-address = h(public-key) + cookie
      descriptor-id = h(h(public-key) + h(time-period + cookie + replica))
      descriptor-content = {
        descriptor-id,
        version,
        public-key,
        h(time-period + cookie + replica),
        timestamp,
        protocol-versions,
        { introduction-points } encrypted with cookie
      } signed with private-key

    The "descriptor-id" needs to change periodically in order for the
    descriptor to be stored on changing nodes over time. It must be computable
    only by the hidden service provider and its clients, to prevent
    unauthorized nodes from tracking the service's activity by periodically
    checking whether there is a descriptor for this service. Finally, the
    hidden service directory needs to be able to verify that the hidden service
    provider is the true originator of the descriptor with the given ID.
    
    Therefore, "descriptor-id" is derived from the "public-key" of the hidden
    service provider, the current "time-period" which changes every 24 hours,
    a secret "cookie" shared between hidden service provider and clients, and
    a "replica" denoting the number of this non-consecutive replica. (The
    "time-period" is constructed in a way that time periods do not change at
    the same moment for all descriptors by deriving a value between 0:00 and
    23:59 hours from h(public-key) and making the descriptors of this hidden
    service provider expire at that time of the day.) The "descriptor-id" is
    defined to be 160 bits long. [extending the "descriptor-id" length
    suggested by LØ]
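    The derivation above can be sketched as follows. Several details are
    assumptions not fixed by this proposal: h is SHA-1 (which yields the
    stated 160 bits), "+" is byte concatenation, time-period is packed as a
    4-byte big-endian integer, replica is a single byte, and the per-service
    offset is taken from the first byte of h(public-key).

```python
import hashlib
import struct

DAY = 24 * 60 * 60  # time periods change every 24 hours

def h(data: bytes) -> bytes:
    """Assumed hash function: SHA-1, giving a 160-bit output."""
    return hashlib.sha1(data).digest()

def time_period(now: int, public_key: bytes) -> int:
    """Index of the current 24-hour period, shifted per service.

    Assumption: the first byte of h(public-key) selects one of 256 offsets
    spread over the day, so services change periods at different times.
    """
    offset = h(public_key)[0] * DAY // 256
    return (now + offset) // DAY

def descriptor_id(public_key: bytes, cookie: bytes, now: int, replica: int) -> bytes:
    """descriptor-id = h(h(public-key) + h(time-period + cookie + replica))."""
    period = struct.pack(">I", time_period(now, public_key))
    secret_part = h(period + cookie + bytes([replica]))
    return h(h(public_key) + secret_part)

key = b"placeholder public key bytes"  # stand-in, not a real RSA key
cookie = bytes(15)                       # 120-bit cookie, all zeros for the demo
now = 1_200_000_000                      # a Unix timestamp in early 2008
ids = [descriptor_id(key, cookie, now, r) for r in (0, 1)]
print([i.hex() for i in ids])
```

    Without the cookie, a directory could compute the same IDs from the
    public key alone, which is why the secret part is hashed separately.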
    
    To ensure that only the hidden service provider and the clients are able
    to generate future "descriptor-ID"s, the "onion-address" is extended from
    the current hash value of "public-key" alone to also include the secret
    "cookie". The hash of "public-key" is 80 bits long, whereas the "cookie"
    is 120 bits long. This makes a total of 200 bits or 40 base32 chars, which
    is quite a lot for a human to handle, but necessary to provide sufficient
    protection against an adversary generating a key pair with the same
    "public-key" hash or guessing the "cookie".
    
    A hidden service directory can verify that a descriptor was created by the
    hidden service provider by checking if the "descriptor-id" corresponds to
    the "public-key" and if the signature can be verified with the
    "public-key".

    The "introduction-points" that are included in the descriptor are encrypted
    using the same "cookie" that is shared between hidden service provider and
    clients. [correction to use another key than h(time-period + cookie) as
    encryption key for introduction points made by LØ]

    A new text-based format is proposed for descriptors instead of an extension
    of the existing binary format for reasons of future extensibility.
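    The 200-bit "onion-address" described above can be sketched like this.
    The use of SHA-1, the truncation to the first 80 bits, and lowercase
    base32 are assumptions for illustration; the sizes (80 + 120 bits,
    40 characters) come from the text.

```python
import base64
import hashlib
import os

def onion_address(public_key: bytes, cookie: bytes) -> str:
    """Concatenate 80 bits of h(public-key) with a 120-bit cookie, base32-encode."""
    if len(cookie) != 15:
        raise ValueError("cookie must be 120 bits (15 bytes)")
    key_part = hashlib.sha1(public_key).digest()[:10]  # first 80 bits (assumed)
    raw = key_part + cookie                               # 25 bytes = 200 bits
    # 200 bits / 5 bits per base32 character = 40 characters, no padding.
    return base64.b32encode(raw).decode("ascii").lower()

addr = onion_address(b"placeholder public key bytes", os.urandom(15))
print(addr, len(addr))
```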

Security implications:

  The security implications of the proposed changes are grouped by the roles of
  nodes that could perform attacks or on which attacks could be performed.

  Attacks by authoritative directory nodes

    Authoritative directory nodes are no longer the only places in the
    network that know about a hidden service's activity and introduction
    points. Thus, they cannot perform attacks using this information, e.g.
    track a hidden service's activity or usage pattern or attack its
    introduction points. Formerly, it would only require a single corrupted
    authoritative directory operator to perform such an attack.

  Attacks by hidden service directory nodes

    A hidden service directory node could misuse a stored descriptor to track a
    hidden service's activity and usage pattern by clients. Though there is no
    countermeasure against this kind of attack, it is very expensive to track a
    certain hidden service over time. An attacker would need to run a large
    number of stable onion routers that work as hidden service directory nodes
    to have a good probability of becoming responsible for its changing
    descriptor IDs. For each period, the probability is:

      1 - (N-c choose r) / (N choose r)   for N-c >= r, and 1 otherwise,

    with N as the total number of hidden service directories, c as the number
    of compromised nodes, and r as the number of replicas.
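    The formula can be evaluated directly. Like the formula itself, this
    sketch treats the r replica directories as a uniformly random r-subset
    of the N directories; the parameter values are illustrative.

```python
from math import comb

def p_track(N: int, c: int, r: int) -> float:
    """Chance that at least one of r replica directories is compromised."""
    if N - c < r:
        return 1.0  # not enough honest directories to hold all r replicas
    return 1 - comb(N - c, r) / comb(N, r)

for c in (1, 5, 10, 20):
    print(c, round(p_track(N=100, c=c, r=6), 3))
```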

    The hidden service directory nodes could try to make a certain hidden
    service unavailable to its clients. To do so, they could discard all
    stored descriptors for that hidden service and reply to clients that there
    is no descriptor for the given ID, or return an old or false descriptor
    content. The client would detect a false descriptor, because it would not
    contain a correct signature. But an old content or an empty reply could
    confuse the client. Therefore, the countermeasure is to replicate
    descriptors among a small number of hidden service directories, e.g. 5.
    The probability of a group of collaborating nodes making a hidden service
    completely unavailable is in each period:

      (c choose r) / (N choose r)   for c >= r and N >= r, and 0 otherwise,

    with N as the total number of hidden service directories, c as the number
    of compromised nodes, and r as the number of replicas.
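    A short sketch of this second formula, under the same uniform-subset
    assumption, shows why replication helps: the suppression probability
    drops quickly as r grows, while the tracking probability grows with r.

```python
from math import comb

def p_block(N: int, c: int, r: int) -> float:
    """Chance that all r replica directories are compromised."""
    if c >= r and N >= r:
        return comb(c, r) / comb(N, r)
    return 0.0

# Illustrative values: 100 directories, 20 of them compromised.
for r in range(1, 7):
    print(r, p_block(100, 20, r))
```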

    A hidden service directory could try to find out which introduction points
    are working on behalf of a hidden service. In contrast to the previous
    design, this is not possible anymore, because this information is encrypted
    to the clients of a hidden service.

  Attacks on hidden service directory nodes

    An anonymous attacker could try to swamp a hidden service directory with
    false descriptors for a given descriptor ID. This is prevented by requiring
    that descriptors are signed.

    Anonymous attackers could swamp a hidden service directory with correct
    descriptors for non-existing hidden services. There is no countermeasure
    against this attack. However, the creation of valid descriptors is more
    expensive than verification and storage in local memory. This should make
    this kind of attack unattractive.

  Attacks by introduction points

    Current or former introduction points could try to gain information on the
    hidden service they serve. But due to the fresh key pair that is used by
    the hidden service, this attack is not possible anymore.

  Attacks by clients

    Current or former clients could track a hidden service's activity, attack
    its introduction points, or determine the responsible hidden service
    directory nodes and attack them. There is nothing that could prevent them
    from doing so, because honest clients need the full descriptor content to
    establish a connection to the hidden service. At the moment, the only
    countermeasure against dishonest clients is to change the secret cookie and
    pass it only to the honest clients.

Compatibility:

  The proposed design is meant to replace the current design for hidden service
  descriptors and their storage in the long run.

  There should be a first transition phase in which both the current design
  and the proposed design are served in parallel. Onion routers should start
  serving as hidden service directories, and hidden service providers and
  clients should make use of the new design if both sides support it. Hidden
  service providers should be allowed to publish descriptors of the current
  format in parallel, and authoritative directories should continue storing and
  serving these descriptors.

  After the first transition phase, hidden service providers should stop
  publishing descriptors on authoritative directories, and hidden service
  clients should not try to fetch descriptors from the authoritative
  directories. However, the authoritative directories should continue serving
  hidden service descriptors for a second transition phase. As of this point,
  all v2 config options should be set to a default value of 1.

  After the second transition phase, the authoritative directories should stop
  serving hidden service descriptors.

Filename: 115-two-hop-paths.txt
Title: Two Hop Paths
Author: Mike Perry
Created:
Status: Dead
Supersedes: 112


Overview:

  The idea is that users should be able to choose if they would like
  to have either two or three hop paths through the tor network. 

  Let us be clear: the users who would choose this option should be
  those that are concerned with IP obfuscation only, i.e. they would not be
  targets of a resource-intensive multi-node attack. It is sometimes said
  that these users should use some network other than Tor.
  This is a foolish suggestion: more users improve the security of everyone,
  and the current small userbase size is a critical hindrance to
  anonymity, as is discussed below and in [1].

  This value should be modifiable from the controller, and should be
  available from Vidalia.


Motivation:

  The Tor network is slow and overloaded. Increasingly often I hear
  stories about friends and friends of friends who are behind firewalls,
  annoying censorware, or under surveillance that interferes with their
  productivity and Internet usage, or chills their speech. These people
  know about Tor, but they choose to put up with the censorship because
  Tor is too slow to be usable for them. In fact, to download a fresh,
  complete copy of levine-timing.pdf for the Theoretical Argument
  section of this proposal over Tor took me 3 tries.

  Furthermore, the biggest current problem with Tor's anonymity for
  those who really need it is not someone attacking the network to
  discover who they are. It is instead the extreme danger that, because
  Tor is so slow, so few people use it that those who do have essentially
  no confusion set.

  The recent case where the professor and the rogue Tor user were the
  only Tor users on campus, and thus were suspected in an incident involving
  Tor and that university, underscores this point: "That was why the police
  had come to see me. They told me that only two people on our campus were
  using Tor: me and someone they suspected of engaging in an online scam.
  The detectives wanted to know whether the other user was a former
  student of mine, and why I was using Tor"[1].

  Not only does Tor provide no anonymity if you use it to be anonymous
  but are obviously from a certain institution, location or circumstance,
  it is also dangerous to use Tor for risk of being accused of having
  something significant enough to hide to be willing to put up with
  the horrible performance as opposed to using some weaker alternative.

  There are many ways to improve the speed problem, and of course we
  should and will implement as many as we can. Johannes's GSoC project
  and my reputation system are longer term, higher-effort things that
  will still provide benefit independent of this proposal.

  However, reducing the path length to 2 for those who do not need the
  extra anonymity 3 hops provide not only improves their Tor experience
  but also reduces their load on the Tor network by 33%, and should
  increase adoption of Tor by a good deal. That's not just Win-Win, it's
  Win-Win-Win.


Who will enable this option?

  This is the crux of the proposal. Admittedly, there is some anonymity
  loss and some degree of decreased investment required on the part of
  the adversary to attack 2 hop users versus 3 hop users, even if it is
  minimal and limited mostly to up-front costs and false positives.

  The key questions are:

  1. Are these users in a class such that their risk is significantly
     less than the amount of this anonymity loss?

  2. Are these users able to identify themselves?

  Many users of Tor are not at risk of an adversary capturing c/n
  nodes of the network just to see what they do. These users use Tor to
  circumvent aggressive content filters, or simply to keep their IP out of
  marketing and search engine databases. Most content filters have no
  interest in running Tor nodes to catch violators, and marketers
  certainly would never consider such a thing, both on a cost basis and a
  legal one.

  In a sense, this represents an alternate threat model against these
  users who are not at risk for Tor's normal threat model.

  It should be evident to these users that they fall into this class. All
  that should be needed is a radio button:

   * "I use Tor for local content filter circumvention and/or IP obfuscation, 
      not anonymity. Speed is more important to me than high anonymity. 
      No one will make considerable efforts to determine my real IP."
   * "I use Tor for anonymity and/or national-level, legally enforced 
      censorship. It is possible effort will be taken to identify 
      me, including but not limited to network surveillance. I need more 
      protection."
 
  and then some explanation in the help for exactly what this means, and
  the risks involved with eliminating the adversary's need for timing
  attacks with respect to false positives. Ultimately, the decision is a
  simple one that can be made without this information, however. The user
  does not need Paul Syverson to instruct them on the deep magic of Onion
  Routing to make this decision. They just need to know why they use Tor.
  If they use it just to stay out of marketing databases and/or bypass a
  local content filter, two hops is plenty. This is likely the vast
  majority of Tor users, and many non-users we would like to bring on 
  board.

  So, having established this class of users, let us now go on to
  examine the theoretical and practical risks to which we expose them, and
  determine if these risks violate the users' needs, or introduce additional
  risk to node operators who may be subject to requests from law enforcement
  to track users who need 3 hops, but use 2 because they enjoy the
  thrill of Russian roulette.


Theoretical Argument:

  It has long been established that timing attacks against mix
  and onion networks are extremely effective, and that regardless 
  of path length, if the adversary has compromised your first and 
  last hop of your path, you can assume they have compromised your
  identity for that connection.

  In fact, it was demonstrated that for all but the slowest, lossiest
  networks, error rates for false positives and false negatives were
  very near zero[2]. Only for constant streams of traffic over slow and
  (more importantly) extremely lossy network links did the error rate
  hit 20%. For loss rates typical of the Internet, even the error rate
  for slow nodes with constant traffic streams was 13%.

  When you take into account that most Tor streams are not constant,
  but probably much more like their "HomeIP" dataset, which consists
  mostly of web traffic that exists over finite intervals at specific
  times, error rates drop to fractions of 1%, even for the "worst"
  network nodes.

  Therefore, the user has little benefit from the extra hop, assuming
  the adversary does timing correlation on their nodes. Since timing
  correlation is simply an implementation issue and is most likely
  a single up-front cost (and one that is likely quite a bit cheaper
  than the cost of the machines purchased to host the nodes to mount
  an attack), the real protection is the low probability of getting
  both the first and last hop of a client's stream.


Practical Issues:

  Theoretical issues aside, there are several practical issues with the
  implementation of Tor that need to be addressed to ensure that
  identity information is not leaked by the implementation.

  Exit policy issues:

  If a client chooses an exit with a very restrictive exit policy
  (such as an IP or IP range), the first hop then knows a good deal
  about the destination. For this reason, clients should not select
  exits that match their destination IP with anything other than "*".

  Partitioning:

  Partitioning attacks form another concern. Since Tor uses telescoping
  to build circuits, it is possible, at the entry node and on the local
  network, to tell that a user is constructing only two hop paths. An external
  adversary can potentially differentiate 2 and 3 hop users, and decide
  that all IP addresses connecting to Tor and using 3 hops have something
  to hide, and should be scrutinized more closely or outright apprehended.

  One solution to this is to use the "leaky-circuit" method of attaching
  streams: The user always creates 3-hop circuits, but if the option
  is enabled, they always exit from their 2nd hop. The ideal solution
  would be to create a RELAY_SHISHKABOB cell which contains onion
  skins for every host along the path, but this requires protocol
  changes at the nodes to support.

  Guard nodes:

  Since guard nodes can rotate due to client relocation, network
  failure, node upgrades and other issues, if you amortize the risk a
  mobile, dialup, or otherwise intermittently connected user is exposed to
  over any reasonable duration of Tor usage (on the order of a year), it
  is the same with or without guard nodes. Assuming an adversary has
  c%/n% of network bandwidth, and guards rotate on average with period R,
  statistically speaking, it's merely a question of if the user wishes
  their risk to be concentrated with probability c/n over an expected
  period of R*c, and probability 0 over an expected period of R*(n-c),
  versus a continuous risk of (c/n)^2. So statistically speaking, guards
  only create a time-tradeoff of risk over the long run for normal Tor
  usage. Rotating guards do not reduce risk for normal client usage long
  term.[3]
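  The time-tradeoff argument can be checked with a small Monte Carlo
  sketch. The model is a simplification and an assumption: f stands for
  c/n, a guard is redrawn once per period R, and each circuit's last hop is
  hostile with probability f independently. The long-run fraction of
  circuits with both ends hostile comes out near f^2 with or without
  guards; guards only change how that risk is concentrated in time.

```python
import random

def simulate(f, periods, circuits_per_period, use_guards, rng):
    """Return (fraction of compromised circuits, fraction of periods with any).

    A circuit counts as compromised when both its first and last hop are
    hostile. f plays the role of c/n.
    """
    bad_circuits = 0
    bad_periods = 0
    for _ in range(periods):
        guard_hostile = rng.random() < f  # guard redrawn once per period R
        hit = False
        for _ in range(circuits_per_period):
            first = guard_hostile if use_guards else rng.random() < f
            last = rng.random() < f
            if first and last:
                bad_circuits += 1
                hit = True
        bad_periods += hit
    return bad_circuits / (periods * circuits_per_period), bad_periods / periods

rng = random.Random(115)
with_guards = simulate(0.1, 4000, 50, True, rng)
without_guards = simulate(0.1, 4000, 50, False, rng)
print("with guards:   ", with_guards)
print("without guards:", without_guards)
```

  In this model both runs compromise about 1% of circuits, but with guards
  the compromises fall into roughly 10% of periods, while without guards
  they are spread over a much larger share of periods.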

  On the other hand, assuming a more stable method of guard selection
  and preservation is devised, or a more stable client side network than 
  my own is typical (which rotates guards frequently due to network issues
  and moving about), guard nodes provide a tradeoff in the form of c/n% of
  the users being "sacrificial users" who are exposed to high risk O(c/n)
  of identification, while the rest of the network is exposed to zero
  risk.

  The nature of Tor makes it likely an adversary will take a "shock and
  awe" approach to suppressing Tor by rounding up a few users whose
  browsing activity has been observed to be made into examples, in an
  attempt to prove that Tor is not perfect.

  Since this "shock and awe" attack can be applied with or without guard
  nodes, stable guard nodes do offer a measure of accountability of sorts.
  If a user was using a small set of guard nodes and knows them well, and
  then is suddenly apprehended as a result of Tor usage, having a fixed
  set of entry points to suspect is a lot better than suspecting the whole
  network. Conversely, it can also give non-apprehended users comfort
  that they are likely to remain safe indefinitely with their set of (now
  presumably trusted) guards. This is probably the most beneficial
  property of reliable guards: they deter the adversary from mounting
  "shock and awe" attacks because the surviving users will not
  be intimidated, but instead made more confident. Of course, guards need to
  be made much more stable and users need to be encouraged to know their
  guards for this property to really take effect. 

  This beneficial property of client vigilance also carries over to an
  active adversary, except in this case instead of relying on the user
  to remember their guard nodes and somehow communicate them after
  apprehension, the code can alert them to the presence of an active
  adversary before they are apprehended. But only if they use guard nodes.

  So let's consider the active adversary: Two hop paths allow malicious
  guards to get considerably more benefit from failing circuits if they do
  not extend to their colluding peers for the exit hop. Since guards can
  detect the number of hops in a path via either timing or by statistical
  analysis of the exit policy of the 2nd hop, they can perform this attack
  predominantly against 2 hop users.

  This can be addressed by completely abandoning an entry guard after a
  certain ratio of extend or general circuit failures with respect to
  non-failed circuits. The proper value for this ratio can be determined
  experimentally with TorFlow. There is the possibility that the local
  network can abuse this feature to cause certain guards to be dropped,
  but they can do that anyway with the current Tor by just making guards
  they don't like unreachable. With this mechanism, Tor will complain
  loudly if any guard's failure rate exceeds the expected rate, whatever
  the cause of the failures, local or remote.

  Eliminating guards entirely would actually not address this issue due
  to the time-tradeoff nature of risk. In fact, it would just make it
  worse. Without guard nodes, it becomes much more difficult for clients
  to be alerted to Tor entry points that fail circuits to make sure that
  they only devote bandwidth to carrying traffic for streams for which
  they observe both ends. Yet the rogue entry points are still able to
  significantly increase their success rates by failing circuits.

  For this reason, guard nodes should remain enabled for 2 hop users,
  at least until an IP-independent, undetectable guard scanner can
  be created. TorFlow can scan for failing guards, but after a while, 
  its unique behavior gives away the fact that its IP is a scanner and 
  it can be given selective service.
  
  Consideration of risks for node operators:

  There is a serious risk for two hop users in the form of guard
  profiling. If an adversary running an exit node notices that a
  particular site is always visited from a fixed previous hop, it is
  likely that this is a two hop user using a certain guard, which could be
  monitored to determine their identity. Thus, for the protection of both
  2 hop users and node operators, 2 hop users should limit their guard
  duration to a sufficient number of days to verify reliability of a node,
  but not much more. This duration can be determined experimentally by
  TorFlow.

  Considering a Tor client builds on average 144 circuits/day (10
  minutes per circuit), if the adversary owns c/n% of exits on the
  network, they can expect to see 144*c/n circuits from this user, or
  about 14 minutes of usage per day per percentage of network penetration.
  Since it will take several occurrences of user-linkable exit content
  from the same predecessor hop for the adversary to have any confidence
  this is a 2 hop user, it is very unlikely that any sort of demands made
  upon the predecessor node would be guaranteed to be effective (i.e. it
  actually was a guard), let alone be executed in time to apprehend the 
  user before they rotated guards.
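  The arithmetic in the previous paragraph, as a small sketch. It assumes,
  as the text does, one new circuit every 10 minutes and exit selection
  proportional to the adversary's share of exits.

```python
MINUTES_PER_CIRCUIT = 10
CIRCUITS_PER_DAY = 24 * 60 // MINUTES_PER_CIRCUIT  # 144

def exit_exposure(exit_fraction):
    """Circuits, and minutes of usage, a hostile exit share sees per day."""
    circuits = CIRCUITS_PER_DAY * exit_fraction
    return circuits, circuits * MINUTES_PER_CIRCUIT

for pct in (1, 5, 10):
    circuits, minutes = exit_exposure(pct / 100)
    print(f"{pct}% of exits: {circuits:.2f} circuits, {minutes:.1f} minutes/day")
# first line printed: 1% of exits: 1.44 circuits, 14.4 minutes/day
```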

  The reverse risk also warrants consideration. If a malicious guard has
  orders to surveil Mike Perry, it can determine Mike Perry is using two
  hops by observing his tendency to choose a 2nd hop with a viable exit
  policy. This can be done relatively quickly, unfortunately, and
  indicates Mike Perry should spend some of his time building real 3 hop
  circuits through the same guards, to require them to at least wait for
  him to actually use Tor to determine his style of operation, rather than
  collect this information from his passive building patterns.

  However, to actively determine where Mike Perry is going, the guard
  will need to arrange logging ahead of time at multiple exit nodes that
  he may use over the course of the few days while he is at that guard,
  and correlate the usage times of the exit node with Mike Perry's
  activity at that guard for the few days he uses it. At this point, the
  adversary is mounting a scale and method of attack (widespread logging,
  timing attacks) that works pretty much just as effectively against 3
  hops, so exit node operators are exposed to no more danger than
  they normally are.


Why not fix Pathlen=2?:

  The main reason I am not advocating that we always use 2 hops is that
  in some situations, timing correlation evidence by itself may not be
  considered as solid and convincing as an actual, uninterrupted, fully
  traced path. Are these timing attacks as effective on a real network as
  they are in simulation? Maybe the circuit multiplexing of Tor can serve 
  to frustrate them to a degree? Would an extralegal adversary or 
  authoritarian government even care? In the face of these
  situation-dependent unknowns, it should be up to the user to decide if this is
  a concern for them or not.

  It should probably also be noted that even a false positive
  rate of 1% for a 200k concurrent-user network could mean that for a
  given node, a given stream could be confused with something like 10
  users, assuming ~200 nodes carry most of the traffic (i.e. 1000 users
  each). Though of course to really know for sure, someone needs to do
  an attack on a real network, unfortunately.
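  The confusion-set estimate above, written out. Like the text, this
  sketch assumes users are spread evenly over the nodes that carry most of
  the traffic.

```python
def confusion_set(concurrent_users, busy_nodes, false_positive_rate):
    """Users per node, and how many of them a stream could be confused with."""
    users_per_node = concurrent_users / busy_nodes
    return users_per_node, users_per_node * false_positive_rate

print(confusion_set(200_000, 200, 0.01))
```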

  Additionally, at some point cover traffic schemes may be implemented to
  frustrate timing attacks on the first hop. It is possible some expert
  users may do this ad-hoc already, and may wish to continue using 3 hops
  for this reason.


Implementation:

  new_route_len() can be modified directly with a check of the
  Pathlen option. However, circuit construction logic should be
  altered so that both 2 hop and 3 hop users build the same types of
  circuits, and the option should ultimately govern circuit selection,
  not construction. This improves coverage against guard nodes being
  able to passively profile users who aren't even using Tor.
  PathlenCoinWeight, anyone? :)

  The exit policy hack is a bit more tricky. compare_addr_to_addr_policy
  needs to return an alternate ADDR_POLICY_ACCEPTED_WILDCARD or
  ADDR_POLICY_ACCEPTED_SPECIFIC return value for use in
  circuit_is_acceptable.
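  A Python model of the distinction the proposal asks for is sketched
  below. It is not Tor's C code: the constants only stand in for the
  proposed ADDR_POLICY_ACCEPTED_WILDCARD and ADDR_POLICY_ACCEPTED_SPECIFIC
  values, ports are simplified to a single number or "*", rules are
  evaluated first-match, and an implicit reject-all at the end is assumed.

```python
import ipaddress

ACCEPTED_WILDCARD = "accepted_wildcard"
ACCEPTED_SPECIFIC = "accepted_specific"
REJECTED = "rejected"

def match_policy(policy, dest_ip, dest_port):
    """First-match evaluation of an exit policy, reporting how it accepted.

    policy: list of (action, address, port); address is "*" or a CIDR
    string, port is an int or "*".
    """
    addr = ipaddress.ip_address(dest_ip)
    for action, address, port in policy:
        if port != "*" and port != dest_port:
            continue
        if address != "*" and addr not in ipaddress.ip_network(address):
            continue
        if action == "reject":
            return REJECTED
        return ACCEPTED_WILDCARD if address == "*" else ACCEPTED_SPECIFIC
    return REJECTED  # assumed implicit reject-all

def usable_for_two_hops(policy, dest_ip, dest_port):
    """Only exits matching the destination via "*" reveal nothing to hop 1."""
    return match_policy(policy, dest_ip, dest_port) == ACCEPTED_WILDCARD

policy = [("accept", "18.0.0.0/8", 80), ("reject", "*", 25), ("accept", "*", "*")]
print(match_policy(policy, "18.1.2.3", 80))  # accepted_specific
print(match_policy(policy, "1.2.3.4", 80))   # accepted_wildcard
print(match_policy(policy, "1.2.3.4", 25))   # rejected
```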
  
  The leaky exit is trickier still. handle_control_attachstream
  does allow paths to exit at a given hop. Presumably something similar
  can be done in connection_ap_handshake_process_socks, and elsewhere?
  Circuit construction would also have to be performed such that the
  2nd hop's exit policy is what is considered, not the 3rd's.

  The entry_guard_t structure could have num_circ_failed and
  num_circ_succeeded members, so that if a guard's failure rate when
  extending circuits to a second hop exceeds F%, it is removed from the
  entry list.

  F should be sufficiently high to avoid churn from normal Tor circuit
  failure as determined by TorFlow scans.
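  A minimal sketch of that bookkeeping, written in Python rather than
  Tor's C. The counter names num_circ_failed and num_circ_succeeded come
  from the text; the EntryGuard class, the prune_guards helper, and the
  min_attempts floor (so a new guard is not dropped after one unlucky
  failure) are hypothetical. F is passed in as a fraction because its
  value is still to be determined by TorFlow.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EntryGuard:
    nickname: str
    num_circ_failed: int = 0
    num_circ_succeeded: int = 0

    def note_extend_attempt(self, succeeded: bool) -> None:
        """Record the outcome of extending a circuit from this guard to a second hop."""
        if succeeded:
            self.num_circ_succeeded += 1
        else:
            self.num_circ_failed += 1

    def failure_rate(self) -> float:
        attempts = self.num_circ_failed + self.num_circ_succeeded
        return self.num_circ_failed / attempts if attempts else 0.0


def prune_guards(guards: List[EntryGuard], max_failure_rate: float,
                 min_attempts: int = 20) -> Tuple[List[EntryGuard], List[EntryGuard]]:
    """Split guards into (kept, removed).

    A guard is removed once its extend failure rate exceeds max_failure_rate
    (F as a fraction), but only after at least min_attempts attempts.
    """
    kept, removed = [], []
    for guard in guards:
        attempts = guard.num_circ_failed + guard.num_circ_succeeded
        if attempts >= min_attempts and guard.failure_rate() > max_failure_rate:
            removed.append(guard)
        else:
            kept.append(guard)
    return kept, removed
```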

  The Vidalia option should be presented as a radio button.


Migration:

  Phase 1: Adjust exit policy checks if Pathlen is set, implement leaky
  circuit ability, and 2-3 hop circuit selection logic governed by
  Pathlen.

  Phase 2: Experiment to determine the proper ratio of circuit
  failures used to expire garbage or malicious guards via TorFlow
  (pending Bug #440 backport+adoption).

  Phase 3: Implement guard expiration code to kick off failure-prone
  guards and warn the user. Cap 2 hop guard duration to a proper number
  of days determined sufficient to establish guard reliability (to be
  determined by TorFlow).

  Phase 4: Add a radio button in Vidalia, along with a help entry
  that explains in layman's terms the risks involved.

  Phase 5: Allow user to specify path length by HTTP URL suffix.


[1] http://p2pnet.net/story/11279
[2] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf
[3] Proof available upon request ;)
Filename: 116-two-hop-paths-from-guard.txt
Title: Two hop paths from entry guards
Author: Michael Lieberman
Created: 26-Jun-2007
Status: Dead

This proposal is related to (but different from) Mike Perry's proposal 115
"Two Hop Paths."

Overview:

Volunteers who run entry guards should have the option of using only 2
additional tor nodes when constructing their own tor circuits.

While the option of two hop paths should perhaps be extended to every client
(as discussed in Mike Perry's thread), I believe the anonymity properties of
two hop paths are particularly well-suited to client computers that are also
serving as entry guards.

First I will describe the details of the strategy, as well as possible
avenues of attack. Then I will list advantages and disadvantages. Then, I
will discuss some possibly safer variations of the strategy, and finally
some implementation issues.

Details:

Suppose Alice is an entry guard, and wants to construct a two hop circuit.
Alice chooses a middle node at random (not using the entry guard strategy),
and gains anonymity by having her traffic look just like traffic from
someone else using her as an entry guard.

Can Alice's middle node figure out that she is initiator of the traffic? I
can think of four possible approaches for distinguishing traffic from Alice
from traffic through Alice:

1) Notice that communication from Alice comes too fast: Experimentation is
needed to determine if traffic from Alice can be distinguished from traffic
from a computer with a decent link to Alice.

2) Monitor Alice's network traffic to discover the lack of incoming packets
at the appropriate times. If an adversary has this ability, then Alice
already has problems in the current system, because the adversary can run a
standard timing attack on Alice's traffic.

3) Notice that traffic from Alice is unique in some way such that if Alice
were just one of 3 entry guards for this traffic, then the traffic should be
coming from two other entry guards as well. An example of "unique traffic"
could be always sending 117 packets every 3 minutes to an exit node that
exits to port 4661. However, if such patterns existed with sufficient
precision, then it seems to me that Tor already has a problem. (This "unique
traffic" may not be a problem if clients often end up choosing a single
entry guard because their other two are down. Does anyone know if this is
the case?)

4) First, control the middle node *and* some other part of the traffic,
using standard attacks on a two hop circuit without entry nodes (my recent
paper on Browser-Based Attacks would work well for this
http://petworkshop.org/2007/papers/PET2007_preproc_Browser_based.pdf). With
control of the circuit, we can now cause "unique traffic" as in 3).
Alternatively, if we know something about Alice independently, and we can
see what websites are being visited, we might be able to guess that she is
the kind of person that would visit those websites.

Anonymity Advantages:

-Alice never has the problem of choosing a malicious entry guard. In some
sense, Alice acts as her own entry guard.

Anonymity Disadvantages:

-If Alice's traffic is identified as originating from herself (see above for
how hard that might be), then she has the anonymity of a 2 hop circuit
without entry guards.

Additional advantages:

-A discussion of the latency advantages of two hop circuits is going on in
Mike Perry's thread already.
-Also, we can advertise this change as "Run an entry guard and decrease your
own Tor latency." This incentive has the potential to add nodes to the
network, improving the network as a whole.

Safer variations:

To solve the "unique traffic" problem, Alice could use two hop paths only
1/3 of the time, and choose 2 other entry guards for the other 2/3 of the
time. All the advantages are now 1/3 as useful (possibly more, if the other
2 entry guards are not always up).
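A minimal sketch of the 1/3 variation, assuming Alice picks uniformly among
herself and her two other guards and skips any guard that is currently down
(which is why she ends up using herself more often when they are down). The
function name and the is_up callback are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def choose_first_hop(self_id: str, other_guards: Sequence[str],
                     is_up: Callable[[str], bool],
                     rng: Optional[random.Random] = None) -> str:
    """Pick the first hop for Alice's next circuit.

    Alice is one of up to three equally likely candidates: herself plus her
    two other guards.  Picking herself means a two hop path; picking another
    guard means an ordinary three hop path through that guard.  Guards that
    are down are left out of the draw.
    """
    chooser = rng if rng is not None else random.Random()
    candidates = [self_id] + [g for g in other_guards[:2] if is_up(g)]
    return chooser.choice(candidates)
```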

To solve the problem that Alice's responses are too fast, Alice could delay
her responses (ideally based on some real data of response time when Alice
is used as an entry guard). This loses most of the speed advantages of the two
hop path, but if Alice is a fast entry guard, it doesn't lose everything. It
also still has the (arguable) anonymity advantage that Alice doesn't have to
worry about having a malicious entry guard.

Implementation details:
For Alice to remain anonymous using this strategy, she has to actually be
acting as an entry guard for other nodes. This means the two hop option can
only be offered to relays that meet whatever performance threshold is
currently required of entry guards. Alice may need to somehow check her own
current status as an entry guard before choosing this two hop strategy.
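One way Alice might check her own status, sketched under the assumption that
she has a v3 consensus document at hand: find the "r" line whose third field
is her base64 identity digest, then look for the Guard flag on the "s" line
that follows it. The function name and inputs are hypothetical, and a real
check would also need to make sure the consensus is current.

```python
def has_guard_flag(consensus_text: str, my_identity_b64: str) -> bool:
    """Return True if our router entry in a consensus carries the Guard flag.

    Each router entry starts with an "r" line (r nickname identity ...),
    followed by an "s" line listing its flags.
    """
    current_identity = None
    for line in consensus_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r" and len(fields) > 2:
            current_identity = fields[2]
        elif fields[0] == "s" and current_identity == my_identity_b64:
            return "Guard" in fields[1:]
    return False
```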

Another thing to consider: suppose Alice is also an exit node. If the
fraction of exit nodes in existence is too small, she may rarely or never be
chosen as an entry guard. It would be sad if we offered an incentive to run
an entry guard that didn't extend to exit nodes. I suppose clients of Exit
nodes could pull the same trick, and bypass using Tor altogether (zero hop
paths), though that has additional issues.*

Mike Lieberman
MIT

*Why we shouldn't recommend Exit nodes pull the same trick:
1) Exit nodes would suffer heavily from the problem of "unique traffic"
mentioned above.
2) It would give governments an incentive to confiscate exit nodes to see if
they are pulling this trick.
Filename: 117-ipv6-exits.txt
Title: IPv6 exits
Author: coderman
Created: 10-Jul-2007
Status: Closed
Target: 0.2.4.x
Implemented-In: 0.2.4.7-alpha

Overview

   Extend Tor for TCP exit via IPv6 transport and DNS resolution of IPv6
   addresses.  This proposal does not imply any IPv6 support for OR
   traffic, only exit and name resolution.


Contents

0. Motivation

   As the IPv4 address space becomes more scarce there is increasing
   effort to provide Internet services via the IPv6 protocol.  Many
   hosts are available at IPv6 endpoints which are currently
   inaccessible for Tor users.

   Extending Tor to support IPv6 exit streams and IPv6 DNS name
   resolution will allow users of the Tor network to access these hosts.
   This capability would be present for those who do not currently have
   IPv6 access, thus increasing the utility of Tor and furthering
   adoption of IPv6.


1. Design

1.1. General design overview

   There are three main components to this proposal.  The first is a
   method for routers to advertise their ability to exit IPv6 traffic.
   The second is the manner in which routers resolve names to IPv6
   addresses.  Last but not least is the method in which clients
   communicate with Tor to resolve and connect to IPv6 endpoints
   anonymously.

1.2. Router IPv6 exit support

   In order to specify exit policies and IPv6 capability, new directives
   in the Tor configuration will be needed.  If a router advertises IPv6
   exit policies in its descriptor this will signal the ability to
   provide IPv6 exit.  There are a number of additional default deny
   rules associated with this new address space which are detailed in
   the addendum.

   When Tor is started on a host it should check for the presence of a
   global unicast IPv6 address and if present include the default IPv6
   exit policies and any user specified IPv6 exit policies.

   If a user provides IPv6 exit policies but no global unicast IPv6
   address is available Tor should generate a warning and not publish the
   IPv6 policies in the router descriptor.
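   A sketch of this startup check using Python's ipaddress module. It
   assumes "global unicast" means 2000::/3, the range the sample policy in
   section 4.1 treats as implicitly accepted, and that user rules are
   placed ahead of the defaults so they take precedence; both helper names
   are hypothetical.

```python
import ipaddress
import warnings

# Assumption: "global unicast" is taken to be 2000::/3.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def has_global_unicast_v6(local_addresses):
    """True if any local address is an IPv6 global unicast address."""
    for text in local_addresses:
        ip = ipaddress.ip_address(text)
        if ip.version == 6 and ip in GLOBAL_UNICAST:
            return True
    return False


def ipv6_policy_to_publish(local_addresses, user_policy, default_policy):
    """Return the accept6/reject6 lines to put in the router descriptor.

    An empty list means no IPv6 exit policy is published, so the router
    does not advertise IPv6 exit.
    """
    if not has_global_unicast_v6(local_addresses):
        if user_policy:
            warnings.warn("IPv6 exit policy configured but no global unicast "
                          "IPv6 address found; not publishing it")
        return []
    # User rules come first so that they take precedence over the defaults.
    return list(user_policy) + list(default_policy)
```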

   It should be noted that IPv4 mapped IPv6 addresses are not valid exit
   destinations.  This mechanism is mainly used to interoperate with
   both IPv4 and IPv6 clients on the same socket.  Any attempts to use
   an IPv4 mapped IPv6 address, perhaps to circumvent exit policy for
   IPv4, must be refused.
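   The refusal can be a simple pre-check before any policy evaluation. This
   sketch relies on the ipv4_mapped attribute of Python's IPv6Address,
   which is non-None exactly for ::ffff:0:0/96 addresses; the function name
   is hypothetical and the normal exit policy check is not shown.

```python
import ipaddress


def is_permitted_exit_destination(addr: str) -> bool:
    """Refuse IPv4-mapped IPv6 destinations such as ::ffff:192.0.2.1.

    Such an address would otherwise let a client reach an IPv4 host while
    being checked only against the IPv6 exit policy.
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return False
    return True
```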

1.3. DNS name resolution of IPv6 addresses (AAAA records)

   In addition to exit support for IPv6 TCP connections, a method to
   resolve domain names to their respective IPv6 addresses is also
   needed.  This is accomplished in the existing DNS system via AAAA
   records.  Routers will perform both A and AAAA requests when
   resolving a name so that the client can utilize an IPv6 endpoint when
   available or preferred.

   To avoid potential problems with poorly behaved caching DNS servers,
   all NXDOMAIN responses to AAAA requests should be ignored if a
   successful response is received for an A request.  This implies that
   both AAAA and A requests will always be performed for each name
   resolution.
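   The NXDOMAIN rule amounts to a small merge step over the two answers. In
   this sketch the LookupResult type and the rcode strings are hypothetical
   stand-ins for whatever the resolver library reports; only the rule
   itself comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LookupResult:
    rcode: str                      # e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"
    addresses: List[str] = field(default_factory=list)


def merge_a_aaaa(a: LookupResult, aaaa: LookupResult) -> Tuple[str, List[str]]:
    """Combine the A and AAAA answers for one name into a single reply."""
    a_ok = a.rcode == "NOERROR" and bool(a.addresses)
    aaaa_ok = aaaa.rcode == "NOERROR" and bool(aaaa.addresses)
    if a_ok:
        # A succeeded: any AAAA error, including a bogus NXDOMAIN, is ignored.
        return "NOERROR", a.addresses + (aaaa.addresses if aaaa_ok else [])
    if aaaa_ok:
        return "NOERROR", aaaa.addresses
    # Neither lookup produced addresses; report the A lookup's status.
    return a.rcode, []
```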

   For reverse lookups on IPv6 addresses, like that used for
   RESOLVE_PTR, Tor will perform the necessary PTR requests via
   IP6.ARPA.
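   Python's standard ipaddress module already builds these query names:
   reverse_pointer yields the nibble-reversed ip6.arpa name for IPv6 and
   the in-addr.arpa name for IPv4. The sketch only computes the name;
   sending the PTR query is not shown, and the function name is
   hypothetical.

```python
import ipaddress


def ptr_query_name(addr: str) -> str:
    """Name to send a PTR query for: in-addr.arpa for IPv4, ip6.arpa for IPv6."""
    return ipaddress.ip_address(addr).reverse_pointer
```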

   All routers which perform DNS resolution on behalf of clients
   (RELAY_RESOLVE) should perform and respond with both A and AAAA
   resources.

   [NOTE: In a future version, when we extend the behavior of RESOLVE to
    encapsulate more of real DNS, it will make sense to allow more
    flexibility here. -nickm]

1.4. Client interaction with IPv6 exit capability

1.4.1. Usability goals

   There are a number of behaviors which Tor can provide when
   interacting with clients that will improve the usability of IPv6 exit
   capability.  These behaviors are designed to make it simple for
   clients to express a preference for IPv6 transport and utilize IPv6
   host services.

1.4.2. SOCKSv5 IPv6 client behavior

   The SOCKS version 5 protocol supports IPv6 connections.  When using
   SOCKSv5 with hostnames it is difficult to determine if a client
   wishes to use an IPv4 or IPv6 address to connect to the desired host
   if it resolves to both address types.

   In order to make this more intuitive the SOCKSv5 protocol can be
   supported on a local IPv6 endpoint, [::1] port 9050 for example.
   When a client requests a connection to the desired host via an IPv6
   SOCKS connection Tor will prefer IPv6 addresses when resolving the
   host name and connecting to the host.

   Likewise, RESOLVE and RESOLVE_PTR requests from an IPv6 SOCKS
   connection will return IPv6 addresses when available, and fall back
   to IPv4 addresses if not.
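   The listener-family preference could be applied as an ordering step
   after resolution. This sketch assumes the socket family of the listener
   that accepted the SOCKS connection is known (socket.AF_INET6 for a
   listener on [::1]); the function name is hypothetical.

```python
import ipaddress
import socket
from typing import List


def order_by_listener_family(resolved: List[str], listener_family: int) -> List[str]:
    """Put addresses of the listener's family first, keeping the rest as fallback.

    Relative order within each family is preserved.
    """
    want = 6 if listener_family == socket.AF_INET6 else 4
    ips = [ipaddress.ip_address(a) for a in resolved]
    preferred = [ip for ip in ips if ip.version == want]
    fallback = [ip for ip in ips if ip.version != want]
    return [str(ip) for ip in preferred + fallback]
```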

   [NOTE: This means that SocksListenAddress and DNSListenAddress should
    support IPv6 addresses.  Perhaps there should also be a general option
    to have listeners that default to 127.0.0.1 and 0.0.0.0 listen
    additionally or instead on ::1 and :: -nickm]

1.4.3. MAPADDRESS behavior

   The MAPADDRESS capability supports clients that may not be able to
   use the SOCKSv4a or SOCKSv5 hostname support to resolve names via
   Tor.  This ability should be extended to IPv6 addresses in SOCKSv5 as
   well.

   When a client requests an address mapping from the wildcard IPv6
   address, [::0], the server will respond with a unique local IPv6
   address on success.  It is important to note that there may be two
   mappings for the same name if both an IPv4 and IPv6 address are
   associated with the host.  In this case a CONNECT to a mapped IPv6
   address should prefer IPv6 for the connection to the host, if
   available, while CONNECT to a mapped IPv4 address will prefer IPv4.

   It should be noted that IPv6 does not provide the concept of a host
   local subnet, like 127.0.0.0/8 in IPv4.  For this reason integration
   of Tor with IPv6 clients should consider a firewall or filter rule to
   drop unique local addresses to or from the network when possible.
   These packets should not be routed anyway; however, keeping them off
   the subnet entirely is worthwhile.

1.4.3.1. Generating unique local IPv6 addresses

   The usual manner of generating a unique local IPv6 address is to
   select a Global ID part randomly, along with a Subnet ID, and sharing
   this prefix among the communicating parties who each have their own
   distinct Interface ID.  In this style a given Tor instance might
   select a random Global and Subnet ID and provide MAPADDRESS
   assignments with a random Interface ID as needed.  This has the
   potential to associate unique Global/Subnet identifiers with a given
   Tor instance and may expose attacks against the anonymity of Tor
   users.

   To avoid this potential problem entirely, MAPADDRESS must always
   generate the Global, Subnet, and Interface IDs randomly for each
   request.  It is also strongly recommended that MAPADDRESS not support
   explicitly specifying an IPv6 source address in place of the wildcard
   address, to ensure that a good random address is always used.
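   A sketch of per-request generation following the RFC 4193 layout: the
   fd00::/8 prefix (FC00::/7 with the L bit set), a 40-bit Global ID, a
   16-bit Subnet ID, and a 64-bit Interface ID, each drawn from Python's
   secrets module. The function name is hypothetical, and a real
   implementation would also need to avoid reusing an address that is
   already mapped.

```python
import ipaddress
import secrets

ULA_LOCAL = ipaddress.ip_network("fd00::/8")


def random_mapaddress_ipv6() -> ipaddress.IPv6Address:
    """Return a unique local address with fresh random Global, Subnet and Interface IDs."""
    global_id = secrets.randbits(40)      # bits 80..119
    subnet_id = secrets.randbits(16)      # bits 64..79
    interface_id = secrets.randbits(64)   # bits 0..63
    value = (0xFD << 120) | (global_id << 80) | (subnet_id << 64) | interface_id
    return ipaddress.IPv6Address(value)
```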

1.4.4. DNSProxy IPv6 client behavior

   A new capability in recent Tor versions is the transparent DNS proxy.
   This feature will need to return both A and AAAA resource records
   when responding to client name resolution requests.

   The transparent DNS proxy should also support reverse lookups for
   IPv6 addresses.  It is suggested that any such requests to the
   deprecated IP6.INT domain should be translated to IP6.ARPA instead.
   This translation is not likely to be used and is of low priority.
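   The IP6.INT translation is a suffix rewrite on the query name. This
   sketch assumes names arrive as text, matches case-insensitively as DNS
   does, and drops a trailing root dot; the function name is hypothetical.

```python
def translate_ip6_int(qname: str) -> str:
    """Rewrite a reverse query under the deprecated ip6.int zone to ip6.arpa."""
    name = qname.rstrip(".")
    suffix = ".ip6.int"
    if name.lower().endswith(suffix):
        return name[: -len(suffix)] + ".ip6.arpa"
    return name
```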

   It would be nice to support DNS over IPv6 transport as well; however,
   this is not likely to be used and is of low priority.

1.4.5. TransPort IPv6 client behavior

   Tor also provides transparent TCP proxy support via the Trans*
   directives in the configuration.  The TransListenAddress directive
   should accept an IPv6 address in addition to IPv4 so that IPv6 TCP
   connections can be transparently proxied.

1.5. Additional changes

   The RedirectExit option should be deprecated rather than extending
   this feature to IPv6.


2. Spec changes

2.1. Tor specification

   In '6.2. Opening streams and transferring data' the following should
   be changed to indicate IPv6 exit capability:

      "No version of Tor currently generates the IPv6 format."

   In '6.4. Remote hostname lookup' the following should be updated to
   reflect use of ip6.arpa in addition to in-addr.arpa.

      "For a reverse lookup, the OP sends a RELAY_RESOLVE cell containing an
       in-addr.arpa address."

   In 'A.1. Differences between spec and implementation' the following
   should be updated to indicate IPv6 exit capability:

      "The current codebase has no IPv6 support at all."

   [NOTE: the EXITPOLICY end-cell reason says that it can hold an ipv4 or an
    ipv6 address, but doesn't say how.  We may want a separate EXITPOLICY2
    type that can hold an ipv6 address, since the way we encode ipv6
    addresses elsewhere ("0.0.0.0 indicates that the next 16 bytes are ipv6")
    is a bit dumb. -nickm]
   [Actually, the length field lets us distinguish EXITPOLICY. -nickm]
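   A sketch of that length-based distinction. It assumes the layout that
   tor-spec later adopted for END cells with reason EXITPOLICY (value 4):
   one reason byte, then a 4-byte IPv4 or 16-byte IPv6 address, then a
   4-byte TTL. The function name and error handling are hypothetical.

```python
import ipaddress
import struct
from typing import Optional, Tuple, Union

REASON_EXITPOLICY = 4  # assumed END reason code for EXITPOLICY

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_exitpolicy_end(payload: bytes) -> Optional[Tuple[Address, int]]:
    """Parse a RELAY_END body carrying REASON_EXITPOLICY.

    Returns (address, ttl), or None for any other reason code.  The address
    family is chosen purely from the body length.
    """
    if not payload or payload[0] != REASON_EXITPOLICY:
        return None
    body = payload[1:]
    if len(body) == 4 + 4:
        address: Address = ipaddress.IPv4Address(body[:4])
    elif len(body) == 16 + 4:
        address = ipaddress.IPv6Address(body[:16])
    else:
        raise ValueError(f"unexpected EXITPOLICY body length {len(body)}")
    (ttl,) = struct.unpack(">I", body[-4:])
    return address, ttl
```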

2.2. Directory specification

   In '2.1. Router descriptor format' a new set of directives is needed
   for IPv6 exit policy.  The existing accept/reject directives should
   be clarified to indicate IPv4 or wildcard address relevance.  The new
   IPv6 directives will be in the form of:

      "accept6" exitpattern NL
      "reject6" exitpattern NL

   The section describing accept6/reject6 should explain that the
   presence of accept6 or reject6 exit policies in a router descriptor
   signals the ability of that router to exit IPv6 traffic (according to
   IPv6 exit policies).

   The "[::]/0" notation is used to represent "all IPv6 addresses".
   "[::0]/0" may also be used for this representation.

   If a user specifies a 'reject6 [::]/0:*' policy in the Tor
   configuration this will be interpreted as forcing no IPv6 exit
   support and no accept6/reject6 policies will be included in the
   published descriptor.  This will prevent IPv6 exit if the router host
   has a global unicast IPv6 address present.

   It is important to note that a wildcard address in an accept or
   reject policy applies to both IPv4 and IPv6 addresses.
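   A sketch of how a policy might be evaluated under these rules:
   accept6/reject6 apply only to IPv6 destinations, accept/reject with a
   concrete address apply only to that address's family, and a "*" address
   matches both. First match wins, as with Tor's existing policies. The
   helper names, treating a missing port as "*" (as in the short form of
   the sample policy in section 4.1), and the disables_ipv6_exit check for
   'reject6 [::]/0:*' are assumptions of this sketch.

```python
import ipaddress
from typing import List, Optional, Tuple

Rule = Tuple[str, bool, str, str]  # (action, ipv6_only, address, ports)


def split_exitpattern(pattern: str) -> Tuple[str, str]:
    """Split "ADDR[/MASK][:PORTS]"; IPv6 addresses are in square brackets."""
    if pattern.startswith("["):
        close = pattern.index("]")
        mask, sep, ports = pattern[close + 1:].partition(":")
        address = pattern[1:close] + mask
    else:
        address, sep, ports = pattern.partition(":")
    return address, (ports if sep else "*")


def parse_rule(line: str) -> Rule:
    keyword, pattern = line.split(None, 1)
    if keyword not in ("accept", "reject", "accept6", "reject6"):
        raise ValueError(f"unknown policy keyword: {keyword}")
    address, ports = split_exitpattern(pattern.strip())
    return keyword.rstrip("6"), keyword.endswith("6"), address, ports


def _port_matches(ports: str, port: int) -> bool:
    if ports == "*":
        return True
    low, sep, high = ports.partition("-")
    return int(low) <= port <= int(high if sep else low)


def _address_matches(rule: Rule, ip) -> bool:
    _, ipv6_only, address, _ = rule
    if ipv6_only and ip.version != 6:
        return False
    if address == "*":
        return True  # a wildcard matches both IPv4 and IPv6 destinations
    network = ipaddress.ip_network(address, strict=False)
    return network.version == ip.version and ip in network


def evaluate(policy: List[str], addr: str, port: int) -> Optional[str]:
    """Return "accept" or "reject" for the first matching rule, else None."""
    ip = ipaddress.ip_address(addr)
    for line in policy:
        rule = parse_rule(line)
        if _address_matches(rule, ip) and _port_matches(rule[3], port):
            return rule[0]
    return None


def disables_ipv6_exit(policy: List[str]) -> bool:
    """True if the configuration contains 'reject6 [::]/0:*' (or [::0]/0)."""
    everything = ipaddress.ip_network("::/0")
    for line in policy:
        action, ipv6_only, address, ports = parse_rule(line)
        if (action == "reject" and ipv6_only and ports == "*" and address != "*"
                and ipaddress.ip_network(address, strict=False) == everything):
            return True
    return False
```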

2.3. Control specification

   In '3.8. MAPADDRESS' the potential to have two addresses for a given
   name should be explained.  The method for generating unique local
   addresses for IPv6 mappings needs explanation as described above.

   When IPv6 addresses are used in this document they should include the
   brackets for consistency.  For example, the null IPv6 address should
   be written as "[::0]" and not "::0".  The control commands will
   expect the same syntax as well.

   In '3.9. GETINFO' the "address" command should return both public
   IPv4 and IPv6 addresses if present.  These addresses should be
   separated via \r\n.


2.4. Tor SOCKS extensions

   In '2. Name lookup' a description of IPv6 address resolution is
   needed for SOCKSv5 as described above.  IPv6 addresses should be
   supported in both the RESOLVE and RESOLVE_PTR extensions.

   A new section describing the ability to accept SOCKSv5 clients on a
   local IPv6 address to indicate a preference for IPv6 transport as
   described above is also needed.  The behavior of Tor SOCKSv5 proxy
   with an IPv6 preference should be explained, for example, preferring
   IPv6 transport to a named host with both IPv4 and IPv6 addresses
   available (A and AAAA records).


3. Questions and concerns

3.1. DNS A6 records

   A6 is explicitly avoided in this document.  There are potential
   reasons for implementing it; however, the inherent complexity of
   the protocol and resolvers makes this unappealing.  Is there a
   compelling reason to consider A6 as part of IPv6 exit support?

   [IMO not till anybody needs it. -nickm]

3.2. IPv4 and IPv6 preference

   The design above tries to infer a preference for IPv4 or IPv6
   transport based on client interactions with Tor.  It might be useful
   to provide more explicit control over this preference.  For example,
   an IPv4 SOCKSv5 client may want to use IPv6 transport to named hosts
   in CONNECT requests while the current implementation would assume an
   IPv4 preference.  Should more explicit control be available, through
   either configuration directives or control commands?

   Many applications support an inet6-only or prefer-family type option
   that gives the user manual control over address preference.  This
   could be provided as a Tor configuration option.

   An explicit preference is still possible by resolving names and then
   CONNECTing to an IPv4 or IPv6 address as desired; however, not all
   client applications may have this option available.

3.3. Support for IPv6 only transparent proxy clients

   It may be useful to support IPv6 only transparent proxy clients using
   IPv4 mapped IPv6 like addresses.  This would require transparent DNS
   proxy using IPv6 transport and the ability to map A record responses
   into IPv4 mapped IPv6 like addresses in the manner described in the
   "NAT-PT" RFC for a traditional Basic-NAT-PT with DNS-ALG.  The
   transparent TCP proxy would thus need to detect these mapped addresses
   and connect to the desired IPv4 host.

   The IPv6 prefix used for this purpose must not be the actual IPv4
   mapped IPv6 address prefix, though the manner in which IPv4 addresses
   are embedded in IPv6 addresses would be the same.
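   A sketch of the embedding, using a hypothetical /96 prefix from the
   unique local range (fd00:0:0:1::/96) rather than the real IPv4-mapped
   prefix ::ffff:0:0/96. As in IPv4-mapped addresses, the IPv4 address
   occupies the low 32 bits. The prefix and function names are
   illustrative only.

```python
import ipaddress
from typing import Optional

# Hypothetical translation prefix; must NOT be ::ffff:0:0/96.
TRANSLATION_PREFIX = ipaddress.ip_network("fd00:0:0:1::/96")


def embed_ipv4(v4: str) -> ipaddress.IPv6Address:
    """Map an A record answer into the translation prefix (low 32 bits)."""
    return ipaddress.IPv6Address(
        int(TRANSLATION_PREFIX.network_address) | int(ipaddress.IPv4Address(v4)))


def extract_ipv4(v6: str) -> Optional[ipaddress.IPv4Address]:
    """Return the embedded IPv4 address, or None if v6 is outside the prefix."""
    ip = ipaddress.IPv6Address(v6)
    if ip not in TRANSLATION_PREFIX:
        return None
    return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
```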

   The lack of any IPv6 only hosts which would use this transparent proxy
   method makes this a lot of work for very little gain.  Is there a
   compelling reason to support this NAT-PT like capability?

3.4. IPv6 DNS and older Tor routers

   It is expected that many routers will continue to run with older
   versions of Tor when the IPv6 exit capability is released.  Clients
   who wish to use IPv6 will need to route RELAY_RESOLVE requests to the
   newer routers which will respond with both A and AAAA resource
   records when possible.

   One way to do this is to route RELAY_RESOLVE requests to routers with
   IPv6 exit policies published, however, this would not utilize current
   routers that can resolve IPv6 addresses even if they can't exit such
   traffic.

   There was also concern expressed about the ability of existing clients
   to cope with new RELAY_RESOLVE responses that contain IPv6 addresses.
   If this breaks backward compatibility, a new request type may be
   necessary, like RELAY_RESOLVE6, or some other mechanism of indicating
   the ability to parse IPv6 responses when making the request.

3.5. IPv4 and IPv6 bindings in MAPADDRESS

   It may be troublesome to try and support two distinct address mappings
   for the same name in the existing MAPADDRESS implementation.  If this
   cannot be accommodated then the behavior should replace existing
   mappings with the new address regardless of family.  A warning when
   this occurs would be useful to assist clients who encounter problems
   when both an IPv4 and IPv6 application are using MAPADDRESS for the
   same names concurrently, causing lost connections for one of them.
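   The fallback behavior could look like the sketch below, assuming a
   simple name-to-address table. The AddressMap class is hypothetical and
   does not reflect Tor's actual MAPADDRESS data structures.

```python
import ipaddress
import warnings


class AddressMap:
    """One mapping per name; a new mapping replaces the old one regardless of family."""

    def __init__(self):
        self._by_name = {}

    def set_mapping(self, name: str, address: str) -> None:
        new = ipaddress.ip_address(address)
        old = self._by_name.get(name)
        if old is not None and old.version != new.version:
            warnings.warn(f"MAPADDRESS for {name}: replacing IPv{old.version} "
                          f"mapping {old} with IPv{new.version} mapping {new}")
        self._by_name[name] = new

    def lookup(self, name: str):
        return self._by_name.get(name)
```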

4. Addendum

4.1. Sample IPv6 default exit policy

   reject 0.0.0.0/8
   reject 169.254.0.0/16
   reject 127.0.0.0/8
   reject 192.168.0.0/16
   reject 10.0.0.0/8
   reject 172.16.0.0/12
   reject6 [0000::]/8
   reject6 [0100::]/8
   reject6 [0200::]/7
   reject6 [0400::]/6
   reject6 [0800::]/5
   reject6 [1000::]/4
   reject6 [4000::]/3
   reject6 [6000::]/3
   reject6 [8000::]/3
   reject6 [A000::]/3
   reject6 [C000::]/3
   reject6 [E000::]/4
   reject6 [F000::]/5
   reject6 [F800::]/6
   reject6 [FC00::]/7
   reject6 [FE00::]/9
   reject6 [FE80::]/10
   reject6 [FEC0::]/10
   reject6 [FF00::]/8
   reject *:25
   reject *:119
   reject *:135-139
   reject *:445
   reject *:1214
   reject *:4661-4666
   reject *:6346-6429
   reject *:6699
   reject *:6881-6999
   accept *:*
   # accept6 [2000::]/3:* is implied

4.2. Additional resources

   'DNS Extensions to Support IP Version 6'
   http://www.ietf.org/rfc/rfc3596.txt

   'DNS Extensions to Support IPv6 Address Aggregation and Renumbering'
   http://www.ietf.org/rfc/rfc2874.txt

   'SOCKS Protocol Version 5'
   http://www.ietf.org/rfc/rfc1928.txt

   'Unique Local IPv6 Unicast Addresses'
   http://www.ietf.org/rfc/rfc4193.txt

   'INTERNET PROTOCOL VERSION 6 ADDRESS SPACE'
   http://www.iana.org/assignments/ipv6-address-space

   'Network Address Translation - Protocol Translation (NAT-PT)'
   http://www.ietf.org/rfc/rfc2766.txt
Filename: 118-multiple-orports.txt
Title: Advertising multiple ORPorts at once
Author: Nick Mathewson
Created: 09-Jul-2007
Status: Superseded
Superseded-By: 186-multiple-orports.txt

[Needs Revision: This proposal needs revision to come up to 2011 standards
and take microdescriptors into account.]

Overview:

   This document is a proposal for servers to advertise multiple
   address/port combinations for their ORPort.

Motivation:

   Sometimes servers want to support multiple ports for incoming
   connections, either in order to support multiple address families, to
   better use multiple interfaces, or to support a variety of
   FascistFirewallPorts settings.  This is easy to set up now, but
   there's no way to advertise it to clients.

New descriptor syntax:

   We add a new line in the router descriptor, "or-address".  This line
   can occur zero, one, or multiple times.  Its format is:

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IPV6ADDR / IPV4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC / PORTSPEC "," PORTLIST
      PORTSPEC = PORT / PORT "-" PORT

   [This is the regular format for specifying sets of addresses and
   ports in Tor.]
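   A parser sketch for the or-address syntax, returning the address and a
   list of inclusive port ranges. The function names are hypothetical, and
   rejecting port 0 and out-of-range ports is an assumption the grammar
   does not state.

```python
import ipaddress
from typing import List, Tuple, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_portlist(text: str) -> List[Tuple[int, int]]:
    """Parse PORTLIST into inclusive (low, high) ranges."""
    ranges = []
    for spec in text.split(","):
        low, sep, high = spec.partition("-")
        lo = int(low)
        hi = int(high) if sep else lo
        if not 1 <= lo <= hi <= 65535:
            raise ValueError(f"bad port range: {spec!r}")
        ranges.append((lo, hi))
    return ranges


def parse_or_address(line: str) -> Tuple[Address, List[Tuple[int, int]]]:
    keyword, _, rest = line.rstrip("\n").partition(" ")
    if keyword != "or-address":
        raise ValueError("not an or-address line")
    if rest.startswith("["):
        close = rest.index("]")
        address: Address = ipaddress.IPv6Address(rest[1:close])
        if rest[close + 1:close + 2] != ":":
            raise ValueError("missing port list")
        portlist = rest[close + 2:]
    else:
        host, sep, portlist = rest.partition(":")
        if not sep:
            raise ValueError("missing port list")
        address = ipaddress.IPv4Address(host)
    return address, parse_portlist(portlist)
```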

New OR behavior:

   We add two more options to supplement ORListenAddress:
   ORPublishedListenAddress, and ORPublishAddressSet.  The former
   listens on an address-port combination and publishes it in addition
   to the regular address.  The latter advertises a set of address-port
   combinations, but does not listen on them.  [To use this option, the
   server operator should set up port forwarding to the regular ORPort,
   as for example with firewall rules.]

   Servers should extend their testing to include advertised addresses
   and ports.  No address or port should be advertised until it's been
   tested.  [This might get expensive in practice.]

New authority behavior:

   Authorities should spot-test descriptors, and reject any where a
   substantial part of the addresses can't be reached.

New client behavior:

   When connecting to another server, clients SHOULD pick an
   address-port combination at random as supported by their
   ReachableAddresses setting.  If a client has a connection to a server
   at one address, it SHOULD use that address for any simultaneous
   connections to that server.  Clients SHOULD use the canonical address
   for any server when generating extend cells.
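   The client rule could be sketched as below. The is_reachable callback
   stands in for a ReachableAddresses check and, like the function name, is
   hypothetical; the endpoint list is assumed to be already expanded from
   the advertised port lists.

```python
import random
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def choose_or_endpoint(advertised: List[Endpoint],
                       is_reachable: Callable[[str, int], bool],
                       existing: Optional[Endpoint] = None,
                       rng: Optional[random.Random] = None) -> Optional[Endpoint]:
    """Pick an address-port combination for a new connection to a server.

    Reuse the endpoint of an existing connection if there is one; otherwise
    choose uniformly among advertised endpoints we are able to reach.
    """
    if existing is not None:
        return existing
    candidates = [ep for ep in advertised if is_reachable(*ep)]
    if not candidates:
        return None
    chooser = rng if rng is not None else random.Random()
    return chooser.choice(candidates)
```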

Not addressed here:

   * There's no reason to listen on multiple dirports; current Tors
   mostly don't connect directly to the dirport anyway.

   * It could be advantageous to list something about extra addresses in
   the network-status document.  This would, however, eat space there.
   More analysis is needed, particularly in light of proposal 141
   ("Download server descriptors on demand")

Dependencies:

   Testing for canonical connections needs to be implemented before it's
   safe to use this proposal.


Notes 3 July:
  - Write up the simple version of this.  No ranges needed yet.  No
    networkstatus changes yet.

Filename: 119-controlport-auth.txt
Title: New PROTOCOLINFO command for controllers
Author: Roger Dingledine
Created: 14-Aug-2007
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  Here we describe how to help controllers locate the cookie
  authentication file when authenticating to Tor, so we can a) require
  authentication by default for Tor controllers and b) still keep
  things usable.  Also, we propose an extensible, general-purpose mechanism
  for controllers to learn about a Tor instance's protocol and
  authentication requirements before authenticating.

The Problem:

  When we first added the controller protocol, we wanted to make it
  easy for people to play with it, so by default we didn't require any
  authentication from controller programs. We allowed requests only from
  localhost as a stopgap measure for security.

  Due to an increasing number of vulnerabilities based on this approach,
  it's time to add authentication in default configurations.

  We have a number of goals:
  - We want the default Vidalia bundles to transparently work. That
    means we don't want the users to have to type in or know a password.
  - We want to allow multiple controller applications to connect to the
    control port. So if Vidalia is launching Tor, it can't just keep the
    secrets to itself.

  Right now there are three authentication approaches supported
  by the control protocol: NULL, CookieAuthentication, and
  HashedControlPassword. See Sec 5.1 in control-spec.txt for details.

  There are a couple of challenges here. The first is: if the controller
  launches Tor, how should we teach Tor what authentication approach
  it should require, and the secret that goes along with it? Next is:
  how should this work when the controller attaches to an existing Tor,
  rather than launching Tor itself?

  Cookie authentication seems most amenable to letting multiple controller
  applications interact with Tor. But that brings in yet another question:
  how does the controller guess where to look for the cookie file,
  without first knowing what DataDirectory Tor is using?

Design:

  We should add a new controller command PROTOCOLINFO that can be sent
  as a valid first command (the others being AUTHENTICATE and QUIT). If
  PROTOCOLINFO is sent as the first command, the second command must be
  either a successful AUTHENTICATE or a QUIT.

  If the initial command sequence is not valid, Tor closes the connection.
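  A sketch of these pre-authentication rules as a small state machine.
  Treating a failed AUTHENTICATE or a repeated PROTOCOLINFO as grounds to
  close follows from "the second command must be either a successful
  AUTHENTICATE or a QUIT"; matching keywords case-insensitively is an
  assumption, and the class and method names are hypothetical.

```python
class ControlConnection:
    """Tracks which commands are allowed before authentication.

    handle() returns "ok" for an accepted command and "close" when Tor
    should hang up.
    """

    def __init__(self):
        self.authenticated = False
        self.saw_protocolinfo = False
        self.closed = False

    def handle(self, command: str, auth_succeeded: bool = False) -> str:
        if self.closed:
            return "close"
        if self.authenticated:
            return "ok"
        words = command.split()
        keyword = words[0].upper() if words else ""
        if keyword == "PROTOCOLINFO" and not self.saw_protocolinfo:
            self.saw_protocolinfo = True
            return "ok"
        if keyword == "AUTHENTICATE" and auth_succeeded:
            self.authenticated = True
            return "ok"
        # QUIT, a failed AUTHENTICATE, a second PROTOCOLINFO, or any other
        # command before authentication ends the connection.
        self.closed = True
        return "close"
```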


Spec:

  C:  "PROTOCOLINFO" *(SP PIVERSION) CRLF
  S:  "250+PROTOCOLINFO" SP PIVERSION CRLF *InfoLine "250 OK" CRLF

    InfoLine = AuthLine / VersionLine / OtherLine

     AuthLine = "250-AUTH" SP "METHODS=" AuthMethod *("," AuthMethod)
                       *(SP "COOKIEFILE=" AuthCookieFile) CRLF
     VersionLine = "250-VERSION" SP "Tor=" TorVersion [SP Arguments] CRLF

     AuthMethod =
      "NULL"           / ; No authentication is required
      "HASHEDPASSWORD" / ; A controller must supply the original password
      "COOKIE"           ; A controller must supply the contents of a cookie

     AuthCookieFile = QuotedString
     TorVersion = QuotedString

     OtherLine = "250-" Keyword [SP Arguments] CRLF

  For example:

  C: PROTOCOLINFO CRLF
  S: "250+PROTOCOLINFO 1" CRLF
  S: "250-AUTH METHODS=HASHEDPASSWORD,COOKIE COOKIEFILE="/tor/cookie"" CRLF
  S: "250-VERSION Tor="0.2.0.5-alpha"" CRLF
  S: "250 OK" CRLF

  Tor MAY give its InfoLines in any order; controllers MUST ignore InfoLines
  with keywords they do not recognize.  Controllers MUST ignore extraneous
  data on any InfoLine.

  PIVERSION is there in case we drastically change the syntax one day. For
  now it should always be "1", for the controller protocol.  Controllers MAY
  provide a list of the protocol versions they support; Tor MAY select a
  version that the controller does not support.

  Right now only two "topics" (AUTH and VERSION) are included, but more
  may be included in the future. Controllers must accept lines with
  unexpected topics.

  AuthMethod is used to specify one or more control authentication
  methods that Tor currently accepts.

  AuthCookieFile specifies the absolute path and filename of the
  authentication cookie that Tor is expecting and is provided iff
  the METHODS field contains the method "COOKIE".  Controllers MUST handle
  escape sequences inside this string.

  The VERSION line contains the Tor version.

  [What else might we want to include that could be useful? -RD]
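  A controller-side sketch of reading the PROTOCOLINFO reply according to
  the grammar above. It ignores unknown InfoLines and extraneous words, as
  required, but unescapes QuotedString values in a simplified way (a
  backslash makes the next character literal); full C-style escape
  handling is left out. Both function names are hypothetical.

```python
from typing import Dict, List


def parse_kv(text: str) -> Dict[str, str]:
    """Parse space-separated KEY=VALUE pairs; VALUE may be a QuotedString."""
    out: Dict[str, str] = {}
    i, n = 0, len(text)
    while i < n:
        while i < n and text[i] == " ":
            i += 1
        j = i
        while j < n and text[j] not in " =":
            j += 1
        if i == j:
            break
        key = text[i:j]
        if j >= n or text[j] == " ":
            i = j  # a bare word with no value: extraneous data, ignored
            continue
        i = j + 1  # skip "="
        if i < n and text[i] == '"':
            i += 1
            chars: List[str] = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # take the escaped character literally
                chars.append(text[i])
                i += 1
            i += 1  # skip the closing quote
            out[key] = "".join(chars)
        else:
            end = text.find(" ", i)
            end = n if end == -1 else end
            out[key] = text[i:end]
            i = end
    return out


def parse_protocolinfo(lines: List[str]) -> Dict[str, object]:
    """Extract AUTH and VERSION data from reply lines; other InfoLines are ignored."""
    info: Dict[str, object] = {"methods": [], "cookiefile": None, "version": None}
    for line in lines:
        if line.startswith("250-AUTH "):
            kv = parse_kv(line[len("250-AUTH "):])
            if "METHODS" in kv:
                info["methods"] = kv["METHODS"].split(",")
            info["cookiefile"] = kv.get("COOKIEFILE")
        elif line.startswith("250-VERSION "):
            info["version"] = parse_kv(line[len("250-VERSION "):]).get("Tor")
    return info
```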

Compatibility:

  Tor 0.1.2.16 and 0.2.0.4-alpha hang up after the first failed
  command. Earlier Tors don't know about this command but don't hang
  up. That means controllers will need a mechanism for distinguishing
  whether they're talking to a Tor that speaks PROTOCOLINFO or not.

  I suggest that the controllers attempt a PROTOCOLINFO. Then:
    - If it works, great. Authenticate as required.
    - If they get hung up on, reconnect and do a NULL AUTHENTICATE.
    - If it's unrecognized but they're not hung up on, do a NULL
      AUTHENTICATE.
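  The suggested fallback sequence, sketched as a decision function. It
  assumes the caller passes None when Tor hung up and otherwise the reply
  lines, and that an older Tor answers an unrecognized command with a 510
  status. The preference order among offered methods is a hypothetical
  choice, not something this proposal specifies.

```python
from typing import List, Optional, Tuple

# Hypothetical preference order; a controller may rank methods differently.
PREFERENCE = ("NULL", "COOKIE", "HASHEDPASSWORD")


def plan_authentication(reply: Optional[List[str]]) -> Tuple[bool, str]:
    """Decide how to authenticate after sending PROTOCOLINFO.

    Returns (reconnect_first, method).
    """
    if reply is None:
        # An older Tor that hangs up after an unknown command.
        return True, "NULL"
    if reply and reply[0].startswith("510"):
        # Unrecognized command, but the connection is still open.
        return False, "NULL"
    methods: List[str] = []
    for line in reply:
        if line.startswith("250-AUTH"):
            for word in line.split()[1:]:
                if word.upper().startswith("METHODS="):
                    methods = word.split("=", 1)[1].split(",")
    for method in PREFERENCE:
        if method in methods:
            return False, method
    raise ValueError("no supported authentication method offered")
```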

Unsolved problems:

  If Torbutton wants to be a Tor controller one day... talking TCP is
  bad enough, but reading from the filesystem is even harder. Is there
  a way to let simple programs work with the controller port without
  needing all the auth infrastructure?

  Once we put this approach in place, the next vulnerability we see will
  involve an attacker somehow getting read access to the victim's files
  --- and then we're back where we started. This means we still need
  to think about how to demand password-based authentication without
  bothering the user about it.

Filename: 120-shutdown-descriptors.txt
Title: Shutdown descriptors when Tor servers stop
Author: Roger Dingledine
Created: 15-Aug-2007
Status: Dead

[Proposal dead as of 11 Jul 2008. The point of this proposal was to give
routers a good way to get out of the networkstatus early, but proposal
138 (already implemented) has achieved this.]

Overview:

  Tor servers should publish a last descriptor whenever they shut down,
  to let others know that they are no longer offering service.

The Problem:

  The main motivation is to respond to Internet services that want
  to treat connections from the Tor network differently. Right now,
  if a user experiments with turning on the "relay" functionality, he
  is punished by being locked out of some websites, some IRC networks,
  etc --- and this lockout persists for several days even after he turns
  the server off.

Design:

  During the "slow shutdown" period if exiting, or shortly after the
  user sets his ORPort back to 0 if not exiting, Tor should publish a
  final descriptor with the following characteristics:

  1) Exit policy is listed as "reject *:*"
  2) It includes a new entry called "opt shutdown 1"

  The first step is so current blacklists will no longer list this node
  as exiting to whatever the service is.

  The second step is so directory authorities can avoid wasting time
  doing reachability testing. Authorities should automatically not list
  as Running any router whose latest descriptor says it shut down.

  [I originally had in mind a third step --- Advertised bandwidth capacity
  is listed as "0" --- so current Tor clients will skip over this node
  when building most circuits. But since clients won't fetch descriptors
  from nodes not listed as Running, this step seems pointless. -RD]

Spec:

  TBD but should be pretty straightforward.

Security issues:

  Now external people can learn exactly when a node stopped offering
  relay service. How bad is this? I can see a few minor attacks based
  on this knowledge, but on the other hand as it is we don't really take
  any steps to keep this information secret.

Overhead issues:

  We are creating more descriptors that want to be remembered. However,
  since the router won't be marked as Running, ordinary clients won't
  fetch the shutdown descriptors. Caches will, though. I hope this is ok.

Implementation:

  To make things easy, we should publish the shutdown descriptor only
  on controlled shutdown (SIGINT as opposed to SIGTERM). That would
  leave enough time for publishing that we probably wouldn't need any
  extra synchronization code.

  If that turns out to be too unintuitive for users, I could imagine doing
  it on SIGTERMs too, and just delaying exit until we had successfully
  published to at least one authority, at which point we'd hope that it
  propagated from there.

Acknowledgements:

  tup suggested this idea.

Comments:

  2) Maybe add a rule "Don't do this for hibernation if we expect to wake
     up before the next consensus is published"?
                                                      - NM 9 Oct 2007
Filename: 121-hidden-service-authentication.txt
Title: Hidden Service Authentication
Author: Tobias Kamm, Thomas Lauterbach, Karsten Loesing, Ferdinand Rieger,
        Christoph Weingarten
Created: 10-Sep-2007
Status: Closed
Implemented-In: 0.2.1.x

Change history:

  26-Sep-2007  Initial proposal for or-dev
  08-Dec-2007  Incorporated comments by Nick posted to or-dev on 10-Oct-2007
  15-Dec-2007  Rewrote complete proposal for better readability, modified
               authentication protocol, merged in personal notes
  24-Dec-2007  Replaced misleading term "authentication" by "authorization"
               and added some clarifications (comments by Sven Kaffille)
  28-Apr-2008  Updated most parts of the concrete authorization protocol
  04-Jul-2008  Added a simple algorithm to delay descriptor publication for
               different clients of a hidden service
  19-Jul-2008  Added INTRODUCE1V cell type (1.2), improved replay
               protection for INTRODUCE2 cells (1.3), described limitations
               for auth protocols (1.6), improved hidden service protocol
               without client authorization (2.1), added second, more
               scalable authorization protocol (2.2), rewrote existing
               authorization protocol (2.3); changes based on discussion
               with Nick
  31-Jul-2008  Limit maximum descriptor size to 20 kilobytes to prevent
               abuse.
  01-Aug-2008  Use first part of Diffie-Hellman handshake for replay
               protection instead of rendezvous cookie.
  01-Aug-2008  Removed improved hidden service protocol without client
               authorization (2.1). It might get implemented in proposal
               142.

Overview:

  This proposal deals with a general infrastructure for performing
  authorization (not necessarily implying authentication) of requests to
  hidden services at three points: (1) when downloading and decrypting
  parts of the hidden service descriptor, (2) at the introduction point,
  and (3) at Bob's Tor client before contacting the rendezvous point. A
  service provider will be able to restrict access to his service at these
  three points to authorized clients only. Further, the proposal specifies
  concrete authorization protocols that implement the presented
  authorization infrastructure.

  This proposal is based on v2 hidden service descriptors as described in
  proposal 114 and introduced in version 0.2.0.10-alpha.

  The proposal is structured as follows: The next section motivates the
  integration of authorization mechanisms in the hidden service protocol.
  Then we describe a general infrastructure for authorization in hidden
  services, followed by specific authorization protocols for this
  infrastructure. At the end we discuss a number of attacks and non-attacks
  as well as compatibility issues.

Motivation:

  Most hidden services do not require client authorization now and will
  not in the future. On the contrary, many clients would not want to be
  (pseudonymously) identifiable by the service (though this is
  unavoidable to some extent), but would rather use the service
  anonymously. These services are not addressed by this proposal.

  However, there may be certain services which are intended to be accessed
  by a limited set of clients only. A possible application might be a
  wiki or forum that should only be accessible to a closed user group.
  Another, less intuitive example might be a real-time communication
  service, where someone provides a presence and messaging service only to
  his buddies. Finally, a possible application would be a personal home
  server that should be remotely accessed by its owner.

  Performing authorization for a hidden service within the Tor network, as
  proposed here, offers a range of advantages compared to allowing all
  client connections in the first instance and deferring authorization to
  the transported protocol:

  (1) Reduced traffic: Unauthorized requests would be rejected as early as
  possible, thereby reducing the overall traffic in the network generated
  by establishing circuits and sending cells.

  (2) Better protection of service location: Unauthorized clients could not
  force Bob to create circuits to their rendezvous points, thus preventing
  the attack described by Øverlier and Syverson in their paper "Locating
  Hidden Servers" even without the need for guards.

  (3) Hiding activity: Apart from performing the actual authorization, a
  service provider could also hide the mere presence of his service from
  unauthorized clients by not providing hidden service descriptors to
  them, by rejecting unauthorized requests already at the introduction
  point (ideally without leaking presence information at any of these
  points), or by not answering unauthorized introduction requests.

  (4) Better protection of introduction points: When providing hidden
  service descriptors to authorized clients only and encrypting the
  introduction points as described in proposal 114, the introduction points
  would be unknown to unauthorized clients and thereby protected from DoS
  attacks.

  (5) Protocol independence: Authorization could be performed for all
  transported protocols, regardless of their own capabilities to do so.

  (6) Ease of administration: A service provider running multiple hidden
  services would be able to configure access at a single place uniformly
  instead of doing so for all services separately.

  (7) Optional QoS support: Bob could adapt his node selection algorithm
  for building the circuit to Alice's rendezvous point depending on a
  previously guaranteed QoS level, thus providing better latency or
  bandwidth for selected clients.

  A disadvantage of performing authorization within the Tor network is
  that a hidden service cannot make use of authorization data in
  the transported protocol. Tor hidden services were designed to be
  independent of the transported protocol. Therefore it's only possible to
  either grant or deny access to the whole service, but not to specific
  resources of the service.

  Authorization often implies authentication, i.e. proving one's identity.
  However, when performing authorization within the Tor network, untrusted
  points should not gain any useful information about the identities of
  communicating parties, neither server nor client. A crucial challenge is
  to remain anonymous towards directory servers and introduction points.
  However, trying to hide identity from the hidden service is a futile
  task, because a client would never know if he is the only authorized
  client and therefore perfectly identifiable. Therefore, hiding client
  identity from the hidden service is not an aim of this proposal.

  The current implementation of hidden services does not provide any kind
  of authorization. The hidden service descriptor version 2, introduced by
  proposal 114, was designed to use a descriptor cookie for downloading and
  decrypting parts of the descriptor content, but this feature is not yet
  in use. Further, most relevant cell formats specified in rend-spec
  contain fields for authorization data, but those fields are neither
  implemented nor sufficient.

Details:

  1. General infrastructure for authorization to hidden services

  We spotted three possible authorization points in the hidden service
  protocol:

    (1) when downloading and decrypting parts of the hidden service
        descriptor,
    (2) at the introduction point, and
    (3) at Bob's Tor client before contacting the rendezvous point.

  The general idea of this proposal is to allow service providers to
  restrict access to some or all of these points to authorized clients
  only.

  1.1. Client authorization at directory

  Since the implementation of proposal 114 it is possible to combine a
  hidden service descriptor with a so-called descriptor cookie. If this is done,
  the descriptor cookie becomes part of the descriptor ID, thus having an
  effect on the storage location of the descriptor. Someone who has learned
  about a service, but is not aware of the descriptor cookie, won't be able
  to determine the descriptor ID and download the current hidden service
  descriptor; he won't even know whether the service has uploaded a
  descriptor recently. Descriptor IDs are calculated as follows (see
  section 1.2 of rend-spec for the complete specification of v2 hidden
  service descriptors):

      descriptor-id =
          H(service-id | H(time-period | descriptor-cookie | replica))

  Currently, service-id is equivalent to permanent-id which is calculated
  as in the following formula. But in principle it could be any public
  key.

      permanent-id = H(permanent-key)[:10]
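  To make the two formulas concrete, here is a Python sketch. It assumes,
  following the v2 descriptor rules in rend-spec, that H is SHA-1, that
  time-period is encoded as a 4-octet big-endian integer and replica as a
  single octet, and that "|" is byte concatenation. The function names
  are hypothetical.

```python
import hashlib
import struct


def permanent_id(permanent_key_der: bytes) -> bytes:
    """permanent-id = H(permanent-key)[:10], with H assumed to be SHA-1."""
    return hashlib.sha1(permanent_key_der).digest()[:10]


def descriptor_id(service_id: bytes, time_period: int,
                  descriptor_cookie: bytes, replica: int) -> bytes:
    """descriptor-id = H(service-id | H(time-period | descriptor-cookie | replica))."""
    inner = hashlib.sha1(
        struct.pack(">I", time_period)   # 4-octet time period (assumption)
        + descriptor_cookie              # secret known to authorized clients
        + bytes([replica])               # 1-octet replica number
    ).digest()
    return hashlib.sha1(service_id + inner).digest()
```

  Because the cookie enters the inner hash, two different cookies yield
  unrelated descriptor IDs for the same service and time period, which is
  what hides the storage location from anyone lacking the cookie.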

  The second purpose of the descriptor cookie is to encrypt the list of
  introduction points, including optional authorization data. Hence, the
  hidden service directories won't learn any introduction information from
  storing a hidden service descriptor. This feature is implemented but
  unused at the moment, so this proposal harnesses this advantage of
  proposal 114.

  The descriptor cookie can be used for authorization by keeping it secret
  from everyone but authorized clients. A service could then decide whether
  to publish hidden service descriptors using that descriptor cookie later
  on. An authorized client being aware of the descriptor cookie would be
  able to download and decrypt the hidden service descriptor.

  The number of concurrently used descriptor cookies for one hidden service
  is not restricted. A service could use a single descriptor cookie for all
  users, a distinct cookie per user, or something in between, like one
  cookie per group of users. It is up to the specific protocol and how it
  is applied by a service provider.

  Two or more hidden service descriptors for different groups or users
  should not be uploaded at the same time. A directory node could conclude
  easily that the descriptors were issued by the same hidden service, thus
  being able to link the two groups or users. Therefore, descriptors for
  different users or clients that ought to be stored on the same directory
  are delayed, so that only one descriptor is uploaded to a directory at a
  time. The remaining descriptors are uploaded with a delay of up to
  30 seconds.
  Further, descriptors for different groups or users that are to be stored
  on different directories are delayed for a random time of up to 30
  seconds to hide relations from colluding directories. Certainly, this
  does not prevent linking entirely, but it makes it somewhat harder.
  There is a conflict between hiding links between clients and making a
  service available in a timely manner.
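  One way to realize these delays is sketched below in Python. It is an
  illustration under stated assumptions, not necessarily the algorithm
  added in the 04-Jul-2008 revision: every descriptor draws an
  independent random delay of up to 30 seconds, and each directory then
  receives its descriptors one at a time in order of those delays. The
  function name and data layout are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_DELAY = 30.0  # seconds, the upper bound named in the text


def schedule_uploads(
    uploads: Iterable[Tuple[str, str]],
    rng: random.Random,
) -> Dict[str, List[Tuple[float, str]]]:
    """Give each (client, directory) upload a random delay in [0, 30] s.

    Returns, per directory, the uploads sorted by delay, so that the
    directory can be fed one descriptor at a time in that order.
    """
    plan: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for client, directory in uploads:
        plan[directory].append((rng.uniform(0.0, MAX_DELAY), client))
    for queue in plan.values():
        queue.sort()  # earliest delay first; uploads never overlap
    return dict(plan)
```

  Drawing the delays independently also covers descriptors bound for
  different directories, which the text delays randomly to hide relations
  from colluding directories.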

  Although this part of the proposal is meant to describe a general
  infrastructure for authorization, changing the way of using the
  descriptor cookie to look up hidden service descriptors, e.g. applying
  some sort of asymmetric crypto system, would require in-depth changes
  that would be incompatible with v2 hidden service descriptors. On the
  contrary, using another key for en-/decrypting the introduction point
  part of a hidden service descriptor, e.g. a different symmetric key or
  asymmetric encryption, would be easy to implement and compatible with v2
  hidden service descriptors as understood by hidden service directories
  (clients and services would have to be upgraded anyway for using the new
  features).

  An adversary could try to abuse the fact that introduction points can be
  encrypted by storing arbitrary, unrelated data in the hidden service
  directory. This abuse can be limited by setting a hard descriptor size
  limit, forcing the adversary to split data into multiple chunks. There
  are some limitations that make splitting data across multiple descriptors
  unattractive: 1) The adversary would not be able to choose descriptor IDs
  freely and would therefore have to implement his own indexing
  structure. 2) Validity of descriptors is limited to at most 24 hours
  after which descriptors need to be republished.

  The regular descriptor size in bytes is 745 + num_ipos * 837 + auth_data.
  A large descriptor with 7 introduction points and 5 kilobytes of
  authorization data would be 11724 bytes in size. The upper size limit of
  descriptors should be set to 20 kilobytes, which limits the effect of
  abuse while retaining enough flexibility in designing authorization
  protocols.
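  The size estimate can be checked with a few lines of Python. This
  sketch assumes a kilobyte is 1024 octets, which is what makes the
  text's figure of 11724 bytes come out; treating the 20-kilobyte limit
  as inclusive is likewise an assumption.

```python
BASE_SIZE = 745          # octets without introduction points or auth data
PER_INTRO_POINT = 837    # octets per introduction point
SIZE_LIMIT = 20 * 1024   # proposed 20-kilobyte upper limit


def descriptor_size(num_ipos: int, auth_data: int) -> int:
    """Estimated v2 descriptor size in octets, per the formula above."""
    return BASE_SIZE + num_ipos * PER_INTRO_POINT + auth_data


def within_limit(num_ipos: int, auth_data: int) -> bool:
    """True if the estimated descriptor fits under the proposed limit."""
    return descriptor_size(num_ipos, auth_data) <= SIZE_LIMIT


# The example from the text: 7 introduction points, 5 KB of auth data.
print(descriptor_size(7, 5 * 1024))  # 11724
```

  The large example therefore still leaves 8756 octets of headroom below
  the limit.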

  1.2. Client authorization at introduction point

  The next possible authorization point after downloading and decrypting
  a hidden service descriptor is the introduction point. It may be important
  for authorization, because it is the last chance to hide the presence
  of a hidden service from unauthorized clients. Further, performing
  authorization at the introduction point might reduce traffic in the
  network, because unauthorized requests would not be passed to the
  hidden service. This applies to those clients who are aware of a
  descriptor cookie and thereby of the hidden service descriptor, but do
  not have authorization data to pass the introduction point or access the
  service (such a situation might occur when authorization data for
  authorization at the directory is not issued on a per-user basis, but
  authorization data for authorization at the introduction point is).

  It is important to note that the introduction point must be considered
  untrustworthy, and therefore cannot replace authorization at the hidden
  service itself. Nor should the introduction point learn any sensitive
  identifiable information from either the service or the client.

  In order to perform authorization at the introduction point, three
  message formats need to be modified: (1) v2 hidden service descriptors,
  (2) ESTABLISH_INTRO cells, and (3) INTRODUCE1 cells.

  A v2 hidden service descriptor needs to contain authorization data that
  is introduction-point-specific and sometimes also authorization data
  that is introduction-point-independent. Therefore, v2 hidden service
  descriptors as specified in section 1.2 of rend-spec already contain two
  reserved fields "intro-authorization" and "service-authorization"
  (originally, the names of these fields were "...-authentication")
  containing an authorization type number and arbitrary authorization
  data. We propose that authorization data consists of base64 encoded
  objects of arbitrary length, surrounded by "-----BEGIN MESSAGE-----" and
  "-----END MESSAGE-----". This will increase the size of hidden service
  descriptors, which is acceptable as long as they stay within the
  20-kilobyte upper limit proposed in section 1.1.
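  The proposed armoring of authorization data can be sketched as
  follows. Wrapping the base64 text at 64 characters per line is an
  assumption borrowed from common PEM-style practice; the proposal itself
  only fixes the BEGIN/END markers. The function names are hypothetical.

```python
import base64

BEGIN = "-----BEGIN MESSAGE-----"
END = "-----END MESSAGE-----"


def armor_auth_data(data: bytes, width: int = 64) -> str:
    """Base64-encode arbitrary authorization data between the markers."""
    body = base64.b64encode(data).decode("ascii")
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([BEGIN, *lines, END])


def dearmor_auth_data(text: str) -> bytes:
    """Inverse of armor_auth_data: strip the markers, decode the body."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) < 2 or lines[0] != BEGIN or lines[-1] != END:
        raise ValueError("missing MESSAGE markers")
    return base64.b64decode("".join(lines[1:-1]))
```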

  The current ESTABLISH_INTRO cells as described in section 1.3 of
  rend-spec do not contain either authorization data or version
  information. Therefore, we propose a new version 1 of the ESTABLISH_INTRO
  cells adding these two fields as follows:

     V      Format byte: set to 255               [1 octet]
     V      Version byte: set to 1                [1 octet]
     KL     Key length                           [2 octets]
     PK     Bob's public key                    [KL octets]
     HS     Hash of session info                [20 octets]
     AUTHT  The auth type that is supported       [1 octet]
     AUTHL  Length of auth data                  [2 octets]
     AUTHD  Auth data                            [variable]
     SIG    Signature of above information       [variable]
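  A minimal Python packing sketch for this format follows. It assumes
  big-endian (network order) multi-octet integers, as elsewhere in
  tor-spec, and a 128-octet SIG from a 1024-bit key; computing the
  signature itself is left out. It also checks the 215-octet budget for
  AUTHD. Function names are hypothetical.

```python
import struct

CELL_PAYLOAD = 498  # usable relay payload octets (tor-spec section 6.1)


def pack_establish_intro_v1(pk: bytes, session_hash: bytes,
                            auth_type: int, auth_data: bytes) -> bytes:
    """Pack every field of a v1 ESTABLISH_INTRO cell except SIG.

    Bob would sign the returned bytes with his private key and append
    the signature; that step is omitted here.
    """
    if len(session_hash) != 20:
        raise ValueError("HS must be 20 octets")
    return (
        struct.pack(">BBH", 255, 1, len(pk))              # V (format), V (version), KL
        + pk                                              # PK
        + session_hash                                    # HS
        + struct.pack(">BH", auth_type, len(auth_data))   # AUTHT, AUTHL
        + auth_data                                       # AUTHD
    )


def max_auth_len(key_len: int = 128, sig_len: int = 128) -> int:
    """Octets left for AUTHD once all other fields and SIG are counted."""
    fixed = 1 + 1 + 2 + key_len + 20 + 1 + 2
    return CELL_PAYLOAD - fixed - sig_len
```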

  From the format it is possible to determine the maximum allowed size for
  authorization data: given the fact that cells are 512 octets long, of
  which 498 octets are usable (see section 6.1 of tor-spec), and assuming
  1024 bit = 128 octet long keys, there are 215 octets left for
  authorization data. Hence, authorization protocols are bound to use no
  more than these 215 octets, regardless of the number of clients that
  shall be authenticated at the introduction point. Otherwise, one would
  need to send multiple ESTABLISH_INTRO cells or split them up, which we do
  not specify here.

  In order to understand a v1 ESTABLISH_INTRO cell, a relay must run a
  sufficiently recent Tor version. Hidden services need to be able to
  distinguish relays that are capable of understanding the new v1 cell
  format and performing authorization. We propose to use the version number
  that is contained in networkstatus documents to find capable
  introduction points.

  The current INTRODUCE1 cell as described in section 1.8 of rend-spec is
  not designed to carry authorization data and has no version number either.
  Unfortunately, unversioned INTRODUCE1 cells consist only of a fixed-size,
  seemingly random PK_ID, followed by the encrypted INTRODUCE2 cell. This
  makes it impossible to distinguish unversioned INTRODUCE1 cells from any
  later format. In particular, it is not possible to introduce some kind of
  format and version byte for newer versions of this cell. That's probably
  where the comment "[XXX011 want to put intro-level auth info here, but no
  version. crap. -RD]" that was part of rend-spec some time ago comes from.

  We propose that new versioned INTRODUCE1 cells use the new cell type 41
  RELAY_INTRODUCE1V (where V stands for versioned):

  Cleartext
     V      Version byte: set to 1                [1 octet]
     PK_ID  Identifier for Bob's PK             [20 octets]
     AUTHT  The auth type that is included        [1 octet]
     AUTHL  Length of auth data                  [2 octets]
     AUTHD  Auth data                            [variable]
  Encrypted to Bob's PK:
     (RELAY_INTRODUCE2 cell)
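  The cleartext part of this cell can be packed as sketched below,
  assuming big-endian integers and that the INTRODUCE2 cell has already
  been encrypted to Bob's public key by the caller. The function name is
  hypothetical. With no authorization data the cleartext header is 24
  octets.

```python
import struct


def pack_introduce1v(pk_id: bytes, auth_type: int, auth_data: bytes,
                     encrypted_introduce2: bytes) -> bytes:
    """Pack the payload of a proposed v1 RELAY_INTRODUCE1V cell.

    encrypted_introduce2 must already be the INTRODUCE2 cell encrypted to
    Bob's public key; the encryption is outside this sketch.
    """
    if len(pk_id) != 20:
        raise ValueError("PK_ID must be 20 octets")
    # V (1 octet), PK_ID (20), AUTHT (1), AUTHL (2): 24 octets in total.
    header = struct.pack(">B20sBH", 1, pk_id, auth_type, len(auth_data))
    return header + auth_data + encrypted_introduce2
```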

  The maximum length of contained authorization data depends on the length
  of the contained INTRODUCE2 cell. A calculation follows below when
  describing the INTRODUCE2 cell format we propose to use.

  1.3. Client authorization at hidden service

  The time when a hidden service receives an INTRODUCE2 cell constitutes
  the last possible authorization point during the hidden service
  protocol. Performing authorization here is easier than at the other two
  authorization points, because there are no possibly untrusted entities
  involved.

  In general, a client that is successfully authorized at the introduction
  point should be granted access at the hidden service, too. Otherwise, the
  client would receive a positive INTRODUCE_ACK cell from the introduction
  point and conclude that it may connect to the service, but the request
  will be dropped without notice. This would appear as a failure to
  clients. Therefore, the number of cases in which a client successfully
  passes the introduction point but fails at the hidden service should be
  zero. However, this does not lead to the conclusion that the
  authorization data used at the introduction point and the hidden service
  must be the same, but only that both authorization data should lead to
  the same authorization result.

  Authorization data is transmitted from client to server via an
  INTRODUCE2 cell that is forwarded by the introduction point. There are
  versions 0 to 2 specified in section 1.8 of rend-spec, but none of these
  contain fields for carrying authorization data. We propose a slightly
  modified version of v3 INTRODUCE2 cells that is specified in section
  1.8.1 and which is not implemented as of December 2007. In contrast to
  the specified v3 we avoid specifying (and implementing) IPv6 capabilities,
  because Tor relays will be required to support IPv4 addresses for a long
  time to come, so IPv6 support seems unnecessary at the moment. The
  proposed format of v3 INTRODUCE2 cells is as follows:

     VER    Version byte: set to 3.               [1 octet]
     AUTHT  The auth type that is used            [1 octet]
     AUTHL  Length of auth data                  [2 octets]
     AUTHD  Auth data                            [variable]
     TS     Timestamp (seconds since 1-1-1970)   [4 octets]
     IP     Rendezvous point's address           [4 octets]
     PORT   Rendezvous point's OR port           [2 octets]
     ID     Rendezvous point identity ID        [20 octets]
     KLEN   Length of onion key                  [2 octets]
     KEY    Rendezvous point onion key        [KLEN octets]
     RC     Rendezvous cookie                   [20 octets]
     g^x    Diffie-Hellman data, part 1        [128 octets]
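  The v3 INTRODUCE2 plaintext can be assembled as in the following Python
  sketch, assuming big-endian integers and omitting AUTHL/AUTHD when AUTHT
  is 0 (per the note that AUTHL is only used when AUTHT != 0). Summing
  the fixed fields in the table, including the 4-octet TS, it produces
  310 octets for a 128-octet onion key and no authorization data. The
  function name is hypothetical.

```python
import socket
import struct


def pack_introduce2_v3(auth_type: int, auth_data: bytes, timestamp: int,
                       rp_ipv4: str, rp_port: int, rp_id: bytes,
                       rp_onion_key: bytes, rendezvous_cookie: bytes,
                       g_x: bytes) -> bytes:
    """Pack the plaintext of a proposed v3 INTRODUCE2 cell."""
    if len(rp_id) != 20 or len(rendezvous_cookie) != 20 or len(g_x) != 128:
        raise ValueError("ID and RC must be 20 octets, g^x 128 octets")
    out = struct.pack(">BB", 3, auth_type)                       # VER, AUTHT
    if auth_type != 0:
        out += struct.pack(">H", len(auth_data)) + auth_data     # AUTHL, AUTHD
    out += struct.pack(">I", timestamp)                          # TS
    out += socket.inet_aton(rp_ipv4)                             # IP
    out += struct.pack(">H", rp_port)                            # PORT
    out += rp_id                                                 # ID
    out += struct.pack(">H", len(rp_onion_key)) + rp_onion_key   # KLEN, KEY
    out += rendezvous_cookie + g_x                               # RC, g^x
    return out
```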

  The maximum possible length of authorization data is related to the
  enclosing INTRODUCE1V cell. A v3 INTRODUCE2 cell with a
  1024-bit (128-octet) public key and without any authorization data
  occupies 310 octets (AUTHL is only used when AUTHT has a value != 0),
  plus 58 octets for hybrid public key encryption (see
  section 5.1 of tor-spec on hybrid encryption of CREATE cells). The
  surrounding INTRODUCE1V cell requires 24 octets. This leaves only 106
  of the 498 available octets free, which must be shared between
  authorization data to the introduction point _and_ to the hidden
  service.

  When receiving a v3 INTRODUCE2 cell, Bob checks whether a client has
  provided valid authorization data to him. He also requires that the
  timestamp is no more than 30 minutes in the past or future and that the
  first part of the Diffie-Hellman handshake has not been used in the past
  60 minutes to prevent replay attacks by rogue introduction points. (The
  reason for not using the rendezvous cookie to detect replays, even
  though it is only sent once in the current design, is that it might be
  desirable to re-use rendezvous cookies for multiple introduction requests
  in the future.) If all checks pass, Bob builds a circuit to the provided
  rendezvous point. Otherwise he drops the cell.

  1.4. Summary of authorization data fields

  In summary, the proposed descriptor format and cell formats provide the
  following fields for carrying authorization data:

  (1) The v2 hidden service descriptor contains:
      - a descriptor cookie that is used for the lookup process, and
      - an arbitrary encryption schema to ensure authorization to access
        introduction information (currently symmetric encryption with the
        descriptor cookie).

  (2) For performing authorization at the introduction point we can use:
      - the fields intro-authorization and service-authorization in
        hidden service descriptors,
      - a maximum of 215 octets in the ESTABLISH_INTRO cell, and
      - one part of 106 octets in the INTRODUCE1V cell.

  (3) For performing authorization at the hidden service we can use:
      - the fields intro-authorization and service-authorization in
        hidden service descriptors,
      - the other part of 106 octets in the INTRODUCE2 cell.
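  The INTRODUCE1V octet budget can be recomputed from the field widths
  given in sections 1.2 and 1.3. This Python sketch counts every fixed
  field of the v3 INTRODUCE2 format, including the 4-octet timestamp, and
  assumes a 128-octet onion key; the constant and function names are
  hypothetical.

```python
RELAY_PAYLOAD = 498                  # usable octets in a relay cell
INTRODUCE1V_HEADER = 1 + 20 + 1 + 2  # V, PK_ID, AUTHT, AUTHL
HYBRID_OVERHEAD = 58                 # hybrid public-key encryption (tor-spec 5.1)


def introduce2_v3_fixed_len(key_len: int = 128) -> int:
    """Fixed octets of a v3 INTRODUCE2 cell without AUTHL/AUTHD."""
    # VER, AUTHT, TS, IP, PORT, ID, KLEN, KEY, RC, g^x
    return 1 + 1 + 4 + 4 + 2 + 20 + 2 + key_len + 20 + 128


def shared_auth_budget(key_len: int = 128) -> int:
    """Octets left in an INTRODUCE1V cell for all authorization data."""
    return (RELAY_PAYLOAD - INTRODUCE1V_HEADER - HYBRID_OVERHEAD
            - introduce2_v3_fixed_len(key_len))
```

  The authorization data for the introduction point, plus any AUTHL field
  and authorization data inside INTRODUCE2, must all fit into this
  remainder.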

  It will also still be possible to access a hidden service without any
  authorization or only use a part of the authorization infrastructure.
  However, this requires considering all parts of the infrastructure. For
  example, authorization at the introduction point relying on confidential
  intro-authorization data transported in the hidden service descriptor
  cannot be performed without using an encryption schema for introduction
  information.

  1.5. Managing authorization data at servers and clients

  In order to provide authorization data at the hidden service and the
  authenticated clients, we propose to use files---either the Tor
  configuration file or separate files. The exact format of these special
  files depends on the authorization protocol used.

  Currently, rend-spec contains the suggestion to encode client-side
  authorization data in the URL, as in x.y.z.onion. This was never used
  and is also a bad idea, because in case of HTTP the requested URL may be
  contained in the Host and Referer fields.

  1.6. Limitations for authorization protocols

  There are two limitations of the current hidden service protocol for
  authorization protocols that shall be identified here.

    1. The three cell types ESTABLISH_INTRO, INTRODUCE1V, and INTRODUCE2
       restrict the amount of data that can be used for authorization.
       This forces authorization protocols that require per-user
       authorization data at the introduction point to restrict the number
       of authorized clients artificially. A possible solution could be to
       split contents among multiple cells and reassemble them at the
       introduction points.

    2. The current hidden service protocol does not specify cell types to
       perform interactive authorization between client and introduction
       point or hidden service. If there should be an authorization
       protocol that requires interaction, new cell types would have to be
       defined and integrated into the hidden service protocol.


  2. Specific authorization protocol instances

  In the following we present two specific authorization protocols that
  make use of (parts of) the new authorization infrastructure:

    1. The first protocol allows a service provider to restrict access
       to clients with a previously received secret key only, but does not
       attempt to hide service activity from others.

    2. The second protocol, albeit feasible only for a limited set of about
       16 clients, performs client authorization and hides service activity
       from everyone but the authorized clients.

  These two protocol instances extend the existing hidden service protocol
  version 2. Hidden services that perform client authorization may run in
  parallel to other services running versions 0, 2, or both.

  2.1. Service with large-scale client authorization

  The first client authorization protocol aims at performing access control
  while consuming as few additional resources as possible. A service
  provider should be able to permit access to a large number of clients
  while denying access for everyone else. However, the price for
  scalability is that the service won't be able to hide its activity from
  unauthorized or formerly authorized clients.

  The main idea of this protocol is to encrypt the introduction-point part
  in hidden service descriptors to authorized clients using symmetric keys.
  This ensures that nobody else but authorized clients can learn which
  introduction points a service currently uses, nor can someone send a
  valid INTRODUCE1 message without knowing the introduction key. Therefore,
  a subsequent authorization at the introduction point is not required.

  A service provider generates symmetric "descriptor cookies" for his
  clients and distributes them outside of Tor. The suggested key size is
  128 bits, so that descriptor cookies can be encoded in 22 base64 chars
  (which can hold up to 22 * 6 = 132 bits, leaving 4 bits to encode the
  authorization type (here: "0") and allow a client to distinguish this
  authorization protocol from others like the one proposed below).
  Typically, the contact information for a hidden service using this
  authorization protocol looks like this:

    v2cbb2l4lsnpio4q.onion Ll3X7Xgz9eHGKCCnlFH0uz
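  The 22-character encoding can be sketched in Python as below. The bit
  layout is an assumption chosen so that the 22 characters carry exactly
  the 132 bits described above: the 128-bit cookie comes first and the
  4-bit authorization type sits in the high nibble of a 17th octet before
  base64 encoding. The function names are hypothetical.

```python
import base64
from typing import Tuple


def encode_descriptor_cookie(cookie: bytes, auth_type_bits: int) -> str:
    """Pack a 128-bit cookie and a 4-bit auth type into 22 base64 chars."""
    if len(cookie) != 16 or not 0 <= auth_type_bits < 16:
        raise ValueError("need a 16-octet cookie and a 4-bit type")
    # The 22nd base64 character then holds the last 2 cookie bits plus
    # the 4 type bits; everything after it is zero padding and is dropped.
    raw = cookie + bytes([auth_type_bits << 4])
    return base64.b64encode(raw).decode("ascii")[:22]


def decode_descriptor_cookie(text: str) -> Tuple[bytes, int]:
    """Inverse of encode_descriptor_cookie: return (cookie, auth_type_bits)."""
    if len(text) != 22:
        raise ValueError("expected 22 base64 characters")
    # Appending "AA" zero-pads to a full 24-character group (18 octets).
    raw = base64.b64decode(text + "AA")
    return raw[:16], raw[16] >> 4
```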

  When generating a hidden service descriptor, the service encrypts the
  introduction-point part with a single randomly generated symmetric
  128-bit session key using AES-CTR as described for v2 hidden service
  descriptors in rend-spec. Afterwards, the service encrypts the session
  key to all descriptor cookies using AES. To let each authorized client
  efficiently find the session key that is encrypted for him or her, the
  service generates a 4-octet client ID for each client from the descriptor
  cookie and the initialization vector. The service adds fake entries so
  that descriptors always contain a number of encrypted session keys that
  is a multiple of 16. Encrypted session keys are ordered by client IDs in
  order to conceal addition or removal of authorized clients by the service
  provider.

     ATYPE  Authorization type: set to 1.                      [1 octet]
     ALEN   Client blocks := 1 + ((clients - 1) div 16)        [1 octet]
   for each symmetric descriptor cookie:
     ID     Client ID: H(descriptor cookie | IV)[:4]          [4 octets]
     SKEY   Session key encrypted with descriptor cookie     [16 octets]
   (end of client-specific part)
     RND    Random data      [(15 - ((clients - 1) mod 16)) * 20 octets]
     IV     AES initialization vector                        [16 octets]
     IPOS   Intro points, encrypted with session key  [remaining octets]
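  The service-side layout can be sketched in Python as follows, up to and
  including the IV. It assumes H is SHA-1 and that the caller has already
  AES-encrypted the session key under each descriptor cookie; random
  padding comes from os.urandom. The function names are hypothetical.

```python
import hashlib
import os
from typing import Callable, List, Tuple

ENTRY_LEN = 4 + 16  # client ID + session key encrypted with a cookie


def client_id(descriptor_cookie: bytes, iv: bytes) -> bytes:
    """ID = H(descriptor cookie | IV)[:4], assuming H is SHA-1."""
    return hashlib.sha1(descriptor_cookie + iv).digest()[:4]


def client_blocks(clients: int) -> int:
    """ALEN = 1 + ((clients - 1) div 16): the number of 16-entry blocks."""
    return 1 + (clients - 1) // 16


def fake_entries(clients: int) -> int:
    """Number of 20-octet random entries: 15 - ((clients - 1) mod 16)."""
    return 15 - (clients - 1) % 16


def build_client_part(entries: List[Tuple[bytes, bytes]], iv: bytes,
                      randbytes: Callable[[int], bytes] = os.urandom) -> bytes:
    """Assemble ATYPE through IV; the caller appends the encrypted IPOS.

    entries holds (client ID, encrypted session key) pairs.
    """
    n = len(entries)
    if n == 0 or len(iv) != 16:
        raise ValueError("need at least one client and a 16-octet IV")
    out = bytearray([1, client_blocks(n)])        # ATYPE, ALEN
    for cid, skey in sorted(entries):             # ordered by client ID
        out += cid + skey
    out += randbytes(fake_entries(n) * ENTRY_LEN) # RND padding
    out += iv                                     # IV
    return bytes(out)
```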

  An authorized client needs to configure Tor to use the descriptor cookie
  when accessing the hidden service. Therefore, a user adds the contact
  information that she received from the service provider to her torrc
  file. Upon downloading a hidden service descriptor, Tor finds the
  encrypted introduction-point part and attempts to decrypt it using the
  configured descriptor cookie. (In the rare event that two or more client
  IDs are equal, the client tries to decrypt all of them.)
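
  On the client side, finding the candidate session keys is a linear scan
  over the fixed-size entry area. The following is a minimal sketch. It
  assumes that H is SHA-1 and that `auth_part` starts at the ATYPE octet.
  Because RND is always a whole number of 20-octet fake entries, the scan
  covers ALEN * 16 entries and finds the IV right after them. The function
  name is hypothetical. It returns every match, since client IDs can
  collide.

```python
import hashlib

ENTRY_LEN = 20  # 4-octet client ID + 16-octet encrypted session key


def candidate_session_keys(auth_part: bytes, descriptor_cookie: bytes) -> list[bytes]:
    """Return every encrypted session key whose client ID matches ours."""
    if len(auth_part) < 2 or auth_part[0] != 1:
        raise ValueError("not an authorization type 1 section")
    entries_end = 2 + auth_part[1] * 16 * ENTRY_LEN  # ALEN blocks of 16
    iv = auth_part[entries_end:entries_end + 16]
    if len(iv) != 16:
        raise ValueError("truncated section")
    my_id = hashlib.sha1(descriptor_cookie + iv).digest()[:4]
    matches = []
    for off in range(2, entries_end, ENTRY_LEN):
        if auth_part[off:off + 4] == my_id:
            # Keep every match; the caller tries to decrypt each one.
            matches.append(auth_part[off + 4:off + ENTRY_LEN])
    return matches
```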

  Upon sending the introduction, the client includes her descriptor cookie
  as auth type "1" in the INTRODUCE2 cell that she sends to the service.
  The hidden service checks whether the included descriptor cookie is
  authorized to access the service and either responds to the introduction
  request, or not.

  2.2. Authorization for limited number of clients

  A second, more sophisticated client authorization protocol goes the extra
  mile of hiding service activity from unauthorized clients. With all else
  being equal to the preceding authorization protocol, the second protocol
  publishes hidden service descriptors for each user separately and thus
  only needs to encrypt the introduction-point part of each descriptor to
  a single client. This allows the service to stop publishing descriptors
  for removed clients. As long as a removed client cannot link descriptors
  issued for other clients to the service, it can no longer derive service
  activity. The downside of this approach is limited scalability.
  Even though the distributed storage of descriptors (cf. proposal 114)
  tackles the problem of limited scalability to a certain extent, this
  protocol should not be used for services with more than 16 clients. (In
  fact, Tor should refuse to advertise services for more than this number
  of clients.)

  A hidden service generates an asymmetric "client key" and a symmetric
  "descriptor cookie" for each client. The client key is used as
  replacement for the service's permanent key, so that the service uses a
  different identity for each of his clients. The descriptor cookie is used
  to store descriptors at changing directory nodes that are unpredictable
  for anyone but service and client, to encrypt the introduction-point
  part, and to be included in INTRODUCE2 cells. Once the service has
  created client key and descriptor cookie, he tells them to the client
  outside of Tor. The contact information string looks similar to the one
  used by the preceding authorization protocol (with the only difference
  that it has "1" encoded as auth-type in the remaining 4 of 132 bits
  instead of "0" as before).

  When creating a hidden service descriptor for an authorized client, the
  hidden service uses the client key and descriptor cookie to compute
  secret ID part and descriptor ID:

    secret-id-part = H(time-period | descriptor-cookie | replica)

    descriptor-id = H(client-key[:10] | secret-id-part)
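
  The two formulas above can be sketched in Python. Several details are
  assumptions borrowed from the v2 descriptor format in rend-spec rather
  than stated here. H is SHA-1. time-period is a 4-octet big-endian
  integer and replica a single octet. "client-key[:10]" is read as the
  first 10 octets of the SHA-1 digest of the DER-encoded client public
  key, by analogy with permanent-id. time-period uses the rend-spec
  offset by the first octet of that ID. The function names are
  hypothetical.

```python
import hashlib
import struct


def client_key_id(client_key_der: bytes) -> bytes:
    # Assumed meaning of client-key[:10]: first 10 octets of H(public key).
    return hashlib.sha1(client_key_der).digest()[:10]


def time_period(now: int, key_id: bytes) -> int:
    # rend-spec v2 style: shift the 24-hour period by the first ID octet.
    return (now + key_id[0] * 86400 // 256) // 86400


def descriptor_id(client_key_der: bytes, descriptor_cookie: bytes,
                  now: int, replica: int) -> bytes:
    key_id = client_key_id(client_key_der)
    # secret-id-part = H(time-period | descriptor-cookie | replica)
    secret_id_part = hashlib.sha1(
        struct.pack(">I", time_period(now, key_id))
        + descriptor_cookie
        + bytes([replica])
    ).digest()
    # descriptor-id = H(client-key[:10] | secret-id-part)
    return hashlib.sha1(key_id + secret_id_part).digest()
```

  Because the descriptor cookie enters secret-id-part, only the service
  and the client can predict which directories will store the descriptor.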

  The hidden service also replaces permanent-key in the descriptor with
  client-key and encrypts introduction-points with the descriptor cookie.

     ATYPE  Authorization type: set to 2.                         [1 octet]
     IV     AES initialization vector                           [16 octets]
     IPOS   Intro points, encr. with descriptor cookie   [remaining octets]

  When uploading descriptors, the hidden service needs to make sure that
  descriptors for different clients are not uploaded at the same time (cf.
  Section 1.1), which is also a limiting factor for the number of clients.

  When a client is asked to establish a connection to a hidden service,
  it looks up whether it has any authorization data configured for that
  service. If the user has configured authorization data for authorization
  protocol "2", the descriptor ID is determined as described above. Upon
  receiving a descriptor, the client decrypts the introduction-point part
  using its descriptor cookie. Further, the client
  includes its descriptor cookie as auth-type "2" in INTRODUCE2 cells that
  it sends to the service.

  2.3. Hidden service configuration

  A hidden service that is meant to perform client authorization adds a
  new option HiddenServiceAuthorizeClient to its hidden service
  configuration. This option contains the authorization type which is
  either "1" for the protocol described in 2.1 or "2" for the protocol in
  2.2 and a comma-separated list of human-readable client names, so that
  Tor can create authorization data for these clients:

    HiddenServiceAuthorizeClient auth-type client-name,client-name,...

  If this option is configured, HiddenServiceVersion is automatically
  reconfigured to contain only version numbers of 2 or higher.
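
  The option value can be validated with a small Python sketch. The
  16-client cap for type "2" follows Section 2.2. The rejection of empty
  or duplicate names and the function name are assumptions made for
  illustration.

```python
MAX_TYPE2_CLIENTS = 16  # Section 2.2: refuse more clients than this


def parse_authorize_client(value: str) -> tuple[int, list[str]]:
    """Parse 'auth-type client-name,client-name,...'."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError("expected 'auth-type client-name,...'")
    auth_type = int(parts[0])  # raises ValueError if not a number
    if auth_type not in (1, 2):
        raise ValueError(f"unknown auth-type {auth_type}")
    names = parts[1].split(",")
    if any(not n for n in names) or len(set(names)) != len(names):
        raise ValueError("empty or duplicate client name")
    if auth_type == 2 and len(names) > MAX_TYPE2_CLIENTS:
        raise ValueError("auth-type 2 supports at most 16 clients")
    return auth_type, names
```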

  Tor stores all generated authorization data for the authorization
  protocols described in Sections 2.1 and 2.2 in a new file using the
  following file format:

     "client-name" human-readable client identifier NL
     "descriptor-cookie" 128-bit key as 22 base64 chars NL

  If the authorization protocol of Section 2.2 is used, Tor also generates
  and stores the following data:

     "client-key" NL a public key in PEM format

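  A 128-bit cookie is 16 octets. Standard base64 encodes it as 24
  characters ending in "==", so dropping the padding leaves the 22
  characters listed above. The following Python sketch writes and reads
  the file. It assumes that each keyword is separated from its value by a
  single space and that the PEM block follows the "client-key" line
  directly. The helper names are hypothetical.

```python
import base64
import os
from typing import Optional


def new_descriptor_cookie() -> str:
    """Random 128-bit descriptor cookie as 22 base64 chars (no padding)."""
    return base64.b64encode(os.urandom(16)).decode("ascii").rstrip("=")


def format_client_entry(name: str, cookie_b64: str,
                        client_key_pem: Optional[str] = None) -> str:
    """Serialize one client's authorization data."""
    lines = [f"client-name {name}", f"descriptor-cookie {cookie_b64}"]
    if client_key_pem is not None:  # only for the Section 2.2 protocol
        lines += ["client-key", client_key_pem.rstrip("\n")]
    return "\n".join(lines) + "\n"


def parse_client_entries(text: str) -> list[dict]:
    """Parse the file back into one dict per client."""
    entries: list[dict] = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("client-name "):
            entries.append({"client-name": line.split(" ", 1)[1]})
        elif line.startswith("descriptor-cookie ") and entries:
            entries[-1]["descriptor-cookie"] = line.split(" ", 1)[1]
        elif line == "client-key" and entries:
            pem = []  # collect lines up to the PEM "-----END" marker
            while i + 1 < len(lines):
                i += 1
                pem.append(lines[i])
                if lines[i].startswith("-----END"):
                    break
            entries[-1]["client-key"] = "\n".join(pem)
        i += 1
    return entries
```
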
  2.4. Client configuration

  Clients need to make their authorization data known to Tor using another
  configuration option that contains a service name (mainly for the sake of
  convenience), the service address, and the descriptor cookie that is
  required to access a hidden service (the authorization protocol number is
  encoded in the descriptor cookie):

    HidServAuth service-name service-address descriptor-cookie
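
  Twenty-two base64 characters carry 132 bits. These are the 128-bit
  cookie plus the 4 bits that Section 2.2 uses for the auth-type ("0" for
  the protocol of 2.1, "1" for the protocol of 2.2). The Python sketch
  below assumes that those 4 bits are the last 4 bits of the 22-character
  string, in base64's usual big-endian bit order. Appending "AA" makes the
  string decodable as 18 octets, and the spare bits land in the high
  nibble of octet 17. The helper names are hypothetical.

```python
import base64


def encode_cookie(cookie: bytes, auth_nibble: int) -> str:
    """Pack a 128-bit cookie and a 4-bit auth value into 22 base64 chars."""
    if len(cookie) != 16 or not 0 <= auth_nibble <= 15:
        raise ValueError("need a 16-octet cookie and a 4-bit value")
    # 17 octets encode to 24 chars; the first 22 carry all 132 bits.
    return base64.b64encode(cookie + bytes([auth_nibble << 4])).decode("ascii")[:22]


def parse_hidservauth(line: str) -> tuple[str, str, bytes, int]:
    """Return (service-name, service-address, cookie, protocol number)."""
    keyword, name, address, cookie_b64 = line.split()
    if keyword != "HidServAuth" or len(cookie_b64) != 22:
        raise ValueError("malformed HidServAuth line")
    raw = base64.b64decode(cookie_b64 + "AA")  # 24 chars -> 18 octets
    cookie, auth_nibble = raw[:16], raw[16] >> 4
    # Section 2.2: value 0 selects protocol "1", value 1 selects "2".
    return name, address, cookie, auth_nibble + 1
```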

Security implications:

  In the following, we discuss possible attacks by dishonest
  entities in the presented infrastructure and specific protocol. These
  security implications would have to be verified once more when adding
  another protocol. The dishonest entities (theoretically) include the
  hidden service itself, the authenticated clients, hidden service directory
  nodes, introduction points, and rendezvous points. The relays that are
  part of circuits used during protocol execution, but never learn about
  the exchanged descriptors or cells by design, are not considered.
  Obviously, this list makes no claim to be complete. The discussed attacks
  are sorted by the difficulty to perform them, in ascending order,
  starting with roles that everyone could attempt to take and ending with
  partially trusted entities abusing the trust put in them.

  (1) A hidden service directory could attempt to infer the presence of a
  service from the existence of a locally stored hidden service descriptor:
  This passive attack is possible only for a single client-service
  relation, because descriptors need to contain a publicly visible
  signature of the service using the client key.
  A possible protection would be to increase the number of hidden service
  directories in the network.

  (2) A hidden service directory could try to break the descriptor cookies
  of locally stored descriptors: This attack can be performed offline. The
  only useful countermeasure against it might be using safe passwords that
  are generated by Tor.

[passwords? where did those come in? -RD]

  (3) An introduction point could try to identify the pseudonym of the
  hidden service on behalf of which it operates: This is impossible by
  design, because the service uses a fresh public key for every
  establishment of an introduction point (see proposal 114) and the
  introduction point receives a fresh introduction cookie, so that there is
  no identifiable information about the service that the introduction point
  could learn. The introduction point cannot even tell if client accesses
  belong to the same client or not, nor can it know the total number of
  authorized clients. The only information might be the pattern of
  anonymous client accesses, but that is hardly enough to reliably identify
  a specific service.

  (4) An introduction point could want to learn the identities of accessing
  clients: This is also impossible by design, because all clients use the
  same introduction cookie for authorization at the introduction point.

  (5) An introduction point could try to replay a correct INTRODUCE1 cell
  to other introduction points of the same service, e.g. in order to force
  the service to create a huge number of useless circuits: This attack is
  not possible by design, because INTRODUCE1 cells are encrypted using a
  freshly created introduction key that is only known to authorized
  clients.

  (6) An introduction point could attempt to replay a correct INTRODUCE2
  cell to the hidden service, e.g. for the same reason as in the last
  attack: This attack is stopped by the fact that a service will drop
  INTRODUCE2 cells containing a DH handshake it has seen recently.

  (7) An introduction point could block client requests by sending either
  positive or negative INTRODUCE_ACK cells back to the client, but without
  forwarding INTRODUCE2 cells to the server: This attack is an annoyance
  for clients, because they might wait for a timeout to elapse until trying
  another introduction point. However, this attack is not introduced by
  performing authorization and it cannot be targeted towards a specific
  client. A countermeasure might be for the server to periodically perform
  introduction requests to his own service to see if introduction points
  are working correctly.

  (8) The rendezvous point could attempt to identify either server or
  client: This remains impossible as it was before, because the
  rendezvous cookie does not contain any identifiable information.

  (9) An authenticated client could swamp the server with valid INTRODUCE1
  and INTRODUCE2 cells, e.g. in order to force the service to create
  useless circuits to rendezvous points; as opposed to an introduction
  point replaying the same INTRODUCE2 cell, a client could include a new
  rendezvous cookie for every request: The countermeasure for this attack
  is the restriction to 10 connection establishments per client per hour.
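
  The per-client limit can be enforced with a sliding one-hour window.
  The following Python sketch identifies clients by the descriptor cookie
  carried in INTRODUCE2, which is the only per-client value the service
  sees. The class name and the sliding-window design are assumptions.

```python
from collections import defaultdict, deque

MAX_PER_HOUR = 10
HOUR = 3600.0


class IntroRateLimiter:
    """Allow at most MAX_PER_HOUR introductions per client in any hour."""

    def __init__(self) -> None:
        # descriptor cookie -> timestamps of accepted introductions
        self.history = defaultdict(deque)

    def allow(self, descriptor_cookie: bytes, now: float) -> bool:
        q = self.history[descriptor_cookie]
        while q and now - q[0] >= HOUR:  # slide the one-hour window
            q.popleft()
        if len(q) >= MAX_PER_HOUR:
            return False  # drop the request; build no rendezvous circuit
        q.append(now)
        return True
```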

Compatibility:

  An implementation of this proposal would require changes to hidden
  services and clients to process authorization data and encode and
  understand the new formats. However, both services and clients would
  remain compatible with regular hidden services without authorization.

Implementation:

  The implementation of this proposal can be divided into a number of
  changes to hidden service and client side. There are no
  changes necessary on directory, introduction, or rendezvous nodes. All
  changes are marked with either [service] or [client] to denote on which
  side they need to be made.

  /1/ Configure client authorization [service]

  - Parse configuration option HiddenServiceAuthorizeClient containing
    authorized client names.
  - Load previously created client keys and descriptor cookies.
  - Generate missing client keys and descriptor cookies, add them to
    client_keys file.
  - Rewrite the hostname file.
  - Keep client keys and descriptor cookies of authorized clients in
    memory.
 [- In case of reconfiguration, mark which client authorizations were
    added and whether any were removed. This can be used later when
    deciding whether to rebuild introduction points and publish new
    hidden service descriptors. Not implemented yet.]

  /2/ Publish hidden service descriptors [service]

  - Create and upload hidden service descriptors for all authorized
    clients.
 [- See /1/ for the case of reconfiguration.]

  /3/ Configure permission for hidden services [client]

  - Parse configuration option HidServAuth containing service
    authorization, store authorization data in memory.

  /5/ Fetch hidden service descriptors [client]

  - Look up client authorization upon receiving a hidden service request.
  - Request hidden service descriptor ID including client key and
    descriptor cookie. Only request v2 descriptors, no v0.

  /6/ Process hidden service descriptor [client]

  - Decrypt introduction points with descriptor cookie.

  /7/ Create introduction request [client]

  - Include descriptor cookie in INTRODUCE2 cell to introduction point.
  - Pass descriptor cookie around between involved connections and
    circuits.

  /8/ Process introduction request [service]

  - Read descriptor cookie from INTRODUCE2 cell.
  - Check whether descriptor cookie is authorized for access, including
    checking access counters.
  - Log access for accountability.

Filename: 122-unnamed-flag.txt
Title: Network status entries need a new Unnamed flag
Author: Roger Dingledine
Created: 04-Oct-2007
Status: Closed
Implemented-In: 0.2.0.x

1. Overview:

  Tor's directory authorities can give certain servers a "Named" flag
  in the network-status entry, when they want to bind that nickname to
  that identity key. This allows clients to specify a nickname rather
  than an identity fingerprint and still be certain they're getting the
  "right" server. As dir-spec.txt describes it,

    Name X is bound to identity Y if at least one binding directory lists
    it, and no directory binds X to some other Y'.

  In practice, clients can refer to servers by nickname whether they are
  Named or not; if they refer to nicknames that aren't Named, a complaint
  shows up in the log asking them to use the identity key in the future
  --- but it still works.

  The problem? Imagine a Tor server with nickname Bob. Bob and his
  identity fingerprint are registered in tor26's approved-routers
  file, but none of the other authorities registered him. Imagine
  there are several other unregistered servers also with nickname Bob
  ("the imposters").

  While Bob is online, all is well: a) tor26 gives a Named flag to
  the real one, and refuses to list the other ones; and b) the other
  authorities list the imposters but don't give them a Named flag. Clients
  who have all the network-statuses can compute which one is the real Bob.

  But when the real Bob disappears and his descriptor expires? tor26
  continues to refuse to list any of the imposters, and the other
  authorities continue to list the imposters. Clients don't have any
  idea that there exists a Named Bob, so they can ask for server Bob and
  get one of the imposters. (A warning will also appear in their log,
  but so what.)

2. The stopgap solution:

  tor26 should start accepting and listing the imposters, but it should
  assign them a new flag: "Unnamed".

  This would produce three cases in terms of assigning flags in the consensus
  networkstatus:

  i) a router gets the Named flag in the v3 networkstatus if
    a) it's the only router with that nickname that has the Named flag
       out of all the votes, and
    b) no vote lists it as Unnamed
  else,
  ii) a router gets the Unnamed flag if
    a) some vote lists a different router with that nickname as Named, or
    b) at least one vote lists it as Unnamed, or
    c) there are other routers with the same nickname that are Unnamed
  else,
  iii) the router neither gets a Named nor an Unnamed flag.
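
  Rules i) to iii) can be sketched in Python for the set of routers that
  share one nickname. Each vote is modeled as a map from router identity
  to the subset of {"Named", "Unnamed"} that the authority assigned. Rule
  ii c) is read here as "listed as Unnamed in some vote", which is an
  interpretation. The function name is hypothetical.

```python
from typing import Optional

Vote = dict[str, set[str]]  # router identity -> naming flags in one vote


def consensus_naming(votes: list[Vote]) -> dict[str, Optional[str]]:
    """Assign "Named", "Unnamed" or None to routers sharing a nickname."""
    routers = set().union(*votes) if votes else set()
    named = {r for v in votes for r, flags in v.items() if "Named" in flags}
    unnamed = {r for v in votes for r, flags in v.items() if "Unnamed" in flags}
    result: dict[str, Optional[str]] = {}
    for r in routers:
        if named == {r} and r not in unnamed:                      # rule i
            result[r] = "Named"
        elif (named - {r}) or r in unnamed or (unnamed - {r}):     # rule ii
            result[r] = "Unnamed"
        else:                                                      # rule iii
            result[r] = None
    return result
```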

  (This whole proposal is meant only for v3 dir flags; we shouldn't try
  to backport it to the v2 dir world.)

  Then client behavior is:

  a) If there's a Bob with a Named flag, pick that one.
  else b) If the Bobs don't have the Unnamed flag (notice that they should
          either all have it, or none), pick one of them and warn.
  else c) They all have the Unnamed flag -- no router found.
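
  The client rule can be sketched in Python as follows. It takes the
  consensus flags for all routers with the requested nickname and returns
  the chosen identity together with a flag saying whether to log a
  warning. Picking the lowest identity in case b) is an arbitrary
  assumption that makes the choice deterministic. The function name is
  hypothetical.

```python
from typing import Optional


def pick_router(flags: dict[str, Optional[str]]) -> tuple[Optional[str], bool]:
    """Choose among routers sharing a nickname; returns (identity, warn)."""
    named = [r for r, f in flags.items() if f == "Named"]
    if named:                        # a) use the Named router
        return named[0], False
    plain = sorted(r for r, f in flags.items() if f != "Unnamed")
    if plain:                        # b) pick one, but warn the user
        return plain[0], True
    return None, False               # c) all Unnamed: no router found
```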

3. Problems not solved by this stopgap:

  3.1. Naming authorities can go offline.

  If tor26 is the only authority that provides a binding for Bob, when
  tor26 goes offline we're back in our previous situation -- the imposters
  can be referenced with a mere ignorable warning in the client's log.

  If some other authority Names a different Bob, and tor26 goes offline,
  then that other Bob becomes the unique Named Bob.

  So be it. We should try to solve these one day, but there's no clear way
  to do it that doesn't destroy usability in other ways, and if we want
  to get the Unnamed flag into v3 network statuses we should add it soon.

  3.2. V3 dir spec magnifies brief discrepancies.

  Another point to notice: suppose tor26 names Bob(1) but doesn't know
  about Bob(2), while moria lists Bob(2). If Bob(1) is not around, Bob(2)
  doesn't get an Unnamed flag even though it should.

  Right now, in v2 dirs, the case where an authority doesn't know about
  a server but the other authorities do know is rare. That's because
  authorities periodically ask for other networkstatuses and then fetch
  descriptors that are missing.

  With v3, if that window occurs at the wrong time, it is extended for the
  entire period. We could solve this by making the voting more complex,
  but that doesn't seem worth it.

  [3.3. Tor26 is only one tor26.

  We need more naming authorities, possibly with some kind of auto-naming
  feature.  This is out-of-scope for this proposal -NM]

4. Changes to the v2 directory

  Previously, v2 authorities that had a binding for a server named Bob did
  not list any other server named Bob.  This will change too:

  Version 2 authorities will start listing all routers they know about,
  whether they conflict with a name-binding or not: servers for which
  this authority has a binding will continue to be marked Named, and
  all other servers with that nickname will be listed without the
  Named flag (i.e. there will be no Unnamed flag in v2 status documents).

  Clients already should handle having a named Bob alongside unnamed
  Bobs correctly, and having the unnamed Bobs in the status file even
  without the named server is no worse than the current status quo where
  clients learn about those servers from other authorities.

  The benefit of this is that an authority's opinion on a server like
  Guard, Stable, Fast etc. can now be learned by clients even if that
  specific authority has reserved that server's name for somebody else.

5. Other benefits:

  This new flag will allow people to operate servers that happen to have
  the same nickname as somebody who registered their server two years ago
  and left soon after. Right now there are dozens of nicknames that are
  registered on all three binding directory authorities, yet haven't been
  running for years. While it's bad that these nicknames are effectively
  blacklisted from the network, the really bad part is that this logic
  is really unintuitive to prospective new server operators.

Filename: 123-autonaming.txt
Title: Naming authorities automatically create bindings
Author: Peter Palfrader
Created: 2007-10-11
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  Tor's directory authorities can give certain servers a "Named" flag
  in the network-status entry, when they want to bind that nickname to
  that identity key. This allows clients to specify a nickname rather
  than an identity fingerprint and still be certain they're getting the
  "right" server.

  Authority operators name a server by adding its nickname and
  identity fingerprint to the 'approved-routers' file.  Historically,
  being listed in the file was required for a router: at first for being
  listed in the directory at all, and later in order to be used by
  clients as a first or last hop of a circuit.

  Adding identities to the list of named routers so far has been a
  manual, time consuming, and boring job.  Given that and the fact that
  the Tor network works just fine without named routers, the last
  authority to keep a current binding list stopped updating it well over
  half a year ago.

  Naming, if it were done, would serve a useful purpose however in that
  users can have a reasonable expectation that the exit server Bob they
  are using in their http://www.google.com.bob.exit/ URL is the same
  Bob every time.

Proposal:
  I propose that identity<->name binding be completely automated:

  New bindings should be added after the router has been around for a
  bit and its name has not been used by other routers; similarly, names
  that have not appeared on the network for a long time should be freed
  in case a new router wants to use them.

  The following rules are suggested:
  i) If a named router has not been online for half a year, the
     identity<->name binding for that name is removed.  The nickname
     is free to be taken by other routers now.
  ii) If a router claims a certain nickname and
       a) has been on the network for at least two weeks, and
       b) that nickname is not yet linked to a different router, and
       c) no other router has wanted that nickname in the last month,
      a new binding should be created for this router and its desired
      nickname.

 This automaton does not necessarily need to live in the Tor code, it
 can do its job just as well when it's an external tool.
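
  Such an external tool could apply the rules roughly as in the following
  Python sketch. Several values are assumptions because the proposal does
  not define them. "Half a year" is taken as 182 days and "a month" as 30
  days. A router counts as currently claiming a name if it was seen within
  the last day. Nicknames are compared case-insensitively, so binding keys
  are lowercase. The names `Seen` and `update_bindings` are hypothetical.

```python
from dataclasses import dataclass

DAY = 86400
HALF_YEAR = 182 * DAY  # assumed length of "half a year"
TWO_WEEKS = 14 * DAY
MONTH = 30 * DAY       # assumed length of "a month"
ONLINE = DAY           # assumed: seen within a day = currently on the network


@dataclass
class Seen:
    identity: str
    nickname: str
    first_seen: float
    last_seen: float


def update_bindings(bindings: dict[str, str], routers: list[Seen],
                    now: float) -> dict[str, str]:
    """Apply rules i) and ii) to a lowercase-nickname -> identity map."""
    last_seen: dict[str, float] = {}
    for r in routers:
        last_seen[r.identity] = max(r.last_seen, last_seen.get(r.identity, r.last_seen))
    # i) free names whose router has been offline for half a year
    new = {name: ident for name, ident in bindings.items()
           if now - last_seen.get(ident, float("-inf")) < HALF_YEAR}
    for r in routers:
        if now - r.last_seen >= ONLINE:
            continue  # only routers currently on the network claim names
        name = r.nickname.lower()
        rival = any(o.nickname.lower() == name and o.identity != r.identity
                    and now - o.last_seen < MONTH for o in routers)
        if (now - r.first_seen >= TWO_WEEKS                 # ii a)
                and new.get(name, r.identity) == r.identity  # ii b)
                and not rival):                              # ii c)
            new[name] = r.identity
    return new
```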

Filename: 124-tls-certificates.txt
Title: Blocking resistant TLS certificate usage
Author: Steven J. Murdoch
Created: 2007-10-25
Status: Superseded

Overview:

  To be less distinguishable from HTTPS web browsing, only Tor servers should
  present TLS certificates. This should be done whilst maintaining backwards
  compatibility with Tor nodes which present and expect client certificates, and
  while preserving existing security properties. This specification describes
  the negotiation protocol, what certificates should be presented during the TLS
  negotiation, and how to move the client authentication within the encrypted
  tunnel.

Motivation:

  In Tor's current TLS [1] handshake, both client and server present a
  two-certificate chain. Since TLS performs authentication prior to establishing
  the encrypted tunnel, the contents of these certificates are visible to an
  eavesdropper. In contrast, during normal HTTPS web browsing, the server
  presents a single certificate, signed by a root CA, and the client presents
  no certificate. Hence it is possible to distinguish Tor from HTTPS by
  identifying this pattern.

  To resist blocking based on traffic identification, Tor should behave as close
  to HTTPS as possible, i.e. servers should offer a single certificate and not
  request a client certificate; clients should present no certificate. This
  presents two difficulties: clients are no longer authenticated and servers are
  authenticated by the connection key, rather than identity key. The link
  protocol must thus be modified to preserve the old security semantics.

  Finally, in order to maintain backwards compatibility, servers must correctly
  identify whether the client supports the modified certificate handling. This
  is achieved by modifying the cipher suites that clients advertise support
  for. These cipher suites are selected to be similar to those chosen by web
  browsers, in order to resist blocking based on client hello.

Terminology:

  Initiator: OP or OR which initiates a TLS connection ("client" in TLS
   terminology)
  
  Responder: OR which receives an incoming TLS connection ("server" in TLS
   terminology) 

Version negotiation and cipher suite selection:

  In the modified TLS handshake, the responder does not request a certificate
  from the initiator. This request would normally occur immediately after the
  responder receives the client hello (the first message in a TLS handshake) and
  so the responder must decide whether to request a certificate based only on
  the information in the client hello. This is achieved by examining the cipher
  suites in the client hello.

   List 1: cipher suite lists offered by version 0/1 Tor

   From src/common/tortls.c, revision 12086:
    TLS1_TXT_DHE_RSA_WITH_AES_128_SHA 
    TLS1_TXT_DHE_RSA_WITH_AES_128_SHA : SSL3_TXT_EDH_RSA_DES_192_CBC3_SHA
    SSL3_TXT_EDH_RSA_DES_192_CBC3_SHA

 Client hello sent by initiator:

  Initiators supporting version 2 of the Tor connection protocol MUST
  offer a different cipher suite list from those sent by pre-version 2
  Tors, contained in List 1. To maintain compatibility with older Tor
  versions and common browsers, the cipher suite list MUST include
  support for:

   TLS_DHE_RSA_WITH_AES_256_CBC_SHA
   TLS_DHE_RSA_WITH_AES_128_CBC_SHA
   SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
   SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA

 Client hello received by responder/server hello sent by responder:

  Responders supporting version 2 of the Tor connection protocol should compare
  the cipher suite list in the client hello with those in List 1. If it matches
  any in the list, then the responder should assume that the initiator supports
  version 1, and thus should maintain the version 1 behavior, i.e. send a
  two-certificate chain, request a client certificate, and neither send nor
  expect a VERSIONS cell [2].

  Otherwise, the responder should assume version 2 behavior and select a cipher
  suite following TLS [1] behavior, i.e. select the first entry from the client
  hello cipher list which is acceptable. Responders MUST NOT select any suite
  that lacks ephemeral keys, or whose symmetric keys are less than KEY_LEN bits,
  or whose digests are less than HASH_LEN bits. Implementations SHOULD NOT
  allow other SSLv3 ciphersuites. 

  Should no mutually acceptable cipher suite be found, the connection MUST be
  closed.
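
  The responder's two decisions can be sketched in Python. The first is
  whether the client hello is one of the List 1 lists. The second is which
  suite to accept. List 1 is mapped to TLS suite IDs (0x0033 for the
  DHE-RSA AES-128 suite, 0x0016 for DHE-RSA 3DES). KEY_LEN and HASH_LEN
  are assumed to mean 128 and 160 bits. The suite table is an illustrative
  subset, and unknown suites are treated as unacceptable. The function
  names are hypothetical.

```python
# List 1 as TLS suite IDs: 0x0033 = TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
# 0x0016 = SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA.
V1_LISTS = {(0x0033,), (0x0033, 0x0016), (0x0016,)}

KEY_LEN_BITS = 128   # assumed value of KEY_LEN
HASH_LEN_BITS = 160  # assumed value of HASH_LEN

# Illustrative subset: suite ID -> (ephemeral?, symmetric key bits, digest bits)
SUITES = {
    0x0039: (True, 256, 160),   # TLS_DHE_RSA_WITH_AES_256_CBC_SHA
    0x0033: (True, 128, 160),   # TLS_DHE_RSA_WITH_AES_128_CBC_SHA
    0x0016: (True, 168, 160),   # SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
    0x0013: (True, 168, 160),   # SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
    0x0035: (False, 256, 160),  # TLS_RSA_WITH_AES_256_CBC_SHA (no ephemeral)
    0x0004: (False, 128, 128),  # SSL_RSA_WITH_RC4_128_MD5
}


def is_v1_initiator(client_suites: list[int]) -> bool:
    """True if the client hello offers exactly one of the List 1 lists."""
    return tuple(client_suites) in V1_LISTS


def select_suite(client_suites: list[int]) -> int:
    """Pick the first acceptable suite in the client's order (v2 behavior)."""
    for suite in client_suites:
        info = SUITES.get(suite)
        if info is None:
            continue  # unknown suites are treated as unacceptable
        ephemeral, key_bits, digest_bits = info
        if ephemeral and key_bits >= KEY_LEN_BITS and digest_bits >= HASH_LEN_BITS:
            return suite
    raise ConnectionError("no mutually acceptable cipher suite; close connection")
```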

  If the responder is implementing version 2 of the connection protocol it
  SHOULD send a server certificate with random contents. The organizationName
  field MUST NOT be "Tor", "TOR" or "t o r".

 Server certificate received by initiator:

  If the server certificate has an organizationName of "Tor", "TOR" or "t o r",
  the initiator should assume that the responder does not support version 2 of
  the connection protocol, in which case the initiator should respond following
  version 1, i.e. send a two-certificate client chain and neither send nor
  expect a VERSIONS cell.

  [SJM: We could also use the fact that a client certificate request was sent]
  
  If the server hello contains a ciphersuite which does not comply with the key
  length requirements above, even if it was one offered in the client hello, the
  connection MUST be closed. This will only occur if the responder is not a Tor
  server.

 Backward compatibility:

  v1 Initiator, v1 Responder: No change
  v1 Initiator, v2 Responder: Responder detects v1 initiator by client hello
  v2 Initiator, v1 Responder: Responder accepts v2 client hello. Initiator
   detects v1 server certificate and continues with v1 protocol
  v2 Initiator, v2 Responder: Responder accepts v2 client hello. Initiator
   detects v2 server certificate and continues with v2 protocol.

 Additional link authentication process:

  Following VERSION and NETINFO negotiation, both responder and
  initiator MUST send a certificate chain in a CERT cell. If one
  party does not have a certificate, the CERT cell MUST still be sent,
  but with a length of zero.

  A CERT cell is a variable length cell, of the format
        CircID                                [2 bytes]
        Command                               [1 byte]
        Length                                [2 bytes]
        Payload                               [<length> bytes]

  CircID MUST be set to 0x0000
  Command is [SJM: TODO]
  Length is the length of the payload
  Payload contains 0 or more certificates, each is of the format:
        Cert_Length  [2 bytes]
        Certificate  [<cert_length> bytes]
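
  The CERT cell layout can be sketched in Python with the `struct`
  module. Multi-byte fields are assumed to be in network (big-endian)
  byte order, as elsewhere in Tor. The proposal leaves the command value
  as TODO, so `CERT_COMMAND_PLACEHOLDER` is a hypothetical value and not
  an assigned cell command.

```python
import struct

CERT_COMMAND_PLACEHOLDER = 0xFF  # hypothetical; the proposal leaves this TODO


def pack_cert_cell(certs: list[bytes], command: int = CERT_COMMAND_PLACEHOLDER) -> bytes:
    """CircID (2 bytes, 0) | Command (1) | Length (2) | Payload."""
    payload = b"".join(struct.pack(">H", len(c)) + c for c in certs)
    return struct.pack(">HBH", 0, command, len(payload)) + payload


def unpack_cert_cell(cell: bytes) -> list[bytes]:
    """Return the certificates carried in a CERT cell."""
    circ_id, _command, length = struct.unpack(">HBH", cell[:5])
    if circ_id != 0:
        raise ValueError("CERT cell CircID must be 0x0000")
    payload = cell[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated CERT cell")
    certs, off = [], 0
    while off < length:
        if off + 2 > length:
            raise ValueError("truncated certificate length")
        (n,) = struct.unpack(">H", payload[off:off + 2])
        cert = payload[off + 2:off + 2 + n]
        if len(cert) != n:
            raise ValueError("truncated certificate")
        certs.append(cert)
        off += 2 + n
    return certs
```

  A zero-length CERT cell, as sent by a party without a certificate, is
  just the 5-byte header with Length set to 0.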

  Each certificate MUST sign the one preceding it. The initiator MUST
  place its connection certificate first; the responder, having
  already sent its connection certificate as part of the TLS handshake,
  MUST place its identity certificate first.

  Initiators who send a CERT cell MUST follow that with a LINK_AUTH
  cell to prove that they possess the corresponding private key.

  A LINK_AUTH cell is fixed-length, of the format:
         CircID                                [2 bytes]
         Command                               [1 byte]
         Length                                [2 bytes]
         Payload (padded with 0 bytes)         [PAYLOAD_LEN - 2 bytes]

  CircID MUST be set to 0x0000
  Command is [SJM: TODO]
  Length is the valid portion of the payload
  Payload is of the format:
         Signature version                     [1 byte]
         Signature                             [<length> - 1 bytes]
         Padding                               [PAYLOAD_LEN - <length> - 2 bytes]

  Signature version: Identifies the type of signature, currently 0x00
  Signature: Digital signature under the initiator's connection key of the
   following item, in PKCS #1 block type 1 [3] format:

    HMAC-SHA1, using the TLS master secret as key, of the
    following elements concatenated:
     - The signature version (0x00)
     - The NUL terminated ASCII string: "Tor initiator certificate verification"
     - client_random, as sent in the Client Hello
     - server_random, as sent in the Server Hello
     - SHA-1 hash of the initiator connection certificate
     - SHA-1 hash of the responder connection certificate
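
  The MAC that the initiator then signs can be computed with Python's
  `hmac` and `hashlib` modules, as in the sketch below. It assumes that
  "SHA-1 hash of the certificate" means SHA-1 over the DER encoding of
  each certificate. The PKCS #1 signature step under the connection key is
  not shown. The function name is hypothetical.

```python
import hashlib
import hmac

CONTEXT = b"Tor initiator certificate verification\x00"  # NUL-terminated


def link_auth_mac(master_secret: bytes, client_random: bytes, server_random: bytes,
                  initiator_cert_der: bytes, responder_cert_der: bytes) -> bytes:
    """HMAC-SHA1 over the concatenated elements, keyed by the master secret."""
    msg = (b"\x00"                                    # signature version
           + CONTEXT
           + client_random                            # from the Client Hello
           + server_random                            # from the Server Hello
           + hashlib.sha1(initiator_cert_der).digest()
           + hashlib.sha1(responder_cert_der).digest())
    return hmac.new(master_secret, msg, hashlib.sha1).digest()
```

  Binding in both randoms and both connection certificates ties the
  signature to this particular TLS session, so it cannot be replayed on
  another connection.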

  Security checks:

    - Before sending a LINK_AUTH cell, a node MUST ensure that the TLS
      connection is authenticated by the responder key.
    - For the handshake to have succeeded, the initiator MUST confirm:
       - That the TLS handshake was authenticated by the 
         responder connection key
       - That the responder connection key was signed by the first
         certificate in the CERT cell
       - That each certificate in the CERT cell was signed by the
         following certificate, with the exception of the last
       - That the last certificate in the CERT cell is the expected
         identity certificate for the node being connected to
    - For the handshake to have succeeded, the responder MUST confirm
      either:
       A) - A zero length CERT cell was sent and no LINK_AUTH cell was
            sent
          In which case the responder shall treat the identity of the
          initiator as unknown
        or
       B) - That the LINK_AUTH MAC contains a signature by the first
            certificate in the CERT cell
          - That the MAC signed matches the expected value
          - That each certificate in the CERT cell was signed by the
            following certificate, with the exception of the last
          In which case the responder shall treat the identity of the
          initiator as that of the last certificate in the CERT cell
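
  The responder's decision between cases A and B can be sketched in
  Python. `ToyCert` and `toy_signed_by` are hypothetical stand-ins for
  real X.509 certificates and signature verification. `link_auth_ok`
  stands for the check, done elsewhere, that the LINK_AUTH signature by
  the first certificate covers the expected MAC.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyCert:
    """Hypothetical stand-in for an X.509 certificate."""
    subject: str
    issuer: str


def toy_signed_by(cert: ToyCert, signer: ToyCert) -> bool:
    # Placeholder for real signature verification.
    return cert.issuer == signer.subject


def responder_check(certs: list[ToyCert], got_link_auth: bool, link_auth_ok: bool,
                    signed_by: Callable[[ToyCert, ToyCert], bool] = toy_signed_by
                    ) -> Optional[ToyCert]:
    """Return the initiator's identity certificate, or None if unknown."""
    if not certs:
        if got_link_auth:
            raise ValueError("LINK_AUTH without certificates")
        return None                                 # case A: identity unknown
    if not (got_link_auth and link_auth_ok):
        raise ValueError("missing or invalid LINK_AUTH")
    for cert, signer in zip(certs, certs[1:]):      # each signed by the next
        if not signed_by(cert, signer):
            raise ValueError("broken certificate chain")
    return certs[-1]                                # case B: last certificate
```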

  Protocol summary:

  1. I(nitiator) <-> R(esponder): TLS handshake, including responder
                               authentication under connection certificate R_c
  2. I <-> R: VERSION and NETINFO negotiation
  3. R -> I: CERT (Responder identity certificate R_i (which signs R_c))
  4. I -> R: CERT (Initiator connection certificate I_c,
                   Initiator identity certificate I_i (which signs I_c))
  5. I -> R: LINK_AUTH (Signature, under I_c, of HMAC-SHA1(master_secret,
                    "Tor initiator certificate verification" ||
                    client_random || server_random ||
                    I_c hash || R_c hash))

  Notes: I -> R doesn't need to wait for R_i before sending its own
   messages (reduces round-trips).
   Certificate hash is calculated like identity hash in CREATE cells.
   Initiator signature is calculated in a similar way to Certificate
   Verify messages in TLS 1.1 (RFC4346, Sections 7.4.8 and 4.7).
   If I is an OP, a zero-length certificate chain may be sent in step 4,
   in which case step 5 is not performed.

  Rationale: 

  - Version and netinfo negotiation before authentication: The version cell needs
   to come before the rest of the protocol, since we may choose to alter
   the rest at some later point, e.g. switch to a different MAC/signature scheme.
   It is useful to keep the NETINFO and VERSION cells close to each other, since
   the time between them is used to check if there is a delay-attack. Still, a
   server might choose not to act on NETINFO data from an initiator until the
   authentication is complete.

Appendix A: Cipher suite choices

  This specification intentionally does not put any constraints on the
  TLS ciphersuite lists presented by clients, other than a minimum
  required for compatibility. However, to maximize blocking
  resistance, ciphersuite lists should be carefully selected.

   Recommended client ciphersuite list

     Source: http://lxr.mozilla.org/security/source/security/nss/lib/ssl/sslproto.h

     0xc00a: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA  
     0xc014: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA 
     0x0039: TLS_DHE_RSA_WITH_AES_256_CBC_SHA 
     0x0038: TLS_DHE_DSS_WITH_AES_256_CBC_SHA
     0xc00f: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA 
     0xc005: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA 
     0x0035: TLS_RSA_WITH_AES_256_CBC_SHA
     0xc007: TLS_ECDHE_ECDSA_WITH_RC4_128_SHA 
     0xc009: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA 
     0xc011: TLS_ECDHE_RSA_WITH_RC4_128_SHA
     0xc013: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA 
     0x0033: TLS_DHE_RSA_WITH_AES_128_CBC_SHA 
     0x0032: TLS_DHE_DSS_WITH_AES_128_CBC_SHA 
     0xc00c: TLS_ECDH_RSA_WITH_RC4_128_SHA
     0xc00e: TLS_ECDH_RSA_WITH_AES_128_CBC_SHA
     0xc002: TLS_ECDH_ECDSA_WITH_RC4_128_SHA  
     0xc004: TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA 
     0x0004: SSL_RSA_WITH_RC4_128_MD5 
     0x0005: SSL_RSA_WITH_RC4_128_SHA 
     0x002f: TLS_RSA_WITH_AES_128_CBC_SHA 
     0xc008: TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA 
     0xc012: TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
     0x0016: SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA  
     0x0013: SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA 
     0xc00d: TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA 
     0xc003: TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA
     0xfeff: SSL_RSA_FIPS_WITH_3DES_EDE_CBC_SHA (168-bit Triple DES with RSA and a SHA1 MAC)
     0x000a: SSL_RSA_WITH_3DES_EDE_CBC_SHA 

     Order specified in:
      http://lxr.mozilla.org/security/source/security/nss/lib/ssl/sslenum.c#47

   Recommended TLS extensions:
      0x0000: Server Name Indication [4]
      0x000a: Supported Elliptic Curves [5]
      0x000b: Supported Point Formats [5]

   Recommended compression:
      0x00
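  As a rough illustration of how the client-side recommendations map onto
  the wire, the sketch below encodes a cipher suite list and the null
  compression method as the length-prefixed cipher_suites and
  compression_methods vectors of a TLS 1.1 ClientHello [1]. It covers only
  those two fields, uses just the first few suites of the list above, and
  the helper names are hypothetical.

```python
import struct

# The first few suites of the recommended client list, in order.
CLIENT_SUITES = [0xC00A, 0xC014, 0x0039, 0x0038, 0xC00F, 0xC005, 0x0035]

def encode_cipher_suites(suites):
    """cipher_suites vector: 2-byte length in bytes, then 2-byte codes."""
    body = b"".join(struct.pack("!H", s) for s in suites)
    return struct.pack("!H", len(body)) + body

def encode_compression_methods(methods=(0x00,)):
    """compression_methods vector: 1-byte length, then 1-byte codes (0x00 = null)."""
    return struct.pack("!B", len(methods)) + bytes(methods)
```

  The order of CLIENT_SUITES is what a censor fingerprinting ClientHellos
  would see, which is why the list above mirrors a common browser's order.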

   Recommended server ciphersuite selection:

     The responder should select the first entry in this list which is
     listed in the client hello:

     0x0039: TLS_DHE_RSA_WITH_AES_256_CBC_SHA  [ Common Firefox choice ]
     0x0033: TLS_DHE_RSA_WITH_AES_128_CBC_SHA  [ Tor v1 default ] 
     0x0016: SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA [ Tor v1 fallback ]
     0x0013: SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA [ Valid IE option ]
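  The selection rule is a first-match scan over the responder's own
  preference list. A minimal sketch, with a hypothetical function name and
  a None result standing in for a failed handshake:

```python
# Responder preference order from the list above.
SERVER_PREFERENCE = [0x0039, 0x0033, 0x0016, 0x0013]

def select_ciphersuite(client_offer, preference=SERVER_PREFERENCE):
    """Return the first preferred suite that the client offered, or None."""
    offered = set(client_offer)
    for suite in preference:
        if suite in offered:
            return suite
    return None  # no overlap: the handshake cannot proceed
```

  Scanning the responder's list rather than the client's means the
  responder's preferences win regardless of how the client orders its
  offer.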

References:

[1] The Transport Layer Security (TLS) Protocol, Version 1.1, RFC 4346, IETF.

[2] Version negotiation for the Tor protocol, Tor proposal 105.

[3] B. Kaliski, "Public-Key Cryptography Standards (PKCS) #1:
    RSA Cryptography Specifications Version 1.5", RFC 2313,
    March 1998.

[4] TLS Extensions, RFC 3546.

[5] Elliptic Curve Cryptography (ECC) Cipher Suites for Transport Layer
    Security (TLS), RFC 4492.

Filename: 125-bridges.txt
Title: Behavior for bridge users, bridge relays, and bridge authorities
Author: Roger Dingledine
Created: 11-Nov-2007
Status: Closed
Implemented-In: 0.2.0.x

0. Preface

  This document describes the design decisions around support for bridge
  users, bridge relays, and bridge authorities. It acts as an overview
  of the bridge design and deployment for developers, and it also tries
  to point out limitations in the current design and implementation.

  For more details on what all of these mean, see blocking.tex in the
  /doc/design-paper/ directory.

1. Bridge relays

  Bridge relays are just like normal Tor relays except they don't publish
  their server descriptors to the main directory authorities.

1.1. PublishServerDescriptor

  To configure your relay to be a bridge relay, just add
    BridgeRelay 1
    PublishServerDescriptor bridge
  to your torrc. This will cause your relay to publish its descriptor
  to the bridge authorities rather than to the default authorities.

  Alternatively, you can say
    BridgeRelay 1
    PublishServerDescriptor 0
  which will cause your relay to not publish anywhere. This could be
  useful for private bridges.

1.2. Exit policy

  Bridge relays should use an exit policy of "reject *:*". This is
  because they only need to relay traffic between the bridge users
  and the rest of the Tor network, so there's no need to let people
  exit directly from them.

1.3. RelayBandwidthRate / RelayBandwidthBurst

  We invented the RelayBandwidth* options for this situation: Tor clients
  who want to allow relaying too. See proposal 111 for details. Relay
  operators should feel free to rate-limit their relayed traffic.

1.4. Helping the user with port forwarding, NAT, etc.

  Just as for operating normal relays, our documentation and hints for
  how to make your ORPort reachable are inadequate for normal users.

  We need to work harder on this step, perhaps in 0.2.2.x.

1.5. Vidalia integration

  Vidalia has turned its "Relay" settings page into a tri-state
  "Don't relay" / "Relay for the Tor network" / "Help censored users".

  If you click the third choice, it forces your exit policy to reject *:*.

  If all the bridges end up on port 9001, that's not so good. On the
  other hand, putting the bridges on a low-numbered port in the Unix
  world requires jumping through extra hoops. The current compromise is
  that Vidalia makes the ORPort default to 443 on Windows, and 9001 on
  other platforms.

  At the bottom of the relay config settings window, Vidalia displays
  the bridge identifier to the operator (see Section 3.1) so he can pass
  it on to bridge users.

1.6. What if the default ORPort is already used?

  If the user already has a webserver or some other application
  bound to port 443, then Tor will fail to bind it and complain to the
  user, probably in a cryptic way. Rather than just working on a better
  error message (though we should do this), we should consider an
  "ORPort auto" option that tells Tor to try to find something that's
  bindable and reachable. This would also help us tolerate ISPs that
  filter incoming connections on port 80 and port 443. But this should
  be a different proposal, and can wait until 0.2.2.x.

2. Bridge authorities.

  Bridge authorities are like normal directory authorities, except they
  don't create their own network-status documents or votes. So if you
  ask a bridge authority for a network-status document or consensus, it
  behaves like a directory mirror: it gives you one from one of the main
  authorities. But if you ask the bridge authority for the descriptor
  corresponding to a particular identity fingerprint, it will happily
  give you the latest descriptor for that fingerprint.

  To become a bridge authority, add these lines to your torrc:
    AuthoritativeDirectory 1
    BridgeAuthoritativeDir 1

  Right now there's one bridge authority, running on the Tonga relay.

2.1. Exporting bridge-purpose descriptors

  We've added a new purpose for server descriptors: the "bridge"
  purpose. With the new router-descriptors file format that includes
  annotations, it's easy to look through it and find the bridge-purpose
  descriptors.

  Currently we export the bridge descriptors from Tonga to the
  BridgeDB server, so it can give them out according to the policies
  in blocking.pdf.

2.2. Reachability/uptime testing

  Right now the bridge authorities do active reachability testing of
  bridges, so we know which ones to recommend for users.

  But in the design document, we suggested that bridges should publish
  anonymously (i.e. via Tor) to the bridge authority, so somebody watching
  the bridge authority can't just enumerate all the bridges. But if we're
  doing active measurement, the game is up. Perhaps we should back off on
  this goal, or perhaps we should do our active measurement anonymously?

  Answering this issue is scheduled for 0.2.1.x.

2.3. Migrating to multiple bridge authorities

  Having only one bridge authority is both a trust bottleneck (if you
  break into one place you learn about every single bridge we've got)
  and a robustness bottleneck (when it's down, bridge users become sad).

  Right now if we put up a second bridge authority, all the bridges would
  publish to it, and (assuming the code works) bridge users would query
  a random bridge authority. This resolves the robustness bottleneck,
  but makes the trust bottleneck even worse.

  In 0.2.2.x and later we should think about better ways to have multiple
  bridge authorities.

3. Bridge users.

  Bridge users are like ordinary Tor users except they use encrypted
  directory connections by default, and they use bridge relays as both
  entry guards (their first hop) and directory guards (the source of
  all their directory information).

  To become a bridge user, add the following line to your torrc:

    UseBridges 1

  and then add at least one "Bridge" line to your torrc based on the
  format below.

3.1. Format of the bridge identifier.

  The canonical format for a bridge identifier contains an IP address,
  an ORPort, and an identity fingerprint:
    bridge 128.31.0.34:9009 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1

  However, the identity fingerprint can be left out, in which case the
  bridge user will connect to that relay and use it as a bridge regardless
  of what identity key it presents:
    bridge 128.31.0.34:9009
  This might be useful for cases where only short bridge identifiers
  can be communicated to bridge users.

  In a future version we may also support bridge identifiers that are
  only a key fingerprint:
    bridge 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1
  and the bridge user can fetch the latest descriptor from the bridge
  authority (see Section 3.4).
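  A minimal parser for the three bridge identifier forms above might look
  like this. It assumes IPv4 addresses only and treats the fingerprint as
  40 hex digits with optional spaces; the function and type names are
  hypothetical, not Tor's internal API.

```python
import ipaddress
from typing import NamedTuple, Optional

HEX = set("0123456789ABCDEF")

class BridgeLine(NamedTuple):
    address: Optional[str]      # IPv4 address, or None if omitted
    orport: Optional[int]       # ORPort, or None if omitted
    fingerprint: Optional[str]  # 40 hex digits with spaces removed

def parse_bridge_line(line: str) -> BridgeLine:
    words = line.split()
    if not words or words[0].lower() != "bridge":
        raise ValueError("not a bridge line")
    rest = words[1:]
    address: Optional[str] = None
    orport: Optional[int] = None
    if rest and ":" in rest[0]:
        host, port = rest[0].rsplit(":", 1)
        address = str(ipaddress.IPv4Address(host))  # ValueError if malformed
        orport = int(port)
        if not 0 < orport < 65536:
            raise ValueError("ORPort out of range")
        rest = rest[1:]
    # Remaining words, if any, are the space-separated fingerprint groups.
    fingerprint = "".join(rest).upper() or None
    if fingerprint is not None and (len(fingerprint) != 40
                                    or not set(fingerprint) <= HEX):
        raise ValueError("fingerprint must be 40 hex digits")
    if address is None and fingerprint is None:
        raise ValueError("need an address:port, a fingerprint, or both")
    return BridgeLine(address, orport, fingerprint)
```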

3.2. Bridges as entry guards

  For now, bridge users add their bridge relays to their list of "entry
  guards" (see path-spec.txt for background on entry guards). They are
  managed by the entry guard algorithms exactly as if they were a normal
  entry guard -- their keys and timing get cached in the "state" file,
  etc. This means that when the Tor user starts up with "UseBridges"
  disabled, he will skip past the bridge entries since they won't be
  listed as up and usable in his networkstatus consensus. But to be clear,
  the "entry_guards" list doesn't currently distinguish guards by purpose.

  Internally, each bridge user keeps a smartlist of "bridge_info_t"
  that reflects the "bridge" lines from his torrc along with a download
  schedule (see Section 3.5 below). When he starts Tor, he attempts
  to fetch a descriptor for each configured bridge (see Section 3.4
  below). When he succeeds at getting a descriptor for one of the bridges
  in his list, he adds it directly to the entry guard list using the
  normal add_an_entry_guard() interface. Once a bridge descriptor has
  been added, should_delay_dir_fetches() will stop delaying further
  directory fetches, and the user begins to bootstrap his directory
  information from that bridge (see Section 3.3).

  Currently bridge users cache their bridge descriptors to the
  "cached-descriptors" file (annotated with purpose "bridge"), but
  they don't make any attempt to reuse descriptors they find in this
  file. The theory is that either the bridge is available now, in which
  case you can get a fresh descriptor, or it's not, in which case an
  old descriptor won't do you much good.

  We could disable writing out the bridge lines to the state file, if
  we think this is a problem.

  As an exception, if we get an application request when we have one
  or more bridge descriptors but we believe none of them are running,
  we mark them all as running again. This is similar to the exception
  already in place to help long-idle Tor clients realize they should
  fetch fresh directory information rather than just refuse requests.
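  The exception can be sketched as a small check run when an application
  request arrives. The Bridge record and the hook name are hypothetical
  stand-ins for Tor's internal state.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bridge:
    nickname: str
    has_descriptor: bool
    believed_running: bool

def on_application_request(bridges: List[Bridge]) -> bool:
    """If we hold at least one bridge descriptor but believe none of those
    bridges are running, optimistically mark them all as running again.
    Returns True if the reset happened."""
    with_desc = [b for b in bridges if b.has_descriptor]
    if with_desc and not any(b.believed_running for b in with_desc):
        for b in with_desc:
            b.believed_running = True
        return True
    return False
```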

3.3. Bridges as directory guards

  In addition to using bridges as the first hop in their circuits, bridge
  users also use them to fetch directory updates. Other than initial
  bootstrapping to find a working bridge descriptor (see Section 3.4
  below), all further non-anonymized directory fetches will be redirected
  to the bridge.

  This means that bridge relays need to have cached answers for all
  questions the bridge user might ask. This makes the upgrade path
  tricky --- for example, if we migrate to a v4 directory design, the
  bridge user would need to keep using v3 so long as his bridge relays
  only knew how to answer v3 queries.

  In a future design, for cases where the user has enough information
  to build circuits yet the chosen bridge doesn't know how to answer a
  given query, we might teach bridge users to make an anonymized request
  to a more suitable directory server.

3.4. How bridge users get their bridge descriptor

  Bridge users can fetch bridge descriptors in two ways: by going directly
  to the bridge and asking for "/tor/server/authority", or by going to
  the bridge authority and asking for "/tor/server/fp/ID". By default,
  they will only try the direct queries. If the user sets
    UpdateBridgesFromAuthority 1
  in his config file, then he will try querying the bridge authority
  first for bridges where he knows a digest (if he only knows an IP
  address and ORPort, then his only option is a direct query).

  If the user has at least one working bridge, then he will do further
  queries to the bridge authority through a full three-hop Tor circuit.
  But when bootstrapping, he will make a direct begin_dir-style connection
  to the bridge authority.

  As of Tor 0.2.0.10-alpha, if the user attempts to fetch a descriptor
  from the bridge authority and it returns a 404 not found, the user
  will automatically fall back to trying a direct query. Therefore it is
  recommended that bridge users always set UpdateBridgesFromAuthority,
  since at worst it will delay their fetches a little bit and notify
  the bridge authority of the identity fingerprint (but not location)
  of their intended bridges.
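  The lookup order, including the 404 fallback, can be sketched as
  follows. The fetch callback stands in for a real directory request
  (direct or over a circuit; that distinction is not modeled), the target
  labels and function names are hypothetical, and giving up on non-404
  errors until the next scheduled retry is an assumption the text does not
  state.

```python
from typing import Callable, List, Optional, Tuple

# fetch(target, path) -> (http_status, body)
Fetch = Callable[[str, str], Tuple[int, Optional[bytes]]]

def fetch_plan(fingerprint: Optional[str],
               update_from_authority: bool) -> List[Tuple[str, str]]:
    """(target, path) pairs in the order they are tried."""
    plan = []
    if update_from_authority and fingerprint is not None:
        plan.append(("bridge-authority", "/tor/server/fp/" + fingerprint))
    plan.append(("bridge", "/tor/server/authority"))  # always possible
    return plan

def fetch_bridge_descriptor(fingerprint: Optional[str],
                            update_from_authority: bool,
                            fetch: Fetch) -> Optional[bytes]:
    for target, path in fetch_plan(fingerprint, update_from_authority):
        status, body = fetch(target, path)
        if status == 200:
            return body
        if status != 404:
            return None  # assumption: wait for the retry schedule
        # 404: fall back to the next option (the direct query)
    return None
```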

3.5. Bridge descriptor retry schedule

  Bridge users try to fetch a descriptor for each bridge (using the
  steps in Section 3.4 above) on startup. Whenever they receive a
  bridge descriptor, they reschedule a new descriptor download for 1
  hour from then.

  If, on the other hand, the fetch fails, they try again after 15 minutes
  for the first attempt, after 15 minutes for the second attempt, and after
  60 minutes for subsequent attempts.
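  The schedule above amounts to a small delay function. A sketch, where
  consecutive_failures (a hypothetical parameter) counts failures including
  the one that just happened:

```python
SUCCESS_DELAY = 60 * 60              # re-fetch one hour after a success
FAILURE_DELAYS = [15 * 60, 15 * 60]  # after the first and second failures
LATER_FAILURE_DELAY = 60 * 60        # after every subsequent failure

def next_fetch_delay(succeeded: bool, consecutive_failures: int) -> int:
    """Seconds until the next descriptor fetch attempt for one bridge."""
    if succeeded:
        return SUCCESS_DELAY
    n = max(1, consecutive_failures)
    if n <= len(FAILURE_DELAYS):
        return FAILURE_DELAYS[n - 1]
    return LATER_FAILURE_DELAY
```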

  In 0.2.2.x we should come up with some smarter retry schedules.

3.6. Vidalia integration

  Vidalia 0.0.16 has a checkbox in its Network config window called
  "My ISP blocks connections to the Tor network." Users who click that
  box change their configuration to:
    UseBridges 1
    UpdateBridgesFromAuthority 1
  and should specify at least one Bridge identifier.

3.7. Do we need a second layer of entry guards?

  If the bridge user uses the bridge as its entry guard, then the
  triangulation attacks from Lasse and Paul's Oakland paper work to
  locate the user's bridge(s).

  Worse, this is another way to enumerate bridges: if the bridge users
  keep rotating through second hops, then if you run a few fast servers
  (and avoid getting considered an Exit or a Guard) you'll quickly get
  a list of the bridges in active use.

  That's probably the strongest reason why bridge users will need to
  pick second-layer guards. Would this mean bridge users should switch
  to four-hop circuits?

  We should figure this out in the 0.2.1.x timeframe.

Filename: 126-geoip-reporting.txt
Title: Getting GeoIP data and publishing usage summaries
Author: Roger Dingledine
Created: 2007-11-24
Status: Closed
Implemented-In: 0.2.0.x

0. Status

  In 0.2.0.x, this proposal is implemented to the extent needed to
  address its motivations.  See notes below with the text "RESOLUTION"
  for details.

1. Background and motivation

  Right now we can keep a rough count of Tor users, both total and by
  country, by watching connections to a single directory mirror. Being
  able to get usage estimates is useful both for our funders (to
  demonstrate progress) and for our own development (so we know how
  quickly we're scaling and can design accordingly, and so we know which
  countries and communities to focus on more). This need for information
  is the only reason we haven't deployed "directory guards" (think of
  them like entry guards but for directory information; in practice,
  it would seem that Tor clients should simply use their entry guards
  as their directory guards; see also proposal 125).

  With the move toward bridges, we will no longer be able to track Tor
  clients that use bridges, since they use their bridges as directory
  guards. Further, we need to be able to learn which bridges stop seeing
  use from certain countries (and are thus likely blocked), so we can
  avoid giving them out to other users in those countries.

  Right now we already do GeoIP lookups in Vidalia: Vidalia draws relays
  and circuits on its 'network map', and it performs anonymized GeoIP
  lookups to its central servers to know where to put the dots. Vidalia
  caches answers it gets -- to reduce delay, to reduce overhead on
  the network, and to reduce anonymity issues where users reveal their
  knowledge about the network through which IP addresses they ask about.

  But with the advent of bridges, Tor clients are asking about IP
  addresses that aren't in the main directory. In particular, bridge
  users inform the central Vidalia servers about each bridge as they
  discover it and their Vidalia tries to map it.

  Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
  own IP address, so it can provide a more useful map.

  Finally, Vidalia's central servers leave users open to partitioning
  attacks, even if they can't target specific users. Further, as we
  start using GeoIP results for more operational or security-relevant
  goals, such as avoiding or including particular countries in circuits,
  it becomes more important that users can't be singled out in terms of
  their IP-to-country mapping beliefs.

2. The available GeoIP databases

  There are at least two classes of GeoIP database out there: "IP to
  country", which tells us the country code for the IP address but
  no more details, and "IP to city", which tells us the country code,
  the name of the city, and some basic latitude/longitude guesses.

  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
  bytes. A typical line is:
    "205500992","208605279","US","USA","UNITED STATES"
  http://ip-to-country.webhosting.info/node/view/5
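  Each line in this format gives an inclusive range of IPv4 addresses as
  32-bit integers, followed by the country codes and name; 205500992
  through 208605279 is 12.63.178.64 through 12.111.16.95. A local lookup
  can therefore be a binary search over the sorted range starts. A minimal
  sketch, assuming well-formed, non-overlapping ranges (the class name is
  hypothetical):

```python
import bisect
import csv
import io
import ipaddress
from typing import Optional

class IpToCountry:
    """Range lookup over rows of: first IP, last IP, 2-letter code, ..."""

    def __init__(self, csv_text: str) -> None:
        rows = []
        for rec in csv.reader(io.StringIO(csv_text)):
            if len(rec) >= 3:  # skip blank or short lines
                rows.append((int(rec[0]), int(rec[1]), rec[2]))
        rows.sort()
        self._starts = [r[0] for r in rows]
        self._rows = rows

    def lookup(self, addr: str) -> Optional[str]:
        ip = int(ipaddress.IPv4Address(addr))
        # Index of the last range whose start is <= ip.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i >= 0 and ip <= self._rows[i][1]:
            return self._rows[i][2]
        return None  # address not covered by the database
```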

  Similarly, the maxmind GeoLite Country database is also about 500KB
  compressed.
  http://www.maxmind.com/app/geolitecountry

  The maxmind GeoLite City database gives more fine-grained detail like
  geo coordinates and city name. Vidalia currently makes use of this
  information. On the other hand it's 16MB compressed. A typical line is:
    206.124.149.146,Bellevue,WA,US,47.6051,-122.1134
  http://www.maxmind.com/app/geolitecity

  There are other databases out there, like
  http://www.hostip.info/faq.html
  http://www.webconfs.com/ip-to-city.php
  that deserve more attention, but for now let's assume that all the
  databases are around this size.

3. What we'd like to solve

  Goal #1a: Tor relays collect IP-to-country user stats and publish
  sanitized versions.
  Goal #1b: Tor bridges collect IP-to-country user stats and publish
  sanitized versions.

  Goal #2a: Vidalia learns IP-to-city stats for Tor relays, for better
  mapping.
  Goal #2b: Vidalia learns IP-to-country stats for Tor relays, so the user
  can pick countries for her paths.

  Goal #3: Vidalia doesn't do external lookups on bridge relay addresses.

  Goal #4: Vidalia resolves the Tor client's IP-to-country or IP-to-city
  for better mapping.

  Goal #5: Reduce partitioning opportunities where Vidalia central
  servers can give different (distinguishing) responses.

4. Solution overview

  Our goal is to allow Tor relays, bridges, and clients to learn enough
  GeoIP information so they can do local private queries.

4.1. The IP-to-country db

  Directory authorities should publish a "geoip" file that contains
  IP-to-country mappings. Directory caches will mirror it, and Tor clients
  and relays (including bridge relays) will fetch it. Thus we can solve
  goals 1a and 1b (publish sanitized usage info). Controllers could also
  use this to solve goal 2b (choosing path by country attributes). It
  also solves goal 4 (learning the Tor client's country), though for
  huge countries like the US we'd still need to decide where the "middle"
  should be when we're mapping that address.

  The IP-to-country details are described further in Sections 5 and
  6 below.

  [RESOLUTION: The geoip file in 0.2.0.x is not distributed through
  Tor.  Instead, it is shipped with the bundle.]

4.2. The IP-to-city db

  In an ideal world, the IP-to-city db would be small enough that we
  could distribute it in the above manner too. But for now, it is too
  large. Here's where the design choice forks.

  Option A: Vidalia should continue doing its anonymized IP-to-city
  queries. Thus we can achieve goals 2a and 2b. We would solve goal
  3 by only doing lookups on descriptors that are purpose "general"
  (see Section 4.2.1 for how). We would leave goal 5 unsolved.

  Option B: Each directory authority should keep an IP-to-city db,
  look up the value for each router it lists, and include that line in
  the router's network-status entry. The network-status consensus would
  then use the line that appears in the majority of votes. This approach
  also solves goals 2a and 2b, goal 3 (Vidalia doesn't do any lookups
  at all now), and goal 5 (reduced partitioning risks).
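  The Option B consensus rule could be sketched as below. Reading "the
  majority of votes" as a strict majority of all votes cast for that
  router is an assumption (a plurality rule is another reading), and the
  function name is hypothetical.

```python
from collections import Counter
from typing import List, Optional

def consensus_geo_line(votes: List[Optional[str]]) -> Optional[str]:
    """Pick the IP-to-city line listed by a strict majority of votes.

    votes has one entry per authority vote for this router; None means
    that authority listed no line. Returns None if no line has more
    than half of the votes.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    line, n = counts.most_common(1)[0]
    return line if n * 2 > len(votes) else None
```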

  Option B has the advantage that Vidalia can simplify its operation,
  and the advantage that this consensus IP-to-city data is available to
  other controllers besides just Vidalia. But it has the disadvantage
  that the networkstatus consensus becomes larger, even though most of
  the GeoIP information won't change from one consensus to the next. Is
  there another reasonable location for it that can provide similar
  consensus security properties?

  [RESOLUTION: IP-to-city is not supported.]

4.2.1. Controllers can query for router annotations

  Vidalia needs to stop doing queries on bridge relay IP addresses.
  It could do that by only doing lookups on descriptors that are in
  the networkstatus consensus, but that precludes designs like Blossom
  that might want to map its relay locations. The best answer is that it
  should learn the router annotations, with a new controller 'getinfo'
  command:
    "GETINFO desc-annotations/id/<OR identity>"
  which would respond with something like
    @downloaded-at 2007-11-29 08:06:38
    @source "128.31.0.34"
    @purpose bridge

  [We could also make the answer include the digest for the router in
  question, which would enable us to ask GETINFO router-annotations/all.
  Is this worth it? -RD]

  Then Vidalia can avoid doing lookups on descriptors with purpose
  "bridge". Even better would be to add a new annotation "@private true"
  so Vidalia can know how to handle new purposes that we haven't created
  yet. Vidalia could special-case "bridge" for now, for compatibility
  with the current 0.2.0.x-alphas.
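  On the Vidalia side, the decision might look like this sketch, which
  parses a reply shaped like the example above and honors both the
  proposed "@private true" annotation and the "bridge" special case.
  Treating a missing @purpose as "general" is an assumption, and the
  function names are hypothetical.

```python
from typing import Dict

def parse_annotations(text: str) -> Dict[str, str]:
    """Parse '@keyword value' lines into a dict, dropping surrounding quotes."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@"):
            continue
        key, _, value = line[1:].partition(" ")
        out[key] = value.strip().strip('"')
    return out

def ok_to_lookup(annotations: Dict[str, str]) -> bool:
    """Return False for descriptors we should not send to an external
    GeoIP service."""
    if annotations.get("private") == "true":
        return False
    # Special case for compatibility with 0.2.0.x-alphas.
    return annotations.get("purpose", "general") != "bridge"
```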

4.3. Recommendation

  My overall recommendation is that we should implement 4.1 soon
  (e.g. early in 0.2.1.x), and we can go with 4.2 option A for now,
  with the hope that later we discover a better way to distribute the
  IP-to-city info and can switch to 4.2 option B.

  Below we discuss in more detail how to achieve 4.1.

5. Publishing and caching the GeoIP (IP-to-country) database

  Each v3 directory authority should put a copy of the "geoip" file in
  its data directory. Then its network-status votes should include a hash
  of this file (Recommended-geoip-hash: %s), and the resulting consensus
  directory should specify the consensus hash.

  There should be a new URL for fetching this geoip db (by "current.z"
  for testing purposes, and by hash.z for typical downloads). Authorities
  should fetch and serve the one listed in the consensus, even when they
  vote for their own. This would argue for storing the cached version
  in a better filename than "geoip".
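  An authority's side of this could be sketched as hashing its geoip file,
  emitting the vote line, and naming the hash-addressed download. The
  proposal does not name a digest algorithm or encoding, so SHA-1 in
  uppercase hex is an assumption here, as are the function names; the
  ".z" suffix follows the "hash.z" convention above.

```python
import hashlib

def geoip_digest(geoip_bytes: bytes) -> str:
    # Assumption: SHA-1, uppercase hex.
    return hashlib.sha1(geoip_bytes).hexdigest().upper()

def geoip_vote_line(geoip_bytes: bytes) -> str:
    """Line an authority would put in its network-status vote."""
    return "Recommended-geoip-hash: %s" % geoip_digest(geoip_bytes)

def geoip_download_name(hex_digest: str) -> str:
    """Hash-named compressed file, so caches can serve exact versions."""
    return hex_digest + ".z"
```

  Naming downloads by hash lets authorities and mirrors serve the
  consensus version and their own voted version side by side without
  filename clashes.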

  Directory mirrors should keep a copy of this file available via the
  same URLs.

  We assume that the file would change at most a few times a month. Should
  Tor ship with a bootstrap geoip file? An out-of-date geoip file may
  open you up to partitioning attacks, but for the most part it won't
  be that different.

  There should be a config option to disable updating the geoip file,
  in case users want to use their own file (e.g. they have a proprietary
  GeoIP file they prefer to use). In that case we leave it up to the
  user to update his geoip file out-of-band.

  [XXX Should consider forward/backward compatibility, e.g. if we want
  to move to a new geoip file format. -RD]

  [RESOLUTION: Not done over Tor.]

6. Controllers use the IP-to-country db for mapping and for path building

  Down the road, Vidalia could use the IP-to-country mappings to place
  the following on its map:
  - The location of the client.
  - The location of the bridges, or other relays not in the
    networkstatus.
  - Any relays that it doesn't yet have an IP-to-city answer for.

  Other controllers can also use it to set EntryNodes, ExitNodes, etc.,
  in a per-country way.

  To support these features, we need to export the IP-to-country data
  via the Tor controller protocol.

  Is it sufficient just to add a new GETINFO command?
    GETINFO ip-to-country/128.31.0.34
    250+ip-to-country/128.31.0.34="US","USA","UNITED STATES"

  [RESOLUTION: Not done now, except for the getinfo command.]

6.1. Other interfaces

  Robert Hogan has also suggested a

    GETINFO relays-by-country/cn

  as well as torrc options for ExitCountryCodes, EntryCountryCodes,
  ExcludeCountryCodes, etc.

  [RESOLUTION: Not implemented in 0.2.0.x.  Fodder for a future proposal.]

7. Relays and bridges use the IP-to-country db for usage summaries

  Once bridges have a GeoIP database locally, they can start to publish
  sanitized summaries of client usage -- how many users they see and from
  what countries. This might also be a more useful way for ordinary Tor
  relays to convey the level of usage they see, which would allow us to
  switch to using directory guards for all users by default.

  But how to safely summarize this information without opening too many
  anonymity leaks?

7.1 Attacks to think about

  First, note that we need to have a large enough time window that we're
  not aiding correlation attacks much. I hope 24 hours is enough. So
  that means no publishing stats until you've been up at least 24 hours.
  And you can't publish follow-up stats more often than every 24 hours,
  or people could look at the differential.

  Second, note that we need to be sufficiently vague about the IP
  addresses we're reporting. We are hoping that just specifying the
  country will be vague enough. But a) what about active attacks where
  we convince a bridge to use a GeoIP db that labels each suspect IP
  address as a unique country? We have to assume that the consensus GeoIP
  db won't be malicious in this way. And b) could such singling-out
  attacks occur naturally, for example because of countries that have
  a very small IP space? We should investigate that.

7.2. Granularity of users

  Do we only want to report countries that have a sufficient anonymity set
  (that is, number of users) for the day? For example, we might avoid
  listing any countries that have seen fewer than five addresses over
  the 24 hour period. This approach would be helpful in reducing the
  singling-out opportunities -- in the extreme case, we could imagine a
  situation where one blogger from the Sudan used Tor on a given day, and
  we can discover which entry guard she used.

  But I fear that especially for bridges, seeing only one hit from a
  given country in a given day may be quite common.

  As a compromise, we should start out with an "Other" category in
  the reported stats, which is the sum of unlisted countries; if that
  category is consistently interesting, we can think harder about how
  to get the right data from it safely.

  But note that bridge summaries will not be made public individually,
  since doing so would help people enumerate bridges, whereas summaries
  from normal relays will be public. So perhaps that means we can afford
  to be more specific in bridge summaries? In particular, I'm thinking the
  "Other" category should be used by public relays but not by bridges
  (or if it is, used with a lower threshold).

  Even for countries that have many Tor users, we might not want to be
  too specific about how many users we've seen. For example, we might
  round down the number of users we report to the nearest multiple of 5.
  My instinct for now is that this won't be that useful.

7.3 Other issues

  Another note: we'll likely be overreporting in the case of users with
  dynamic IP addresses: if they rotate to a new address over the course
  of the day, we'll count them twice. So be it.

7.4. Where to publish the summaries?

  We designed extrainfo documents for information like this. So they
  should just be more entries in the extrainfo doc.

  But if we want to publish summaries every 24 hours (no more often,
  no less often), aren't we tied to the router descriptor publishing
  schedule? That is, if we publish a new router descriptor at the 18
  hour mark, and nothing much has changed at the 24 hour mark, won't
  the new descriptor get dropped as being "cosmetically similar", and
  then nobody will know to ask about the new extrainfo document?

  One solution would be to make and remember the 24 hour summary at the
  24 hour mark, but not actually publish it anywhere until we happen to
  publish a new descriptor for other reasons. If we happen to go down
  before publishing a new descriptor, then so be it, at least we tried.
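  One way to express this is to keep the finished summary in memory and
  attach it to whatever extra-info document we next publish. A sketch
  under those assumptions, with hypothetical names; nothing is written to
  disk, so a restart loses the pending summary ("at least we tried").

```python
from typing import Optional

class SummaryPublisher:
    """Make the daily summary at the 24-hour mark, but only publish it
    alongside a descriptor we are publishing for other reasons."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None

    def on_24_hours(self, summary: str) -> None:
        # A newer summary replaces the one we remembered.
        self.pending = summary

    def build_extrainfo(self, base: str) -> str:
        # Called whenever we publish a new descriptor and extra-info pair.
        if self.pending is None:
            return base
        return base + "\n" + self.pending
```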

7.5. What if the relay is unreachable or goes to sleep?

  Even if you've been up for 24 hours, if you were hibernating for 18
  of them, then we're not getting as much fuzziness as we'd like. So
  I guess that means that we need a 24-hour period of being "awake"
  before we're willing to publish a summary. A similar attack works if
  you've been awake but unreachable for the first 18 of the 24 hours. As
  another example, a bridge that's on a laptop might be suspended for
  some of each day.

  This implies that some relays and bridges will never publish summary
  stats, because they're not ever reliably working for 24 hours in
  a row. If a significant percentage of our reporters end up being in
  this boat, we should investigate whether we can accumulate 24 hours of
  "usefulness", even if there are holes in the middle, and publish based
  on that.

  What other issues are like this? It seems that just moving to a new
  IP address shouldn't be a reason to cancel stats publishing, assuming
  we were usable at each address.
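  The "accumulate 24 hours of usefulness" alternative could be tracked
  with a counter that only advances while the relay is both awake and
  reachable. A sketch with hypothetical names; how awake and reachable
  status is determined is left out.

```python
DAY = 24 * 60 * 60

class UsefulTimeTracker:
    """Counts only seconds spent awake and reachable, allowing holes."""

    def __init__(self) -> None:
        self.useful_seconds = 0

    def tick(self, seconds: int, awake: bool, reachable: bool) -> None:
        if awake and reachable:
            self.useful_seconds += seconds

    def may_publish(self) -> bool:
        return self.useful_seconds >= DAY

    def reset(self) -> None:
        # Start a new accumulation period once a summary has been made.
        self.useful_seconds = 0
```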

7.6. IP addresses that aren't in the geoip db

  Some IP addresses aren't in the public geoip databases. In particular,
  I've found that a lot of African countries are missing, but there
  are also some common ones in the US that are missing, like parts of
  Comcast. We could just lump unknown IP addresses into the "other"
  category, but it might be useful to gather a general sense of how many
  lookups are failing entirely, by adding a separate "Unknown" category.

  We could also contribute back to the geoip db, by letting bridges set
  a config option to report the actual IP addresses that failed their
  lookup. Then the bridge authority operators can manually make sure
  the correct answer will be in later geoip files. This config option
  should be disabled by default.

7.7 Bringing it all together

  So here's the plan:

  24 hours after starting up (modulo Section 7.5 above), bridges and
  relays should construct a daily summary of client countries they've
  seen, including the above "Unknown" category (Section 7.6) as well.

  Non-bridge relays lump all countries with fewer than K (e.g. K=5) users
  into the "Other" category (see Sec 7.2 above), whereas bridge relays are
  willing to list a country even when it has only one user for the day.

  Whenever we have a daily summary on record, we include it in our
  extrainfo document whenever we publish one. The daily summary we
  remember locally gets replaced with a newer one when another 24
  hours pass.
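  The lumping rule above can be sketched as follows. This assumes
  per-country unique-user counts have already been collected, and that
  the "Unknown" row from Section 7.6 is kept separate rather than folded
  into "Other"; the row labels and the function name are illustrative
  only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the daily summary: a label (country code, or the
 * illustrative "unknown"/"other" buckets) and its unique-user count. */
typedef struct {
    char cc[8];
    unsigned users;
} country_count_t;

/* Hypothetical: copy rows into out, folding every country with fewer
 * than k users into one "other" row unless we are a bridge (bridges
 * list even one-user countries). The "unknown" row is always kept.
 * out must have room for n + 1 rows. Returns the number of rows. */
size_t build_daily_summary(const country_count_t *in, size_t n,
                           bool is_bridge, unsigned k,
                           country_count_t *out)
{
    size_t m = 0;
    unsigned other = 0;
    for (size_t i = 0; i < n; i++) {
        bool unknown = strcmp(in[i].cc, "unknown") == 0;
        if (is_bridge || unknown || in[i].users >= k)
            out[m++] = in[i];
        else
            other += in[i].users;
    }
    if (other > 0) {
        country_count_t o = { "other", other };
        out[m++] = o;
    }
    return m;
}
```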

7.8. Some forward secrecy

  How should we remember addresses locally? If we convert them into
  country-codes immediately, we will count them again if we see them
  again. On the other hand, we don't really want to keep a list hanging
  around of all IP addresses we've seen in the past 24 hours.

  Step one is that we should never write this stuff to disk. Keeping it
  only in RAM will make things somewhat better. Step two is to avoid
  keeping any timestamps associated with it: rather than a rolling
  24-hour window, which would require us to remember the various times
  we've seen that address, we can instead just throw out the whole list
  every 24 hours and start over.
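  A sketch of steps one and two: a fixed-size table that lives only in
  RAM, stores no timestamps, and is wiped wholesale every 24 hours. The
  capacity, the unkeyed multiplicative hash, and all names are
  assumptions; as the next paragraph notes, this does nothing against an
  adversary who can read our memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEEN_SLOTS 4096 /* hypothetical fixed capacity; power of two */

/* RAM-only set of IPv4 addresses seen in the current 24-hour period.
 * No timestamps are stored. Address 0 marks an empty slot. */
typedef struct {
    uint32_t slot[SEEN_SLOTS];
    unsigned count;
} seen_set_t;

static unsigned slot_index(uint32_t addr)
{
    /* Knuth multiplicative hash; not keyed. Top 12 bits -> 4096 slots. */
    return (unsigned)((addr * 2654435761u) >> 20) & (SEEN_SLOTS - 1);
}

/* Returns true if addr was new (so the caller should count it). */
bool seen_set_add(seen_set_t *s, uint32_t addr)
{
    if (addr == 0 || s->count >= SEEN_SLOTS - 1)
        return false; /* unrepresentable or full: undercount */
    unsigned i = slot_index(addr);
    while (s->slot[i] != 0) {          /* linear probing */
        if (s->slot[i] == addr)
            return false;
        i = (i + 1) & (SEEN_SLOTS - 1);
    }
    s->slot[i] = addr;
    s->count++;
    return true;
}

/* Called every 24 hours: throw out the whole list and start over. */
void seen_set_reset(seen_set_t *s)
{
    memset(s, 0, sizeof *s);
}
```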

  We could hash the addresses, and then compare hashes when deciding if
  we've seen a given address before. We could even do keyed hashes. Or
  Bloom filters. But if our goal is to defend against an adversary
  who steals a copy of our RAM while we're running and then does
  guess-and-check on whatever blob we're keeping, we're in bad shape.

  We could drop the last octet of the IP address as soon as we see
  it. That would cause us to undercount some users from cablemodem and
  DSL networks that have a high density of Tor users. And it wouldn't
  really help that much -- indeed, the extent to which it does help is
  exactly the extent to which it makes our stats less useful.

  Other ideas?

Filename: 127-dirport-mirrors-downloads.txt
Title: Relaying dirport requests to Tor download site / website
Author: Roger Dingledine
Created: 2007-12-02
Status: Obsolete

1. Overview

  Some countries and networks block connections to the Tor website. As
  time goes by, this will remain a problem and it may even become worse.

  We have a big pile of mirrors (google for "Tor mirrors"), but few of
  our users think to try a search like that. Also, many of these mirrors
  may be blocked automatically, since their pages contain words that
  filters are likely to ban. And lastly, we can imagine a future
  where the blockers are aware of the mirror list too.

  Here we describe a new set of URLs for Tor's DirPort that will relay
  connections from users to the official Tor download site. Rather than
  trying to cache a bunch of new Tor packages (which is a hassle in terms
  of keeping them up to date, and a hassle in terms of drive space used),
  we instead just proxy the requests directly to Tor's /dist page.

  Specifically, we should support

    GET /tor/dist/$1

  and

    GET /tor/website/$1
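  One way a relay might map these two URL families onto the upstream
  site is sketched below. The upstream base URLs are assumptions (the
  text only says "Tor's /dist page"), and the function name is
  hypothetical; the check for ".." is our addition, so that a "$1"
  containing ".." cannot walk outside the intended directory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical upstream bases; not specified by the proposal. */
#define UPSTREAM_DIST    "https://www.torproject.org/dist/"
#define UPSTREAM_WEBSITE "https://www.torproject.org/"

/* Map a DirPort request path to the upstream URL to fetch.
 * Returns 0 on success, -1 if the path is not a mirror request,
 * contains "..", or does not fit in out. */
int mirror_map_path(const char *path, char *out, size_t outlen)
{
    static const struct { const char *prefix; const char *base; } routes[] = {
        { "/tor/dist/",    UPSTREAM_DIST },
        { "/tor/website/", UPSTREAM_WEBSITE },
    };
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        size_t plen = strlen(routes[i].prefix);
        if (strncmp(path, routes[i].prefix, plen) == 0) {
            const char *rest = path + plen;   /* the "$1" part */
            if (strstr(rest, "..") != NULL)
                return -1;
            int n = snprintf(out, outlen, "%s%s", routes[i].base, rest);
            return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
        }
    }
    return -1;
}
```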

2. Direct connections, one-hop circuits, or three-hop circuits?

  We could relay the connections directly to the download site -- but
  this produces recognizable outgoing traffic on the bridge or cache's
  network, which will probably surprise our nice volunteers. (Is this
  a good enough reason to discard the direct connection idea?)

  Even if we don't do direct connections, should we do a one-hop
  begindir-style connection to the mirror site (make a one-hop circuit
  to it, then send a 'begindir' cell down the circuit), or should we do
  a normal three-hop anonymized connection?

  If these mirrors are mainly bridges, doing either a direct or a one-hop
  connection creates another way to enumerate bridges. That would argue
  for three-hop. On the other hand, downloading a 10+ megabyte installer
  through a normal Tor circuit can't be fun. But if you're already getting
  throttled a lot because you're in the "relayed traffic" bucket, you're
  going to have to accept a slow transfer anyway. So three-hop it is.

  Speaking of which, we would want to label this connection
  as "relay" traffic for the purposes of rate limiting; see
  connection_counts_as_relayed_traffic() and or_conn->client_used. This
  will be a bit tricky though, because these connections will use the
  bridge's guards.

3. Scanning resistance

  One other goal we'd like to achieve, or at least not hinder, is making
  it hard to scan large swaths of the Internet to look for responses
  that indicate a bridge.

  In general this is a really hard problem, so we shouldn't demand to
  solve it here. But we can note that some bridges should open their
  DirPort (and offer this functionality), and others shouldn't. Then
  some bridges provide a download mirror while others can remain
  scanning-resistant.

4. Integrity checking

  If we serve this stuff in plaintext from the bridge, anybody in between
  the user and the bridge can intercept and modify it. The bridge can too.

  If we do an anonymized three-hop connection, the exit node can also
  intercept and modify the exe it sends back.

  Are we setting ourselves up for rogue exit relays, or rogue bridges,
  that trojan our users?

  Answer #1: Users need to do PGP signature checking. Not a very good
  answer, a) because it's complex, and b) because they don't know the
  right signing keys in the first place.

  Answer #2: The mirrors could exit from a specific Tor relay, using the
  '.exit' notation. This would make connections a bit more brittle, but
  would resolve the rogue exit relay issue. We could even round-robin
  among several, and the list could be dynamic -- for example, all the
  relays with an Authority flag that allow exits to the Tor website.
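  The round-robin in Answer #2 could look like the sketch below, which
  builds a "<host>.<exit>.exit" address and advances to the next exit on
  each call. How the exit list is populated is left open, and the type
  and function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical round-robin over a list of trusted exit nicknames. */
typedef struct {
    const char *const *exits;
    size_t n_exits;
    size_t next;
} exit_rotation_t;

/* Write "<host>.<exit>.exit" into out and advance the rotation.
 * Returns 0 on success, -1 if there are no exits or out is too small. */
int next_exit_address(exit_rotation_t *r, const char *host,
                      char *out, size_t outlen)
{
    if (r->n_exits == 0)
        return -1;
    const char *exit_name = r->exits[r->next];
    r->next = (r->next + 1) % r->n_exits;
    int n = snprintf(out, outlen, "%s.%s.exit", host, exit_name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```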

  Answer #3: The mirrors should connect to the main distribution site
  via SSL. That way the exit relay can't influence anything.

  Answer #4: We could suggest that users only use trusted bridges for
  fetching a copy of Tor. Hopefully they heard about the bridge from a
  trusted source rather than from the adversary.

  Answer #5: What if the adversary is trawling for Tor downloads by
  network signature -- either by looking for known bytes in the binary,
  or by looking for "GET /tor/dist/"? It would be nice to encrypt the
  connection from the bridge user to the bridge. And we can! The bridge
  already supports TLS. Rather than initiating a TLS renegotiation after
  connecting to the ORPort, the user should actually request a URL. Then
  the ORPort can either pass the connection off as a linked conn to the
  dirport, or renegotiate and become a Tor connection, depending on how
  the client behaves.
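  The decision the ORPort would make can be sketched as a small
  classifier over the first bytes the client sends after the initial TLS
  handshake. It assumes the TLS layer reports whether the client started
  a renegotiation; the enum and function names are hypothetical, and any
  data other than a GET is simply handed to the normal Tor link code.

```c
#include <stdbool.h>
#include <string.h>

typedef enum {
    ORPORT_UNDECIDED,   /* need more bytes */
    ORPORT_TO_DIRPORT,  /* hand off as a linked conn to the DirPort */
    ORPORT_AS_TOR       /* treat as an ordinary Tor link */
} orport_dispatch_t;

/* Hypothetical classifier. client_renegotiated is true if the client
 * began a TLS renegotiation before sending application data. */
orport_dispatch_t orport_classify(bool client_renegotiated,
                                  const char *buf, size_t len)
{
    static const char get_prefix[] = "GET ";
    const size_t plen = sizeof get_prefix - 1;
    if (client_renegotiated)
        return ORPORT_AS_TOR;
    if (len < plen) /* partial data: still consistent with "GET "? */
        return memcmp(buf, get_prefix, len) == 0 ? ORPORT_UNDECIDED
                                                 : ORPORT_AS_TOR;
    return memcmp(buf, get_prefix, plen) == 0 ? ORPORT_TO_DIRPORT
                                              : ORPORT_AS_TOR;
}
```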

5. Linked connections: at what level should we proxy?

  Check out the connection_ap_make_link() function, as called from
  directory.c. Tor clients use this to create a "fake" socks connection
  back to themselves, and then they attach a directory request to it,
  so they can launch directory fetches via Tor. We can piggyback on
  this feature.

  We need to decide if we're going to be passing the bytes back and
  forth between the web browser and the main distribution site, or if
  we're going to be actually acting like a proxy (parsing out the file
  they want, fetching that file, and serving it back).

  Advantages of proxying without looking inside:
    - We don't need to build any sort of http support (including
      continues, partial fetches, etc.).
  Disadvantages:
    - If the browser thinks it's speaking http, are there easy ways
      to pass the bytes to an https server and have everything work
      correctly? At the least, it would seem that the browser would
      complain about the cert. More generally, SSL wants to be negotiated
      before the URL and headers are sent, yet we need to read the URL
      and headers to know that this is a mirror request; so we have an
      ordering problem here.
    - Makes it harder to do caching later on, if we don't look at what
      we're relaying. (It might be useful down the road to cache the
      answers to popular requests, so we don't have to keep getting
      them again.)

6. Outstanding problems

  1) HTTP proxies already exist.  Why waste our time cloning one
  badly? When we clone existing stuff, we usually regret it.

  2) It's overbroad.  We only seem to need a secure get-a-tor feature,
  and instead we're contemplating building a locked-down HTTP proxy.

  3) It's going to add a fair bit of complexity to our code.  We do
  not currently implement HTTPS.  We'd need to refactor lots of the
  low-level connection stuff so that "SSL" and "Cell-based" were no
  longer synonymous.

  4) It's still unclear how effective this proposal would be in
  practice. You need to know that this feature exists, which means
  somebody needs to tell you about a bridge (mirror) address and tell
  you how to use it. And if they're doing that, they could (e.g.) tell
  you about a Gmail autoresponder address just as easily, and then you'd
  get better authentication of the Tor program to boot.

Filename: 128-bridge-families.txt
Title: Families of private bridges
Author: Roger Dingledine
Created: 2007-12-xx
Status: Dead

1. Overview

  Proposal 125 introduced the basic notion of how bridge authorities,
  bridge relays, and bridge users should behave. But it doesn't get into
  the various mechanisms of how to distribute bridge relay addresses to
  bridge users.

  One of the mechanisms we have in mind is called 'families of bridges'.
  If a bridge user knows about only one private bridge, and that bridge
  shuts off for the night or gets a new dynamic IP address, the bridge
  user is out of luck and needs to re-bootstrap manually or wait and
  hope it comes back. On the other hand, if the bridge user knows about
  a family of bridges, then as long as one of those bridges is still
  reachable his Tor client can automatically learn about where the
  other bridges have gone.

  So in this design, a single volunteer could run multiple coordinated
  bridges, or a group of volunteers could each run a bridge. We abstract
  out the details of how these volunteers find each other and decide to
  set up a family.

2. Other notes.

  somebody needs to run a bridge authority

  it needs to have a torrc option to publish networkstatuses of its bridges

  it should also do reachability testing just of those bridges

  people ask for the bridge networkstatus by asking for a url that
  contains a password. (it's safe to do this because of begin_dir.)

  so the bridge users need to know a) a password, and b) a bridge
  authority line.

  the bridge users need to know the bridge authority line.

  the bridge authority needs to know the password.

3. Current state

  I implemented a BridgePassword config option. Bridge authorities
  should set it, and users who want to use those bridge authorities
  should set it.

  Now there is a new directory URL "/tor/networkstatus-bridges" that
  directory mirrors serve if BridgeAuthoritativeDir is set and it's a
  begin_dir connection. It looks for the header
    Authorization: Basic %s
  where %s is the base-64 bridge password.

  I never got around to teaching clients how to set the header, though,
  so it may or may not work, and may or may not do what we ultimately want.
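  Since clients never learned to send this header, here is a sketch of
  what building it might look like. The encoder is standard RFC 4648
  base64 with padding; bridge_auth_header() is a hypothetical helper,
  not an existing Tor function.

```c
#include <stddef.h>
#include <string.h>

/* Standard base64 (RFC 4648) with '=' padding.
 * out must hold 4 * ((len + 2) / 3) + 1 bytes. */
void base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < len) v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = (i + 2 < len) ? tbl[v & 63] : '=';
    }
    out[o] = '\0';
}

/* Hypothetical: build "Authorization: Basic <base64(password)>".
 * Returns 0 on success, -1 if out is too small. */
int bridge_auth_header(const char *password, char *out, size_t outlen)
{
    static const char prefix[] = "Authorization: Basic ";
    size_t plen = strlen(password);
    size_t need = (sizeof prefix - 1) + 4 * ((plen + 2) / 3) + 1;
    if (outlen < need)
        return -1;
    memcpy(out, prefix, sizeof prefix - 1);
    base64_encode((const unsigned char *)password, plen,
                  out + (sizeof prefix - 1));
    return 0;
}
```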

  I've marked this proposal dead; it really never should have left the
  ideas/ directory. Somebody should pick it up sometime and finish the
  design and implementation.

Filename: 129-reject-plaintext-ports.txt
Title: Block Insecure Protocols by Default
Author: Kevin Bauer & Damon McCoy
Created: 2008-01-15
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  Below is a proposal to mitigate insecure protocol use over Tor.

  This document 1) demonstrates the extent to which insecure protocols are
  currently used within the Tor network, and 2) proposes a simple solution
  to prevent users from unknowingly using these insecure protocols. By
  "insecure", we mean protocols that explicitly leak sensitive user names
  and/or passwords, such as POP, IMAP, Telnet, and FTP.

Motivation:

  As part of a general study of Tor use in 2006/2007 [1], we attempted to
  understand what types of protocols are used over Tor. While we observed an
  enormous volume of Web and peer-to-peer traffic, we were surprised by the
  number of insecure protocols that were used over Tor. For example, over an
  8 day observation period, we observed the following number of connections
  over insecure protocols:

    POP and IMAP: 10,326 connections
    Telnet: 8,401 connections
    FTP: 3,788 connections

  Each of the protocols listed above exchanges user name and password
  information in plain text. As an upper bound, we could have observed
  22,515 user names and passwords. This observation echoes the reports of
  a Tor router logging and posting e-mail passwords in August 2007 [2]. The
  response from the Tor community has been to further educate users
  about the dangers of using insecure protocols over Tor. However, we
  recently repeated our Tor usage study from last year and noticed that the
  trend in insecure protocol use has not declined. Therefore, we propose that
  additional steps be taken to protect naive Tor users from inadvertently
  exposing their identities (and even passwords) over Tor.

Security Implications:

  This proposal is intended to improve Tor's security by limiting the
  use of insecure protocols.

  Roger added: By adding these warnings for only some of the risky
  behavior, users may do other risky behavior, not get a warning, and
  believe that it is therefore safe. But overall, I think it's better
  to warn for some of it than to warn for none of it.

Specification:

  As an initial step towards mitigating the use of the above-mentioned
  insecure protocols, we propose that the default ports for each respective
  insecure service be blocked at the Tor client's socks proxy. These default
  ports include:

    23 - Telnet
    109 - POP2
    110 - POP3
    143 - IMAP

  Notice that FTP is not included in the proposed list of ports to block. This
  is because FTP is often used anonymously, i.e., without any identifying
  user name or password.

  This blocking scheme can be implemented as a set of flags in the client's
  torrc configuration file:

    BlockInsecureProtocols 0|1
    WarnInsecureProtocols 0|1

  When the warning flag is activated, a message should be displayed to
  the user similar to the message given when Tor's socks proxy is given an IP
  address rather than resolving a host name.

  We recommend that the default torrc configuration file block insecure
  protocols and provide a warning to the user to explain the behavior.
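  The client-side check described above might look like this. The flag
  semantics (blocking takes precedence over warning) and all names are
  assumptions layered on the proposal's BlockInsecureProtocols and
  WarnInsecureProtocols options; the port list is the one given above,
  with FTP deliberately absent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PORT_ALLOW, PORT_WARN, PORT_BLOCK } port_action_t;

/* Default ports from the list above: Telnet, POP2, POP3, IMAP. */
static const uint16_t insecure_ports[] = { 23, 109, 110, 143 };

/* Decide what the SOCKS proxy does with a request for `port`.
 * When both flags are set this returns PORT_BLOCK; whether a warning
 * is also logged in that case is left to the caller. */
port_action_t check_plaintext_port(uint16_t port, bool block, bool warn)
{
    size_t n = sizeof insecure_ports / sizeof insecure_ports[0];
    for (size_t i = 0; i < n; i++) {
        if (insecure_ports[i] == port) {
            if (block) return PORT_BLOCK;
            if (warn)  return PORT_WARN;
            return PORT_ALLOW;
        }
    }
    return PORT_ALLOW;
}
```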

  Finally, there are many popular web pages that do not offer secure
  login features, such as MySpace, and it would be prudent to provide
  additional rules to Privoxy to attempt to protect users from unknowingly
  submitting their login credentials in plain-text.

Compatibility:

  None, as the proposed changes are to be implemented in the client.

References:

  [1] Shining Light in Dark Places: A Study of Anonymous Network Usage.
      University of Colorado Technical Report CU-CS-1032-07. August 2007.

  [2] Rogue Nodes Turn Tor Anonymizer Into Eavesdropper's Paradise.
      http://www.wired.com/politics/security/news/2007/09/embassy_hacks.
      Wired. September 10, 2007.

Implementation:

  Roger added this feature in
  http://archives.seul.org/or/cvs/Jan-2008/msg00182.html
  He also added a status event for Vidalia to recognize attempts to use
  vulnerable-plaintext ports, so it can help the user understand what's
  going on and how to fix it.

Next steps:

  a) Vidalia should learn to recognize this controller status event,
  so we don't leave users out in the cold when we enable this feature.

  b) We should decide which ports to reject by default. The current
  consensus is 23,109,110,143 -- the same set that we warn for now.

Filename: 130-v2-conn-protocol.txt
Title: Version 2 Tor connection protocol
Author: Nick Mathewson
Created: 2007-10-25
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This proposal describes the significant changes to be made in the v2
  Tor connection protocol.

  This proposal relates to other proposals as follows:

    It refers to and supersedes:
       Proposal 124: Blocking resistant TLS certificate usage
    It refers to aspects of:
       Proposal 105: Version negotiation for the Tor protocol


  In summary, the Tor connection protocol has been in need of a redesign
  for a while.  This proposal describes how we can add to the Tor
  protocol:

     - A new TLS handshake (to achieve blocking resistance without
       breaking backward compatibility)
     - Version negotiation (so that future connection protocol changes
       can happen without breaking compatibility)
     - The actual changes in the v2 Tor connection protocol.

Motivation:

  For motivation, see proposal 124.

Proposal:

0. Terminology

  The version of the Tor connection protocol implemented up to now is
  "version 1".  This proposal describes "version 2".

  "Old" or "Older" versions of Tor are ones not aware that version 2
  of this protocol exists;
  "New" or "Newer" versions are ones that are.

  The connection initiator is referred to below as the Client; the
  connection responder is referred to below as the Server.

1. The revised TLS handshake.

  For motivation, see proposal 124.  This is a simplified version of the
  handshake that uses TLS's renegotiation capability in order to avoid
  some of the extraneous steps in proposal 124.

  The Client connects to the Server and, as in ordinary TLS, sends a
  list of ciphers.  Older versions of Tor will send only ciphers from
  the list:
    TLS_DHE_RSA_WITH_AES_256_CBC_SHA
    TLS_DHE_RSA_WITH_AES_128_CBC_SHA
    SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
    SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
  Clients that support the revised handshake will send the recommended
  list of ciphers from proposal 124, in order to emulate the behavior of
  a web browser.

  If the server notices that the list of ciphers contains only ciphers
  from this list, it proceeds with Tor's version 1 TLS handshake as
  documented in tor-spec.txt.

  (The server may also notice cipher lists used by other implementations
  of the Tor protocol (in particular, the BouncyCastle default cipher
  list as used by some Java-based implementations), and whitelist them.)

  On the other hand, if the server sees a list of ciphers that could not
  have been sent from an older implementation (because it includes other
  ciphers, and does not match any known-old list), the server sends a
  reply containing a single connection certificate, constructed as for
  the link certificate in the v1 Tor protocol.  The subject names in
  this certificate SHOULD NOT have any strings to identify them as
  coming from a Tor server.  The server does not ask the client for
  certificates.

  Old Servers will (mostly) ignore the cipher list and respond as in the v1
  protocol, sending back a two-certificate chain.

  After the Client gets a response from the server, it checks for the
  number of certificates it received.  If there are two certificates,
  the client assumes a V1 connection and proceeds as in tor-spec.txt.
  But if there is only one certificate, the client assumes a V2 or later
  protocol and continues.
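  Both sides of this first exchange reduce to simple checks, sketched
  below under the assumption that the offered cipher list is available
  as an array of IANA code points. The four code points are the standard
  values for the ciphers listed above; the whitelist of other known-old
  lists (such as BouncyCastle's) is omitted, and all function names are
  hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Code points for the four ciphers that older Tors send. */
static const uint16_t v1_ciphers[] = {
    0x0039, /* TLS_DHE_RSA_WITH_AES_256_CBC_SHA */
    0x0033, /* TLS_DHE_RSA_WITH_AES_128_CBC_SHA */
    0x0016, /* SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA */
    0x0013, /* SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA */
};

static bool is_v1_cipher(uint16_t c)
{
    for (size_t i = 0; i < sizeof v1_ciphers / sizeof v1_ciphers[0]; i++)
        if (v1_ciphers[i] == c)
            return true;
    return false;
}

/* Server side: true if every offered cipher is on the v1 list, so the
 * client may be an older Tor and the v1 handshake should be used. */
bool client_hello_is_v1(const uint16_t *offered, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!is_v1_cipher(offered[i]))
            return false;
    return true;
}

/* Client side: two certificates mean v1; one means v2 or later
 * (returned as 2). Anything else is unexpected (-1). */
int link_protocol_from_cert_count(int n_certs)
{
    if (n_certs == 2) return 1;
    if (n_certs == 1) return 2;
    return -1;
}
```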

  At this point, the client has established a TLS connection with the
  server, but the parties have not been authenticated: the server hasn't
  sent its identity certificate, and the client hasn't sent any
  certificates at all.  To fix this, the client begins a TLS session
  renegotiation.  This time, the server continues with two certificates
  as usual, and asks for certificates so that the client will send
  certificates of its own.  Because the TLS connection has been
  established, all of this is encrypted.  (The certificate sent by the
  server in the renegotiated connection need not be the same as the one
  sent in the original connection.)

  The server MUST NOT write any data until the client has renegotiated.

  Once the renegotiation is finished, the server and client check one
  another's certificates as in V1.  Now they are mutually authenticated.

1.1. Revised TLS handshake: implementation notes.

  It isn't so easy to adjust server behavior based on the client's
  ciphersuite list.  Here's how we can do it using OpenSSL.  This is a
  bit of an abuse of the OpenSSL APIs, but it's the best we can do, and
  we won't have to do it forever.

  We can use OpenSSL's SSL_set_info_callback() to register a function to
  be called when the state changes.  The type/state tuple of
     SSL_CB_ACCEPT_LOOP/SSL3_ST_SW_SRVR_HELLO_A
  happens when we have completely parsed the client hello, and are about
  to send a response.  From this callback, we can check the cipherlist
  and act accordingly:

     * If the ciphersuite list indicates a v1 protocol, we set the
       verify mode to SSL_VERIFY_NONE with a callback (so we get
       certificates).

     * If the ciphersuite list indicates a v2 protocol, we set the
       verify mode to SSL_VERIFY_NONE with no callback (so we get
       no certificates) and set the SSL_MODE_NO_AUTO_CHAIN flag (so that
       we send only 1 certificate in the response).

  Once the handshake is done, the server clears the
  SSL_MODE_NO_AUTO_CHAIN flag and sets the callback as for the V1
  protocol.  It then starts reading.
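  To make the flag juggling easier to follow, here is a model of the two
  settings as plain booleans. It makes no OpenSSL calls; the struct and
  function names are hypothetical stand-ins for the verify mode, verify
  callback, and SSL_MODE_NO_AUTO_CHAIN flag named above.

```c
#include <stdbool.h>

/* Hypothetical per-connection settings the server flips from its info
 * callback. Plain booleans stand in for the real OpenSSL state. */
typedef struct {
    bool verify_callback_set; /* SSL_VERIFY_NONE + callback: we get certs */
    bool no_auto_chain;       /* SSL_MODE_NO_AUTO_CHAIN: send one cert */
} tls_server_settings_t;

/* At SSL_CB_ACCEPT_LOOP / SSL3_ST_SW_SRVR_HELLO_A: the client hello has
 * been parsed and our response is about to be sent. */
void on_client_hello_parsed(tls_server_settings_t *s, bool cipherlist_is_v1)
{
    if (cipherlist_is_v1) {
        s->verify_callback_set = true;   /* v1: ask for certificates */
        s->no_auto_chain = false;        /* send the full chain */
    } else {
        s->verify_callback_set = false;  /* v2: get no certificates */
        s->no_auto_chain = true;         /* send only 1 certificate */
    }
}

/* After the initial handshake completes: restore v1-style settings so
 * the renegotiation sends two certificates and requests the client's. */
void on_initial_handshake_done(tls_server_settings_t *s)
{
    s->no_auto_chain = false;
    s->verify_callback_set = true;
}
```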

  The other problem to take care of is missing ciphers and OpenSSL's
  cipher sorting algorithms. The two main issues are a) OpenSSL doesn't
  support some of the default ciphers that Firefox advertises, and b)
  OpenSSL sorts the list of ciphers it offers in a different way than
  Firefox sorts them, so unless we fix that Tor will still look different
  than Firefox.
  [XXXX more on this.]


1.2. Compatibility for clients using libraries less hackable than OpenSSL.

  As discussed in proposal 105, servers advertise which protocol
  versions they support in their router descriptors.  Clients can simply
  behave as v1 clients when connecting to servers that do not support
  link version 2 or higher, and as v2 clients when connecting to servers
  that do support link version 2 or higher.

  (Servers can't use this strategy because we do not assume that servers
  know one another's capabilities when connecting.)

2. Version negotiation.

  Version negotiation proceeds as described in proposal 105, except as
  follows:

   * Version negotiation only happens if the TLS handshake as described
     above completes.

   * The TLS renegotiation must be finished before the client sends a
     VERSIONS cell; the server sends its VERSIONS cell in response.

   * The VERSIONS cell uses the following variable-width format:
         Circuit  [2 octets; set to 0]
         Command  [1 octet; set to 7 for VERSIONS]
         Length   [2 octets; big-endian]
         Data     [Length bytes]

     The Data in the cell is a series of big-endian two-byte integers.

   * It is not allowed to negotiate V1 connections once the v2 protocol
     has been used.  If this happens, Tor instances should close the
     connection.
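  A sketch of encoding and parsing the VERSIONS cell in the format above,
  plus choosing a version. Picking the highest version both sides list is
  an assumption drawn from proposal 105; the function names are
  hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CELL_VERSIONS 7

/* Encode: Circuit (2 bytes, 0), Command (1 byte, 7), Length (2 bytes,
 * big-endian), then each version as a big-endian 16-bit integer.
 * Returns bytes written, or 0 if out is too small. */
size_t encode_versions_cell(const uint16_t *versions, size_t n,
                            uint8_t *out, size_t outlen)
{
    size_t payload = 2 * n;
    if (payload > 0xffff || outlen < 5 + payload)
        return 0;
    out[0] = 0;
    out[1] = 0;
    out[2] = CELL_VERSIONS;
    out[3] = (uint8_t)(payload >> 8);
    out[4] = (uint8_t)(payload & 0xff);
    for (size_t i = 0; i < n; i++) {
        out[5 + 2 * i]     = (uint8_t)(versions[i] >> 8);
        out[5 + 2 * i + 1] = (uint8_t)(versions[i] & 0xff);
    }
    return 5 + payload;
}

/* Parse the peer's cell; return the highest version both sides support,
 * or 0 if the cell is malformed or there is no overlap. */
uint16_t negotiate_version(const uint8_t *cell, size_t len,
                           const uint16_t *ours, size_t n_ours)
{
    if (len < 5 || cell[0] != 0 || cell[1] != 0 || cell[2] != CELL_VERSIONS)
        return 0;
    size_t plen = ((size_t)cell[3] << 8) | cell[4];
    if (plen % 2 != 0 || len < 5 + plen)
        return 0;
    uint16_t best = 0;
    for (size_t i = 0; i < plen; i += 2) {
        uint16_t v = (uint16_t)((cell[5 + i] << 8) | cell[5 + i + 1]);
        for (size_t j = 0; j < n_ours; j++)
            if (ours[j] == v && v > best)
                best = v;
    }
    return best;
}
```

  Per the last bullet, a caller that has already used the v2 handshake
  would close the connection if this returns 1 (or 0, meaning no common
  version).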

3. The rest of the "v2" protocol

   Once a v2 protocol has been negotiated, NETINFO cells are exchanged
   as in proposal 105, and communications begin as per tor-spec.txt.
   Until NETINFO cells have been exchanged, the connection is not open.


Filename: 131-verify-tor-usage.txt
Title: Help users to verify they are using Tor
Author: Steven J. Murdoch
Created: 2008-01-25
Status: Obsolete

Overview:

  Websites for checking whether a user is accessing them via Tor are a
  very helpful aid to configuring web browsers correctly. Existing
  solutions have both false positives and false negatives when
  checking if Tor is being used. This proposal will discuss how to
  modify Tor so as to make testing more reliable.

Motivation:

  Currently deployed websites for detecting Tor use work by comparing
  the client IP address for a request with a list of known Tor nodes.
  This approach is generally effective, but suffers from both false
  positives and false negatives. 

  If a user has a Tor exit node installed, or just happens to have
  been allocated an IP address previously used by a Tor exit node, any
  web requests will be incorrectly flagged as coming from Tor. If any
  customer of an ISP which implements a transparent proxy runs an exit
  node, all other users of the ISP will be flagged as Tor users.

  Conversely, if the exit node chosen by a Tor user has not yet been
  recorded by the Tor checking website, requests will be incorrectly
  flagged as not coming via Tor.
  
  The only reliable way to tell whether Tor is being used or not is for
  the Tor client to flag this to the browser.

Proposal:

  A DNS name should be registered and point to an IP address 
  controlled by the Tor project and likely to remain so for the
  useful lifetime of a Tor client. A web server should be placed
  at this IP address.
  
  Tor should be modified to treat requests to port 80 at the
  specified DNS name or IP address specially. Instead of opening a
  circuit, it should respond to an HTTP request with a helpful web
  page:

  - If the request to open a connection was to the domain name, the web
    page should state that Tor is working properly.
  - If the request was to the IP address, the web page should state
    that there is a DNS-leakage vulnerability.

  If the request goes through to the real web server, the page
  should state that Tor has not been set up properly.
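  The interception rule can be sketched as a classifier on the SOCKS
  request target. Since the proposal leaves the DNS name and IP address
  unspecified, the sketch uses the reserved documentation name
  check.example.org and address 192.0.2.1 as placeholders; the third case
  (the request reaching the real web server) happens outside Tor and is
  not modeled. Names are hypothetical.

```c
#include <string.h>

/* Placeholders only: the proposal does not name the host or address. */
#define CHECK_HOSTNAME "check.example.org"
#define CHECK_IPV4     "192.0.2.1"

typedef enum {
    CHECK_NOT_OURS,  /* open a circuit as usual */
    CHECK_SUCCESS,   /* asked for the name: Tor is working */
    CHECK_DNSLEAK    /* asked for the IP: the name was resolved outside Tor */
} check_result_t;

/* Called when a SOCKS client asks to connect to target:port. */
check_result_t classify_check_request(const char *target, unsigned port)
{
    if (port != 80)
        return CHECK_NOT_OURS;
    if (strcmp(target, CHECK_HOSTNAME) == 0)
        return CHECK_SUCCESS;
    if (strcmp(target, CHECK_IPV4) == 0)
        return CHECK_DNSLEAK;
    return CHECK_NOT_OURS;
}
```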

Extensions:

  Identifying proxy server:

  If needed, other applications between the web browser and Tor (e.g.
  Polipo and Privoxy) could piggyback on the same mechanism to flag
  whether they are in use. All three possible web pages should include
  a machine-readable placeholder, into which another program could
  insert its own message.

  For example, the webpage returned by Tor to indicate a successful
  configuration could include the following HTML:
   <h2>Connection chain</h2>
   <ul>
   <li>Tor 0.1.2.14-alpha</li>
   <!-- Tor Connectivity Check: success -->
   </ul>

  When the proxy server observes this string, in response to a request
  for the Tor connectivity check web page, it would prepend its own
  message, resulting in the following being returned to the web
  browser:
   <h2>Connection chain</h2>
   <ul>
   <li>Tor 0.1.2.14-alpha</li>
   <li>Polipo version 1.0.4</li>
   <!-- Tor Connectivity Check: success -->
   </ul>
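  The proxy-side step might be sketched as inserting one line
  immediately before the placeholder comment. The marker prefix comes
  from the example above; the function name and the allocate-a-copy
  design are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Prefix of the placeholder comment, taken from the example above. */
#define CHECK_MARKER "<!-- Tor Connectivity Check:"

/* Return a newly allocated copy of page with `line` inserted right
 * before the marker, or NULL if the marker is absent or malloc fails.
 * The caller frees the result. */
char *insert_before_marker(const char *page, const char *line)
{
    const char *m = strstr(page, CHECK_MARKER);
    if (m == NULL)
        return NULL;
    size_t head = (size_t)(m - page);
    size_t llen = strlen(line);
    size_t rest = strlen(m);
    char *out = malloc(head + llen + rest + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, page, head);
    memcpy(out + head, line, llen);
    memcpy(out + head + llen, m, rest + 1); /* includes the NUL */
    return out;
}
```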

  Checking external connectivity:

  If Tor intercepts a request and returns a response itself, the user
  cannot actually confirm whether Tor is able to build a successful
  circuit. It may then be advantageous to include an image in the web
  page which is loaded from a different domain. If this image loads,
  the user will know that external connectivity through Tor works.

  Automatic Firefox Notification:

  All forms of the website should return valid XHTML and have a
  hidden link with an id attribute "TorCheckResult" and a target
  property that can be queried to determine the result. For example,   
  a hidden link would convey success like this: 

  <a id="TorCheckResult" target="success" href="/"></a>

  failure like this:

  <a id="TorCheckResult" target="failure" href="/"></a>

  and DNS leaks like this:

  <a id="TorCheckResult" target="dnsleak" href="/"></a>

  Firefox extensions such as Torbutton would then be able to 
  issue an XMLHttpRequest for the page and query the result
  with resultXML.getElementById("TorCheckResult").target
  to automatically report the Tor status to the user when
  they first attempt to enable Tor activity, or whenever
  they request a check from the extension preferences window.
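  For clients outside a browser, the same result can be pulled out with
  a naive string scan, sketched below. It assumes double-quoted
  attributes with target following id, as in the examples above; an
  extension would use the DOM lookup described in the text instead. The
  function name is hypothetical.

```c
#include <string.h>

/* Copy the target attribute of <a id="TorCheckResult" ...> into out.
 * Returns 0 on success, -1 if absent, malformed, or too long. */
int tor_check_result(const char *page, char *out, size_t outlen)
{
    static const char key[] = "target=\"";
    const char *id = strstr(page, "id=\"TorCheckResult\"");
    if (id == NULL)
        return -1;
    const char *tag_end = strchr(id, '>');
    const char *t = strstr(id, key);
    if (t == NULL || (tag_end != NULL && t > tag_end))
        return -1; /* no target inside this tag */
    t += sizeof key - 1;
    const char *q = strchr(t, '"');
    if (q == NULL)
        return -1;
    size_t n = (size_t)(q - t);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, t, n);
    out[n] = '\0';
    return 0;
}
```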

  If the check website is to be themed with heavy graphics and/or
  extensive documentation, the check result itself should be
  contained in a separate lightweight iframe that extensions can
  request via an alternate url.

Security and resiliency implications:

  What attacks are possible?

  If the IP address used for this feature moves there will be two
  consequences:
   - A new website at this IP address will remain inaccessible over
     Tor
   - Tor users who are leaking DNS will be informed that Tor is not
     working, rather than that it is active but leaking DNS
  We should thus attempt to find an IP address which we reasonably
  believe can remain static.

Open issues:

  If a Tor version which does not support this extra feature is used,
  the webpage returned will indicate that Tor is not being used. Can
  this be safely fixed?

Related work:

  The proposed mechanism is very similar to config.privoxy.org. The
  most significant difference is that if the web browser is
  misconfigured, Tor will only get an IP address. Even in this case,
  Tor should be able to respond with a webpage to notify the user of how
  to fix the problem. This also implies that Tor must be told of the
  special IP address, so that address must be effectively permanent.

Filename: 132-browser-check-tor-service.txt
Title: A Tor Web Service For Verifying Correct Browser Configuration
Author: Robert Hogan
Created: 2008-03-08
Status: Obsolete

Overview:

  Tor should operate a primitive web service on the loopback network device
  that tests the operation of the user's browser, privacy proxy and Tor client.
  The tests are performed by serving unique, randomly generated elements in
  image URLs embedded in static HTML. The images are only displayed if the DNS
  and HTTP requests for them are routed through Tor, otherwise the 'alt' text
  may be displayed. Since not all browsers display 'alt' text, the proposal
  suggests that text and links should accompany each image, advising the
  user on next steps in case the test fails.

  The service is primarily for the use of controllers, since presumably users
  aren't going to want to edit text files and then type something exotic like
  127.0.0.1:9999 into their address bar. In the main use case the controller
  will have configured the actual port for the webservice so will know where
  to direct the request. It would also be the responsibility of the controller
  to ensure the webservice is available, and Tor is running, before allowing
  the user to access the page through their browser.

Motivation:

  This is a complementary approach to proposal 131. It overcomes some of the
  limitations of the approach described in proposal 131: reliance
  on a permanent, real IP address and compatibility with older versions of
  Tor. Unlike 131, it is not as useful to Tor users who are not running a
  controller.

Objective:

  Provide a reliable means of helping users to determine if their Tor
  installation, privacy proxy and browser are properly configured for
  anonymous browsing.

Proposal:

  When configured to do so, Tor should run a basic web service available
  on a configured port on 127.0.0.1. The purpose of this web service is to
  serve a number of basic test images that will allow the user to determine
  if their browser is properly configured and that Tor is working normally.

  The service can consist of a single web page with two columns. The left
  column contains images; the right column contains advice on what it means
  if each image is or is not displayed.

  The rest of this proposal assumes that the service is running on port
  9999. The port should be configurable, and configuring the port enables the
  service. The service must run on 127.0.0.1.

  In all the examples below [uniquesessionid] refers to a random, base64
  encoded string that is unique to the URL it is contained in. Tor only ever
  stores the most recently generated [uniquesessionid] for each URL, storing 3
  in total. Tor should generate a [uniquesessionid] for each of the test URLs
  below every time an HTTP GET is received at 127.0.0.1:9999 for index.htm.
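  A sketch of the per-URL session IDs: three slots, all regenerated on
  each GET for index.htm, with only the latest value kept. Two
  assumptions are ours: randomness comes from /dev/urandom (so this
  targets Unix-like systems), and each 4-bit nibble is written as a
  letter a through p instead of base64, because the DNS test places the
  ID in a hostname, where base64's '+', '/' and '=' are not allowed. The
  type and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define N_TESTS  3  /* one ID per test URL */
#define ID_BYTES 8  /* hypothetical: 64 bits of randomness per ID */

/* Most recent ID for each test URL; only these three are stored. */
typedef struct {
    char id[N_TESTS][2 * ID_BYTES + 1];
} session_ids_t;

/* Regenerate all three IDs. Returns 0 on success, -1 on failure. */
int session_ids_refresh(session_ids_t *s)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    for (int t = 0; t < N_TESTS; t++) {
        unsigned char buf[ID_BYTES];
        if (fread(buf, 1, ID_BYTES, f) != ID_BYTES) {
            fclose(f);
            return -1;
        }
        for (int i = 0; i < ID_BYTES; i++) {
            /* nibble -> 'a'..'p': hostname-safe, never all digits */
            s->id[t][2 * i]     = (char)('a' + (buf[i] >> 4));
            s->id[t][2 * i + 1] = (char)('a' + (buf[i] & 0x0f));
        }
        s->id[t][2 * ID_BYTES] = '\0';
    }
    fclose(f);
    return 0;
}

/* Does candidate match the current ID for test number t? */
int session_id_matches(const session_ids_t *s, int t, const char *candidate)
{
    return t >= 0 && t < N_TESTS && strcmp(s->id[t], candidate) == 0;
}
```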

  The most suitable image for each test case is an implementation decision.
  Tor will need to store and serve images for the first and second test
  images, and possibly the third (see 'Open Issues').

  1. DNS Request Test Image
  
  This is a HTML element embedded in the page served by Tor at
  http://127.0.0.1:9999:

  <IMG src="http://[uniquesessionid]:9999/torlogo.jpg" alt="If you can see
  this text, your browser's DNS requests are not being routed through Tor."
  width="200" height="200" align="middle" border="2">

  If the browser's DNS request for [uniquesessionid] is routed through Tor,
  Tor will intercept the request and return 127.0.0.1 as the resolved IP
  address. This will shortly be followed by an HTTP request from the browser
  for http://127.0.0.1:9999/torlogo.jpg. This request should be served with
  the appropriate image.

  If the browser's DNS request for [uniquesessionid] is not routed through
  Tor, the browser may display the 'alt' text specified in the HTML element.
  The HTML served by Tor should also contain text accompanying the image to
  advise users what it means if they do not see an image. It should also
  provide a link to information on how to remedy the problem. This behaviour
  also applies to the images described in 2. and 3. below.


  2. Proxy Configuration Test Image

  This is an HTML element embedded in the page served by Tor at
  http://127.0.0.1:9999:

  <IMG src="http://torproject.org/[uniquesessionid].jpg" alt="If you can see
  this text, your browser is not configured to work with Tor." width="200"
  height="200" align="middle" border="2">

  If the HTTP request for the resource [uniquesessionid].jpg is received by
  Tor it will serve the appropriate image in response. It should serve this
  image itself, without attempting to retrieve anything from the Internet.

  If Tor can identify the name of the proxy application requesting the
  resource then it could store and serve an image identifying the proxy to the
  user.

  3. Tor Connectivity Test Image

  This is an HTML element embedded in the page served by Tor at
  http://127.0.0.1:9999:

  <IMG src="http://torproject.org/[uniquesessionid]-torlogo.jpg" alt="If you
  can see this text, your Tor installation cannot connect to the Internet."
  width="200" height="200" align="middle" border="2">

  The referenced image should actually exist on the Tor project website. If
  Tor receives the request for the above resource, it should remove the
  random base64-encoded session ID and its trailing hyphen (i.e.
  [uniquesessionid]-) from the request and attempt to retrieve the real
  image.

  Even on a fully operational Tor client this test may not always succeed. The
  user should be advised that one or more attempts to retrieve this image may
  be necessary to confirm a genuine problem.
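  The three tests differ only in what Tor does when it sees one of the
  session IDs. A hypothetical C sketch of that dispatch follows; the enum
  values and classify_test_request() are illustrative names, not Tor code,
  and the sketch ignores hostname case-insensitivity and the HTTP half of
  test 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical actions for the three test requests (not Tor code). */
enum test_action {
  ACT_NONE,            /* not a test request: handle normally */
  ACT_RESOLVE_LOCAL,   /* test 1: answer the DNS lookup with 127.0.0.1 */
  ACT_SERVE_LOCAL_IMG, /* test 2: serve the image locally, no Internet fetch */
  ACT_FETCH_REAL_IMG   /* test 3: strip "[id]-" and fetch the real image */
};

/* host: name being resolved or requested; path: HTTP path, or NULL for a
 * bare DNS lookup. On ACT_FETCH_REAL_IMG the rewritten path goes to out. */
static enum test_action
classify_test_request(const char *host, const char *path,
                      const char *dns_id, const char *proxy_id,
                      const char *conn_id, char *out, size_t outlen)
{
  assert(host && dns_id && *dns_id && proxy_id && *proxy_id &&
         conn_id && *conn_id);
  if (path == NULL)
    return strcmp(host, dns_id) == 0 ? ACT_RESOLVE_LOCAL : ACT_NONE;
  if (strcmp(host, "torproject.org") != 0 || path[0] != '/')
    return ACT_NONE;
  const char *p = path + 1;
  size_t plen = strlen(proxy_id), clen = strlen(conn_id);
  if (strncmp(p, proxy_id, plen) == 0 && strcmp(p + plen, ".jpg") == 0)
    return ACT_SERVE_LOCAL_IMG;
  if (strncmp(p, conn_id, clen) == 0 && p[clen] == '-') {
    snprintf(out, outlen, "/%s", p + clen + 1); /* drop "[id]-" */
    return ACT_FETCH_REAL_IMG;
  }
  return ACT_NONE;
}
```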

Open Issues:

  The final connectivity test relies on an externally maintained resource;
  if this resource becomes unavailable, the connectivity test will always
  fail. Either the text accompanying the test should warn of this
  possibility, or Tor clients should be told the location of the test
  resource in the main network directory listings.

  Any number of misconfigurations may make the web service unreachable; it
  is the responsibility of the user's controller to recognize these and help
  the user eliminate them. Tor can mitigate the specific misconfiguration of
  routing HTTP traffic for 127.0.0.1 through Tor itself by serving such
  requests on the SOCKS port as well as on the configured web service port.

  This means Tor would be inspecting the URLs requested on its SOCKS port and
  'dropping' them. It already inspects for raw IP addresses (to warn of DNS
  leaks), but the behaviour proposed here may be qualitatively different.
  Maybe this is an unwelcome precedent that could be used to beat the project
  over the head in future. Or maybe it's not such a bad thing: Tor is merely
  attempting to make normally invalid resource requests valid for a given
  purpose.

Filename: 133-unreachable-ors.txt
Title: Incorporate Unreachable ORs into the Tor Network
Author: Robert Hogan
Created: 2008-03-08
Status: Reserve

Overview:

  Propose a scheme for harnessing the bandwidth of ORs who cannot currently
  participate in the Tor network because they can only make outbound
  TCP connections.

Motivation: 

  Restrictive local and remote firewalls are preventing many willing
  candidates from becoming ORs on the Tor network. These
  ORs have a casual interest in joining the network but their operator is not
  sufficiently motivated or adept to complete the necessary router or firewall
  configuration. The Tor network is losing out on their bandwidth. At the
  moment we don't even know how many such 'candidate' ORs there are.


Objective:

  1. Establish how many ORs are unable to qualify for publication because
     they cannot establish that their ORPort is reachable.

  2. Devise a method for making such ORs available to clients for circuit
     building without prejudicing their anonymity.

Proposal:

  ORs whose ORPort reachability testing fails a specified number of
  consecutive times should:
   1. Enlist themselves with the authorities, setting a 'Fallback' flag.
      This flag indicates that the OR is up and running but cannot connect
      to itself.
   2. Open an orconn with all ORs whose fingerprint begins with the same
      byte as their own. The management of this orconn will be transferred
      entirely to the OR at the other end.
   3. Update their router status to contain the 'Running' flag once they
      have managed to open an orconn with 3/4 of the ORs with an FP
      beginning with the same byte as their own.

  Tor ORs who are contacted by fallback ORs requesting an orconn should:
   1. Accept the orconn until they have reached a defined limit of orconn
      connections with fallback ORs.
   2. Only accept such orconn requests from listed fallback ORs whose FP
      begins with the same byte as their own.
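  The two OR-side rules above reduce to simple predicates. In this C sketch
  the 3/4 threshold is read as "at least 3/4" and checked in integer
  arithmetic; the function names and the max_fallback_conns parameter are
  hypothetical, since the proposal leaves the connection limit open (see
  Implementation Issues).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 20 /* SHA-1 identity fingerprint, in bytes */

/* Fallback side: claim Running once orconns are open to at least 3/4 of the
 * published ORs sharing our first fingerprint byte (open/total >= 3/4). */
static bool fallback_may_claim_running(size_t n_open, size_t n_same_byte)
{
  assert(n_open <= n_same_byte);
  return n_same_byte > 0 && 4 * n_open >= 3 * n_same_byte;
}

/* Published-OR side: accept an orconn from a listed fallback OR only if its
 * fingerprint starts with our first byte and we are under our (hypothetical)
 * limit on fallback connections. */
static bool accept_fallback_orconn(const uint8_t self_fp[DIGEST_LEN],
                                   const uint8_t peer_fp[DIGEST_LEN],
                                   bool peer_is_listed_fallback,
                                   size_t n_fallback_conns,
                                   size_t max_fallback_conns)
{
  return peer_is_listed_fallback &&
         self_fp[0] == peer_fp[0] &&
         n_fallback_conns < max_fallback_conns;
}
```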

  Tor clients can include fallback ORs in the network by doing the
  following:
   1. When building a circuit, observe the fingerprint of each node they
      wish to connect to.
   2. When randomly selecting a node from the set of all eligible nodes,
      add all published, running fallback nodes to the set where the first
      byte of the fingerprint matches the previous node in the circuit.
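  Client-side selection could look like the following C sketch. node_t and
  candidate_nodes() are hypothetical, "eligible" is simplified to "running",
  and the sketch does not exclude the previous hop itself. With no previous
  hop (i.e. when picking the first hop) no fallback OR is added, which fits
  the fact that fallback ORs cannot accept inbound connections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 20

/* Minimal hypothetical node record for this sketch. */
typedef struct {
  uint8_t fp[DIGEST_LEN];
  bool is_fallback;
  bool is_running;
} node_t;

/* Candidate set for the next hop: every running published node, plus
 * running fallback nodes whose first fingerprint byte matches the previous
 * hop. Writes indices into out and returns how many were written. */
static size_t candidate_nodes(const node_t *nodes, size_t n,
                              const node_t *prev_hop, size_t *out)
{
  assert(nodes != NULL && out != NULL);
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    const node_t *node = &nodes[i];
    if (!node->is_running)
      continue;
    if (!node->is_fallback ||
        (prev_hop != NULL && node->fp[0] == prev_hop->fp[0]))
      out[k++] = i;
  }
  return k;
}
```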

Anonymity Implications:

  At least some, and possibly all, nodes on the network will have a set
  of nodes that only they and a few others can build circuits on.

    1. This means that fallback ORs might be unsuitable for use as middlemen
       nodes, because if the exit node is the attacker it knows that the
       number of nodes that could be the entry guard in the circuit is
       reduced to roughly 1/256th of the network, or worse 1/256th of all
       nodes listed as Guards. For the same reason, fallback nodes would
       appear to be unsuitable for two-hop circuits.

    2. This is not a problem if fallback ORs are always exit nodes. If
       the fallback OR is an attacker it will not be able to reduce the
       set of possible nodes for the entry guard any further than a normal,
       published OR.

Possible Attacks/Open Issues:

  1. Gaming Node Selection
    Does running a fallback OR customized for a specific set of published ORs
    improve an attacker's chances of seeing traffic from that set of published
    ORs? Would such a strategy be any more effective than running published
    ORs with other 'attractive' properties?

  2. DOS Attack
    An attacker could prevent all other legitimate fallback ORs with a
    given first byte in their FP from functioning by running 20 or 30
    fallback ORs and monopolizing all available fallback slots on the
    published ORs. This same attacker would then be in a position to
    monopolize all the traffic of the fallback ORs on that first-byte
    network segment. I'm not sure what this would allow such an attacker
    to do.

  3. Circuit-Sniffing
    An observer watching exit traffic from a fallback server will know that the
    previous node in the circuit is one of a very small, identifiable
    subset of the total ORs in the network. To establish the full path of the
    circuit they would only have to watch the exit traffic from the fallback
    OR and all the traffic from the 20 or 30 ORs it is likely to be connected
    to. This means it is substantially easier to establish all members of a
    circuit which has a fallback OR as an exit (sniff and analyse the 10-50
    ORs sharing its first fingerprint byte, nominally 1/256 of the network
    but varying in practice, plus the fallback OR itself) than for a normal
    published OR (sniff all 2560 or so ORs on the network). The same
    mechanism that allows the client to expect a specific fallback OR to be
    available from a specific published OR allows an attacker to prepare
    his ground.

    Mitigant:
    In terms of the resources and access required to monitor 2000 to 3000
    nodes, the effort of the adversary is not significantly diminished when he
    is only interested in 20 or 30. It is hard to see how an adversary who can
    obtain access to a randomly selected portion of the Tor network would face
    any new or qualitatively different obstacles in attempting to access much
    of the rest of it.


Implementation Issues:

  The number of ORs this proposal would add to the Tor network is not known.
  This is because there is no mechanism at present for recording unsuccessful
  attempts to become an OR. If the proposal is considered promising it may be
  worthwhile to issue an alpha series release where candidate ORs post a
  primitive fallback descriptor to the authority directories. This fallback
  descriptor would not contain any other flag that would make it eligible for
  selection by clients. It would act solely as a means of sizing the number of
  Tor instances that try and fail to become ORs.

  The upper limit on the number of orconns from fallback ORs a normal,
  published OR should be willing to accept is an open question. Is one
  hundred, mostly idle, such orconns too onerous?

Filename: 134-robust-voting.txt
Title: More robust consensus voting with diverse authority sets
Author: Peter Palfrader
Created: 2008-04-01
Status: Rejected

History:
  2009 May 27: Added note on rejecting this proposal -- Nick

Overview:

  A means to arrive at a valid directory consensus even when voters
  disagree on who is an authority.


Motivation:

  Right now there are about five authoritative directory servers in the
  Tor network, though this number is expected to rise to about 15 eventually.

  Adding a new authority requires synchronized action from all operators of
  directory authorities so that at any time during the update at least half of
  all authorities are running and agree on who is an authority.  The latter
  requirement is there so that the authorities can arrive at a common
  consensus:  Each authority builds the consensus based on the votes from
  all authorities it recognizes, and so a different set of recognized
  authorities will lead to a different consensus document.


Objective:

  The modified voting procedure outlined in this proposal obsoletes the
  requirement for most authorities to exactly agree on the list of
  authorities.


Proposal:

  The vote document each authority generates contains a list of 
  authorities recognized by the generating authority.  This will be 
  a list of authority identity fingerprints.

  Authorities will accept, serve, and mirror votes even from
  authorities they do not recognize.  (Votes contain the signing key,
  the authority identity key, and the certificate linking them, so
  they can be verified even without knowing the authority beforehand.)

  Before building the consensus we will check which votes to use for
  building:

   1) We build a directed graph of which authority/vote recognizes
      whom.
   2) (Parts of the graph that aren't reachable, directly or
      indirectly, from any authorities we recognize can be discarded
      immediately.)
   3) We find the largest fully connected subgraph.
      (Should there be more than one subgraph of the same size there
      needs to be some arbitrary ordering so we always pick the same.
      E.g. pick the one who has the smaller (XOR of all votes' digests)
      or something.)
   4) If we are part of that subgraph, great.  This is the list of 
      votes we build our consensus with.
   5) If we are not part of that subgraph, remove all the nodes that
      are part of it and go to 3.
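  Steps 1-5 can be sketched in C using bitmasks, assuming at most 16
  authorities so that brute-forcing every subset in step 3 stays cheap.
  rec[i] is a hypothetical encoding of vote i's recognized-authority list,
  and "fully connected" is read as every pair in the set recognizing each
  other. The proposal suggests breaking ties by the XOR of the votes'
  digests; this sketch substitutes the numerically smallest bitmask as a
  stand-in.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_AUTH 16 /* assumption: small enough to try every subset */

/* Step 1 is the input itself: bit j of rec[i] is set if vote i recognizes
 * authority j (hypothetical encoding). */

static int popcount32(uint32_t x)
{
  int c = 0;
  for (; x; x &= x - 1)
    c++;
  return c;
}

/* Does every member of set recognize every other member? */
static int fully_connected(const uint32_t *rec, int n, uint32_t set)
{
  for (int i = 0; i < n; i++) {
    uint32_t others = set & ~(1u << i);
    if ((set & (1u << i)) && (rec[i] & others) != others)
      return 0;
  }
  return 1;
}

/* Step 2: everything reachable, directly or indirectly, from start. */
static uint32_t reachable_from(const uint32_t *rec, int n, uint32_t start)
{
  uint32_t reach = start, prev;
  do {
    prev = reach;
    for (int i = 0; i < n; i++)
      if (reach & (1u << i))
        reach |= rec[i];
  } while (reach != prev);
  return reach;
}

/* Step 3: largest fully connected subset of candidates; ties go to the
 * numerically smallest bitmask (stand-in for XOR-of-digests ordering). */
static uint32_t largest_clique(const uint32_t *rec, int n, uint32_t candidates)
{
  uint32_t best = 0;
  int best_size = 0;
  for (uint32_t s = 1; s < (1u << n); s++) {
    if ((s & candidates) != s || !fully_connected(rec, n, s))
      continue;
    if (popcount32(s) > best_size) {
      best = s;
      best_size = popcount32(s);
    }
  }
  return best;
}

/* Steps 2-5: the set of votes authority `self` builds its consensus from. */
static uint32_t choose_consensus_voters(const uint32_t *rec, int n, int self)
{
  assert(n > 0 && n <= MAX_AUTH && self >= 0 && self < n);
  uint32_t all = (1u << n) - 1;
  uint32_t remaining = reachable_from(rec, n, rec[self] | (1u << self)) & all;
  for (;;) {
    uint32_t clique = largest_clique(rec, n, remaining);
    if (clique == 0 || (clique & (1u << self)))
      return clique;      /* step 4: we are part of it */
    remaining &= ~clique; /* step 5: drop it and go back to step 3 */
  }
}
```

  With five mutually recognizing authorities (bits 0-4) of which only two
  also recognize a new sixth authority, every old authority still picks the
  original five, as the next paragraph describes.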

  Using this procedure authorities that are updated to recognize a
  new authority will continue voting with the old group until a
  sufficient number has been updated to arrive at a consensus with
  the recently added authority.

  In fact, the old set of authorities will probably be voting among
  themselves until all but one has been updated to recognize the
  new authority.  Then which set of votes is used for consensus 
  building depends on which of the two equally large sets gets 
  ordered before the other in step (3) above.

  It is necessary to continue with the process in (5) even if we
  are not in the largest subgraph.  Otherwise one rogue authority
  could create a number of extra votes (by new authorities) so that
  everybody stops at step (5) and no consensus is built, even though it
  would be trusted by all clients.


Anonymity Implications:

  The author does not believe this proposal to have anonymity
  implications.


Possible Attacks/Open Issues/Some thinking required:

 Q: Can a number (less than or exactly half) of the authorities cause an
    honest authority to vote for "their" consensus rather than the one that
    would result were all authorities taken into account?


 Q: Can a set of votes from external authorities, i.e., authorities of
    whom we trust either none or at least not all, cause us to change the
    set of consensus makers we pick?
 A: Yes, if other authorities decide they would rather build a consensus
    with them, then they'll be thrown out in step 3.  But that's ok since
    those other authorities will never vote with us anyway.
    If we trust none of them then we throw them out even sooner, so no harm done.

 Q: Can this ever force us to build a consensus with authorities we do not
    recognize?
 A: No, we can never build a fully connected set with them in step 3.

------------------------------

I'm rejecting this proposal as insecure.

Suppose that we have a clique of size N, and M hostile members in the
clique.  If these hostile members stop declaring trust for up to M-1
good members of the clique, the clique with the hostile members in it
will be larger than the one without them.

The M hostile members will constitute a majority of this new clique
when M > (N-(M-1)) / 2, or when M > (N + 1) / 3.  This breaks our
requirement that an adversary must compromise a majority of authorities
in order to control the consensus.
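The inequality can be checked step by step.  The numbers for N = 9
below are an illustrative instance, not part of the original note.

```latex
% Size of the clique after the M hostile members drop trust for M-1 good members
\[ |C'| = N - (M - 1) = N - M + 1 \]
% The hostile members form a strict majority of C' when
\[ M > \frac{N - M + 1}{2} \iff 2M > N - M + 1 \iff 3M > N + 1 \iff M > \frac{N + 1}{3} \]
% Example: N = 9 requires M > 10/3, so M = 4 suffices.
% Then |C'| = 9 - 4 + 1 = 6 and 4 > 6/2, although 4 < 5 = \lfloor 9/2 \rfloor + 1,
% the number needed for a majority of all 9 authorities.
```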

-- Nick
Filename: 135-private-tor-networks.txt
Title: Simplify Configuration of Private Tor Networks
Author: Karsten Loesing
Created: 29-Apr-2008
Status: Closed
Target: 0.2.1.x
Implemented-In: 0.2.1.2-alpha

Change history:

  29-Apr-2008  Initial proposal for or-dev
  19-May-2008  Included changes based on comments by Nick to or-dev and
               added a section for test cases.
  18-Jun-2008  Changed testing-network-only configuration option names.

Overview:

  Configuring a private Tor network has become a time-consuming and
  error-prone task with the introduction of the v3 directory protocol. In
  addition to that, operators of private Tor networks need to set an
  increasing number of non-trivial configuration options, and it is hard
  to keep FAQ entries describing this task up-to-date. This proposal (1)
  suggests optionally accelerating the timing of the v3 directory voting
  process and (2) introduces an umbrella config option specifically aimed at
  creating private Tor networks.

Design:

  1. Accelerate Timing of v3 Directory Voting Process

  Tor has reasonable defaults for setting up a large, Internet-scale
  network with comparatively high latencies and possibly wrong server clocks.
  However, those defaults are bad when it comes to quickly setting up a
  private Tor network for testing, either on a single node or LAN (things
  might be different when creating a test network on PlanetLab or
  something). Some time constraints should be made configurable for private
  networks. The general idea is to accelerate everything that has to do
  with propagation of directory information, but nothing else, so that a
  private network is available as soon as possible. (As a possible
  safeguard, changing these configuration values could be made dependent on
  the umbrella configuration option introduced in 2.)

  1.1. Initial Voting Schedule

  When a v3 directory authority does not know any consensus, it assumes an initial,
  hard-coded VotingInterval of 30 minutes, VoteDelay of 5 minutes, and
  DistDelay of 5 minutes. This is important for multiple, simultaneously
  restarted directory authorities to meet at a common time and create an
  initial consensus. Unfortunately, this means that it may take up to half
  an hour (or even more) for a private Tor network to bootstrap.

  We propose to make these three time constants configurable (note that
  V3AuthVotingInterval, V3AuthVoteDelay, and V3AuthDistDelay do not have an
  effect on the _initial_ voting schedule, but only on the schedule that a
  directory authority votes for). This can be achieved by introducing three
  new configuration options: TestingV3AuthInitialVotingInterval,
  TestingV3AuthInitialVoteDelay, and TestingV3AuthInitialDistDelay.

  As a first safeguard, Tor should only accept configuration values for
  TestingV3AuthInitialVotingInterval that divide evenly into the default
  value of 30 minutes. The effect is that even if people misconfigured
  their directory authorities, they would meet at the default values at the
  latest. The second safeguard is to allow configuration only when the
  umbrella configuration option TestingTorNetwork is set.
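  A small C sketch of the first safeguard, assuming (for illustration only)
  that voting periods start at multiples of the interval from a shared
  reference time t = 0. Because every accepted interval divides 1800
  seconds, each 30-minute boundary is a period start for every authority, so
  two authorities with different valid intervals meet no later than the next
  such boundary. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_INITIAL_VOTING_INTERVAL (30*60) /* seconds */

/* Accept only intervals that divide evenly into the 30-minute default. */
static bool initial_voting_interval_ok(long interval)
{
  return interval > 0 && DEFAULT_INITIAL_VOTING_INTERVAL % interval == 0;
}

/* Next period start strictly after t (assumes t >= 0 and periods aligned
 * to multiples of interval from t = 0). */
static long next_period_start(long t, long interval)
{
  return (t / interval + 1) * interval;
}

/* First period start after t shared by two authorities using intervals a
 * and b; never later than the next multiple of 1800 after t. */
static long first_common_start(long t, long a, long b)
{
  assert(initial_voting_interval_ok(a) && initial_voting_interval_ok(b));
  long ta = next_period_start(t, a), tb = next_period_start(t, b);
  while (ta != tb) {
    if (ta < tb)
      ta = next_period_start(ta, a);
    else
      tb = next_period_start(tb, b);
  }
  return ta;
}
```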

  1.2. Immediately Provide Reachability Information (Running flag)

  The default behavior of a directory authority is to provide the Running
  flag only after the authority has been running for at least 30 minutes. The
  rationale is that before that time, an authority simply cannot deliver
  useful information about other running nodes. But for private Tor
  networks this may be different. This is currently implemented in the code
  as:

  /** If we've been around for less than this amount of time, our
   * reachability information is not accurate. */
  #define DIRSERV_TIME_TO_GET_REACHABILITY_INFO (30*60)

  There should be another configuration option
  TestingAuthDirTimeToLearnReachability with a default value of 30 minutes
  that can be changed when running testing Tor networks, e.g. to 0 minutes.
  The configuration value would simply replace the quoted constant. Again,
  changing this option could be safeguarded by requiring the umbrella
  configuration option TestingTorNetwork to be set.

  1.3. Reduce Estimated Descriptor Propagation Time

  Tor currently assumes that it takes up to 10 minutes until router
  descriptors are propagated from the authorities to directory caches.
  This is not very useful for private Tor networks, and we want to be able
  to reduce this time, so that clients can download router descriptors in a
  timely manner.

  /** Clients don't download any descriptor this recent, since it will
   * probably not have propagated to enough caches. */
  #define ESTIMATED_PROPAGATION_TIME (10*60)

  We propose introducing a new config option
  TestingEstimatedDescriptorPropagationTime which defaults to 10 minutes,
  but that can be set to any lower non-negative value, e.g. 0 minutes. The
  same safeguards as in 1.2 could be used here, too.
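  Sections 1.2 and 1.3 both replace a compile-time constant with a
  configurable value. A hypothetical C sketch of the two resulting checks,
  with the quoted constants as defaults (timing_opts_t and the function names
  are illustrative, not Tor's):

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Defaults taken from the constants quoted in sections 1.2 and 1.3. */
#define DEFAULT_TIME_TO_LEARN_REACHABILITY (30*60)
#define DEFAULT_DESC_PROPAGATION_TIME (10*60)

/* Hypothetical subset of the options, in seconds. */
typedef struct {
  long time_to_learn_reachability; /* TestingAuthDirTimeToLearnReachability */
  long desc_propagation_time;      /* TestingEstimatedDescriptorPropagationTime */
} timing_opts_t;

/* 1.2: only after this long is the authority's reachability information
 * (and thus its Running flag) considered accurate. */
static bool reachability_info_accurate(const timing_opts_t *o,
                                       time_t started, time_t now)
{
  assert(o->time_to_learn_reachability >= 0);
  return difftime(now, started) >= (double)o->time_to_learn_reachability;
}

/* 1.3: clients skip descriptors published more recently than this. */
static bool descriptor_old_enough(const timing_opts_t *o,
                                  time_t published, time_t now)
{
  assert(o->desc_propagation_time >= 0);
  return difftime(now, published) >= (double)o->desc_propagation_time;
}
```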

  2. Umbrella Option for Setting Up Private Tor Networks

  Setting up a private Tor network requires a number of specific settings
  that are not required or useful when running Tor in the public Tor
  network. Instead of writing down these options in a FAQ entry, there
  should be a single configuration option, e.g. TestingTorNetwork, that
  changes all required settings at once. Newer Tor versions would keep the
  set of configuration options up-to-date. It should still remain possible
  to manually override the settings that the umbrella configuration option
  affects.

  The following configuration options are set by TestingTorNetwork:

  - ServerDNSAllowBrokenResolvConf 1
      Tolerate private relays not being aware of any name servers.

  - DirAllowPrivateAddresses 1
      Allow router descriptors containing private IP addresses.

  - EnforceDistinctSubnets 0
      Permit building circuits with relays in the same subnet.

  - AssumeReachable 1
      Omit self-testing for reachability.

  - AuthDirMaxServersPerAddr 0
  - AuthDirMaxServersPerAuthAddr 0
      Permit an unlimited number of nodes on the same IP address.

  - ClientDNSRejectInternalAddresses 0
      Accept DNS responses resolving to private IP addresses.

  - ExitPolicyRejectPrivate 0
      Allow exiting to private IP addresses. (This one is a matter of
      taste---it might be dangerous to make this a default in a private
      network, although people setting up private Tor networks should know
      what they are doing.)

  - V3AuthVotingInterval 5 minutes
  - V3AuthVoteDelay 20 seconds
  - V3AuthDistDelay 20 seconds
      Accelerate voting schedule after first consensus has been reached.

  - TestingV3AuthInitialVotingInterval 5 minutes
  - TestingV3AuthInitialVoteDelay 20 seconds
  - TestingV3AuthInitialDistDelay 20 seconds
      Accelerate initial voting schedule until first consensus is reached.

  - TestingAuthDirTimeToLearnReachability 0 minutes
      Consider routers as Running from the start of running an authority.

  - TestingEstimatedDescriptorPropagationTime 0 minutes
      Clients try downloading router descriptors from directory caches,
      even when they are less than 10 minutes old.

  In addition to changing the defaults for these configuration options,
  TestingTorNetwork can only be set when a user has manually configured
  DirServer lines.
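  The umbrella option's semantics can be modelled in a few lines of C. This
  is a hypothetical sketch with a single dependent option, not Tor's config
  code: an explicit value always wins (test 2), otherwise the default
  depends on TestingTorNetwork (tests 1 and 5), and validation rejects
  changes outside testing networks (test 6) and TestingTorNetwork without
  custom DirServer lines (test 10). Runtime transitions (tests 8 and 9) are
  not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PUBLIC_TIME_TO_LEARN_REACHABILITY (30*60)
#define TESTING_TIME_TO_LEARN_REACHABILITY 0

/* Hypothetical configuration model: the umbrella flag plus one dependent
 * option (TestingAuthDirTimeToLearnReachability). */
typedef struct {
  bool testing_tor_network;
  bool has_custom_dirservers;
  bool reachability_set;   /* did the user set the option explicitly? */
  long reachability_value; /* meaningful only if reachability_set */
} config_sketch_t;

/* An explicit setting wins; otherwise the default depends on the flag. */
static long effective_time_to_learn_reachability(const config_sketch_t *c)
{
  assert(c != NULL);
  if (c->reachability_set)
    return c->reachability_value;
  return c->testing_tor_network ? TESTING_TIME_TO_LEARN_REACHABILITY
                                : PUBLIC_TIME_TO_LEARN_REACHABILITY;
}

/* Returns NULL if the configuration is acceptable, else an error message. */
static const char *validate_config(const config_sketch_t *c)
{
  if (c->testing_tor_network && !c->has_custom_dirservers)
    return "TestingTorNetwork may only be configured in combination with "
           "a non-default set of DirServers.";
  if (!c->testing_tor_network &&
      effective_time_to_learn_reachability(c) !=
        PUBLIC_TIME_TO_LEARN_REACHABILITY)
    return "TestingAuthDirTimeToLearnReachability may only be changed in "
           "testing Tor networks!";
  return NULL;
}
```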

Test:

  The implementation of this proposal must pass the following tests:

  1. Set TestingTorNetwork and see if dependent configuration options are
     correctly changed.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
     250-TestingTorNetwork=1
     250 TestingAuthDirTimeToLearnReachability=0
     QUIT

  2. Set TestingTorNetwork and a dependent configuration value to see if
     the provided value is used for the dependent option.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
       TestingAuthDirTimeToLearnReachability 5
     telnet 127.0.0.1 9051
     AUTHENTICATE
     GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
     250-TestingTorNetwork=1
     250 TestingAuthDirTimeToLearnReachability=5
     QUIT

  3. Start with TestingTorNetwork set and change a dependent configuration
     option later on.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     SETCONF TestingAuthDirTimeToLearnReachability=5
     GETCONF TestingAuthDirTimeToLearnReachability
     250 TestingAuthDirTimeToLearnReachability=5
     QUIT

  4. Start with TestingTorNetwork set and a dependent configuration value,
     and reset that dependent configuration value. The result should be
     the testing-network specific default value.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
       TestingAuthDirTimeToLearnReachability 5
     telnet 127.0.0.1 9051
     AUTHENTICATE
     GETCONF TestingAuthDirTimeToLearnReachability
     250 TestingAuthDirTimeToLearnReachability=5
     RESETCONF TestingAuthDirTimeToLearnReachability
     GETCONF TestingAuthDirTimeToLearnReachability
     250 TestingAuthDirTimeToLearnReachability=0
     QUIT

  5. Leave TestingTorNetwork unset and check if dependent configuration
     options are left unchanged.

     tor DataDirectory . ControlPort 9051 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
     250-TestingTorNetwork=0
     250 TestingAuthDirTimeToLearnReachability=1800
     QUIT

  6. Leave TestingTorNetwork unset, but set dependent configuration option
     which should fail.

     tor DataDirectory . ControlPort 9051 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
       TestingAuthDirTimeToLearnReachability 0
     [warn] Failed to parse/validate config:
     TestingAuthDirTimeToLearnReachability may only be changed in testing
     Tor networks!

  7. Start with TestingTorNetwork unset and change dependent configuration
     option later on which should fail.

     tor DataDirectory . ControlPort 9051 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     SETCONF TestingAuthDirTimeToLearnReachability=0
     513 Unacceptable option value: TestingAuthDirTimeToLearnReachability
     may only be changed in testing Tor networks!

  8. Start with TestingTorNetwork unset and set it later on which should
     fail.

     tor DataDirectory . ControlPort 9051 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     SETCONF TestingTorNetwork=1
     553 Transition not allowed: While Tor is running, changing
     TestingTorNetwork is not allowed.

  9. Start with TestingTorNetwork set and unset it later on which should
     fail.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
       "mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
     telnet 127.0.0.1 9051
     AUTHENTICATE
     RESETCONF TestingTorNetwork
     513 Unacceptable option value: TestingV3AuthInitialVotingInterval may
     only be changed in testing Tor networks!

 10. Set TestingTorNetwork, but do not provide an alternate DirServer
     which should fail.

     tor DataDirectory . ControlPort 9051 TestingTorNetwork 1
     [warn] Failed to parse/validate config: TestingTorNetwork may only be
     configured in combination with a non-default set of DirServers.

Filename: 136-legacy-keys.txt
Title: Mass authority migration with legacy keys
Author: Nick Mathewson
Created: 13-May-2008
Status: Closed
Implemented-In: 0.2.0.x

Overview:

  This document describes a mechanism to change the keys of more than
  half of the directory servers at once without breaking old clients
  and caches immediately.

Motivation:

  If a single authority's identity key is believed to be compromised,
  the solution is obvious: remove that authority from the list,
  generate a new certificate, and treat the new cert as belonging to a
  new authority.  This approach works fine so long as less than 1/2 of
  the authority identity keys are bad.

  Unfortunately, the mass-compromise case is possible if there is a
  sufficiently bad bug in Tor or in any OS used by a majority of v3
  authorities.  Let's be prepared for it!

  We could simply stop using the old keys and start using new ones,
  and tell all clients running insecure versions to upgrade.
  Unfortunately, this breaks our caching system pretty badly, since
  caches won't cache a consensus that they don't believe in.  It would
  be nice to have everybody become secure the moment they upgrade to a
  version listing the new authority keys, _without_ breaking upgraded
  clients until the caches upgrade.

  So, let's come up with a way to provide a time window where the
  consensuses are signed with the new keys and with the old.

Design:

  We allow directory authorities to list a single "legacy key"
  fingerprint in their votes.  Each authority may add a single legacy
  key.  The format for this line is:

     legacy-dir-key FINGERPRINT

  We describe a new consensus method for generating directory
  consensuses.  This method is consensus method "3".

  When the authorities decide to use method "3" (as described in 3.4.1
  of dir-spec.txt), for every included vote with a legacy-dir-key line,
  the consensus includes an extra dir-source line.  The fingerprint in
  this extra line is as in the legacy-dir-key line.  The ports and
  addresses are as in the original dir-source line.  The nickname is as in
  the dir-source line, with the string "-legacy" appended.

      [We need to include this new dir-source line because the code
      won't accept or preserve signatures from authorities not listed
      as contributing to the consensus.]
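  A hypothetical C sketch of building the extra dir-source line. It assumes
  the dir-source field order from dir-spec.txt (nickname, identity, address,
  IP, dirport, orport); the struct, function name, and example values are
  illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one dir-source entry. */
typedef struct {
  const char *nickname;
  const char *identity; /* hex fingerprint */
  const char *address;
  const char *ip;
  int dirport;
  int orport;
} dir_source_t;

/* Extra line for a vote carrying "legacy-dir-key FINGERPRINT": same
 * addresses and ports, legacy fingerprint, nickname + "-legacy".
 * Returns the snprintf result (length the full line would have). */
static int format_legacy_dir_source(const dir_source_t *src,
                                    const char *legacy_fp,
                                    char *buf, size_t buflen)
{
  assert(src != NULL && legacy_fp != NULL);
  return snprintf(buf, buflen, "dir-source %s-legacy %s %s %s %d %d",
                  src->nickname, legacy_fp, src->address, src->ip,
                  src->dirport, src->orport);
}
```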

  Authorities using legacy dir keys include two signatures on their
  consensuses: one generated with a signing key signed with their real
  signing key, and another generated with a signing key signed with
  another signing key attested to by their identity key.  These
  signing keys MUST be different.  Authorities MUST serve both
  certificates if asked.

Process:

  In the event of a mass key failure, we'll follow the following
  (ugly) procedure:
     - All affected authorities generate new certificates and identity
       keys, and circulate their new dirserver lines.  They copy their old
       certificates and old broken keys, but put them in new "legacy
       key files".
     - At the earliest time that can be arranged, the authorities
       replace their signing keys, identity keys, and certificates
       with the new uncompromised versions, and update to the new list
       of dirserver lines.
     - They add a "V3DirAdvertiseLegacyKey 1" option to their torrc.
     - Now, new consensuses will be generated using the new keys, but
       the results will also be signed with the old keys.
     - Clients and caches are told they need to upgrade, and given a
       time window to do so.
     - At the end of the time window, authorities remove the
       V3DirAdvertiseLegacyKey option.

Notes:

  It might be good to get caches to cache consensuses that they do not
  believe in.  I'm not sure of the best way to do this.

  It's a superficially neat idea to have new signing keys and have
  them signed by the new and by the old authority identity keys.  This
  breaks some code, though, and doesn't actually gain us anything,
  since we'd still need to include each signature twice.

  It's also a superficially neat idea, if identity keys and signing
  keys are compromised, to at least replace all the signing keys.
  I don't think this gains us anything either, though.


Filename: 137-bootstrap-phases.txt
Title: Keep controllers informed as Tor bootstraps
Author: Roger Dingledine
Created: 07-Jun-2008
Status: Closed
Implemented-In: 0.2.1.x

1. Overview.

  Tor has many steps to bootstrapping directory information and
  initial circuits, but from the controller's perspective we just have
  a coarse-grained "CIRCUIT_ESTABLISHED" status event. Tor users with
  slow connections or with connectivity problems can wait a long time
  staring at the yellow onion, wondering if it will ever change color.

  This proposal describes a new client status event so Tor can give
  more details to the controller. Section 2 describes the changes to the
  controller protocol; Section 3 describes Tor's internal bootstrapping
  phases when everything is going correctly; Section 4 describes when
  Tor detects a problem and issues a bootstrap warning; Section 5 covers
  suggestions for how controllers should display the results.

2. Controller event syntax.

  The generic status event is:

    "650" SP StatusType SP StatusSeverity SP StatusAction
                                        [SP StatusArguments] CRLF

  So in this case we send
  650 STATUS_CLIENT NOTICE/WARN BOOTSTRAP \
  PROGRESS=num TAG=Keyword SUMMARY=String \
  [WARNING=String REASON=Keyword COUNT=num RECOMMENDATION=Keyword]

  The arguments MAY appear in any order. Controllers MUST accept unrecognized
  arguments.
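  To illustrate the "any order, accept unknown arguments" rule, here is
  a minimal Python sketch of how a controller might split one such event
  line into its severity and arguments. It assumes the line has already
  been read from the control connection with the CRLF stripped, and it
  leans on shlex to unquote SUMMARY and WARNING values; shlex only
  approximates the control protocol's quoted-string escaping, and
  parse_bootstrap_event is a hypothetical name.

```python
import shlex


def parse_bootstrap_event(line):
    """Split a '650 STATUS_CLIENT <severity> BOOTSTRAP ...' line.

    Returns (severity, args), where args maps upper-cased keys to string
    values. Arguments may come in any order, and unrecognized keys are
    kept rather than rejected.
    """
    # Simplification: shlex only approximates control-spec quoted strings.
    tokens = shlex.split(line)
    if len(tokens) < 4 or tokens[:2] != ["650", "STATUS_CLIENT"]:
        raise ValueError("not a STATUS_CLIENT event")
    severity, action = tokens[2], tokens[3]
    if action != "BOOTSTRAP":
        raise ValueError("not a BOOTSTRAP status event")
    args = {}
    for tok in tokens[4:]:
        key, sep, value = tok.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            args[key.upper()] = value
    return severity, args
```

  Keys are upper-cased so that the "Progress"/"PROGRESS" spelling
  difference in the prose does not matter; values stay strings for the
  caller to convert.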

  "Progress" gives a number between 0 and 100 for how far through
  the bootstrapping process we are. "Summary" is a string that can be
  displayed to the user to describe the *next* task that Tor will tackle,
  i.e., the task it is working on after sending the status event. "Tag"
  is an optional string that controllers can use to recognize bootstrap
  phases from Section 3, if they want to do something smarter than just
  blindly displaying the summary string.

  The severity describes whether this is a normal bootstrap phase
  (severity notice) or an indication of a bootstrapping problem
  (severity warn). If the severity is warn, the event should also include
  a "warning" argument string with any hints Tor has to offer about why
  it's having trouble bootstrapping, a "reason" string that lists one of
  the reasons allowed in the ORConn event, a "count" number that tells how
  many bootstrap problems there have been so far at this phase, and a
  "recommendation" keyword to indicate how the controller ought to react.

3. The bootstrap phases.

  This section describes the various phases currently reported by
  Tor. Controllers should not assume that the percentages and tags listed
  here will continue to match up, or even that the tags will stay in
  the same order. Some phases might also be skipped (not reported) if the
  associated bootstrap step is already complete, or if the phase is no
  longer necessary.  Only "starting" and "done" are guaranteed to exist in all
  future versions.

  Current Tor versions enter these phases in order, monotonically;
  future Tors MAY revisit earlier stages.

  Phase 0:
  tag=starting summary="starting"

  Tor starts out in this phase.

  Phase 5:
  tag=conn_dir summary="Connecting to directory mirror"

  Tor sends this event as soon as Tor has chosen a directory mirror ---
  one of the authorities if bootstrapping for the first time or after
  a long downtime, or one of the relays listed in its cached directory
  information otherwise.

  Tor will stay at this phase until it has successfully established
  a TCP connection with some directory mirror. Problems in this phase
  generally happen because Tor doesn't have a network connection, or
  because the local firewall is dropping SYN packets.

  Phase 10:
  tag=handshake_dir summary="Finishing handshake with directory mirror"

  This event occurs when Tor establishes a TCP connection with a relay used
  as a directory mirror (or its https proxy if it's using one). Tor remains
  in this phase until the TLS handshake with the relay is finished.

  Problems in this phase generally happen because the local firewall is
  doing more sophisticated MITM attacks on Tor, or doing packet-level
  keyword recognition of Tor's handshake.

  Phase 15:
  tag=onehop_create summary="Establishing one-hop circuit for dir info"

  Once TLS is finished with a relay, Tor will send a CREATE_FAST cell
  to establish a one-hop circuit for retrieving directory information.
  It will remain in this phase until it receives the CREATED_FAST cell
  back, indicating that the circuit is ready.

  Phase 20:
  tag=requesting_status summary="Asking for networkstatus consensus"

  Once we've finished our one-hop circuit, we will start a new stream
  for fetching the networkstatus consensus. We'll stay in this phase
  until we get the 'connected' relay cell back, indicating that we've
  established a directory connection.

  Phase 25:
  tag=loading_status summary="Loading networkstatus consensus"

  Once we've established a directory connection, we will start fetching
  the networkstatus consensus document. This could take a while; this
  phase is a good opportunity for using the "progress" keyword to indicate
  partial progress.

  This phase could stall if the directory mirror we picked doesn't
  have a copy of the networkstatus consensus, so we have to ask another,
  or if it does give us a copy but we don't find it valid.

  Phase 40:
  tag=loading_keys summary="Loading authority key certs"

  Sometimes when we've finished loading the networkstatus consensus,
  we find that we don't have all the authority key certificates for the
  keys that signed the consensus. At that point we put the consensus we
  fetched on hold and fetch the keys so we can verify the signatures.

  Phase 45:
  tag=requesting_descriptors summary="Asking for relay descriptors"

  Once we have a valid networkstatus consensus and we've checked all
  its signatures, we start asking for relay descriptors. We stay in this
  phase until we have received a 'connected' relay cell in response to
  a request for descriptors.

  Phase 50:
  tag=loading_descriptors summary="Loading relay descriptors"

  We will ask for relay descriptors from several different locations,
  so this step will probably make up the bulk of the bootstrapping,
  especially for users with slow connections. We stay in this phase until
  we have descriptors for at least 1/4 of the usable relays listed in
  the networkstatus consensus. This phase is also a good opportunity to
  use the "progress" keyword to indicate partial steps.

  Phase 80:
  tag=conn_or summary="Connecting to entry guard"

  Once we have a valid consensus and enough relay descriptors, we choose
  some entry guards and start trying to build some circuits. This step
  is similar to the "conn_dir" phase above; the only difference is
  the context.

  If a Tor starts with enough recent cached directory information,
  its first bootstrap status event will be for the conn_or phase.

  Phase 85:
  tag=handshake_or summary="Finishing handshake with entry guard"

  This phase is similar to the "handshake_dir" phase, but it gets reached
  if we finish a TCP connection to a Tor relay and we have already reached
  the "conn_or" phase. We'll stay in this phase until we complete a TLS
  handshake with a Tor relay.

  Phase 90:
  tag=circuit_create summary="Establishing circuits"

  Once we've finished our TLS handshake with an entry guard, we will
  set about trying to make some 3-hop circuits in case we need them soon.

  Phase 100:
  tag=done summary="Done"

  A full 3-hop circuit has been established. Tor is ready to handle
  application connections now.
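  Since tags and percentages are not guaranteed to stay stable, one
  reasonable controller design treats PROGRESS as the source of truth for
  a progress bar, uses its own wording only for tags it recognizes, and
  otherwise falls back to Tor's SUMMARY string. A minimal Python sketch;
  KNOWN_TAGS, its display strings, and describe_phase are hypothetical,
  and the input is assumed to be a dict of KEY=VALUE arguments as a
  controller would have after parsing the event.

```python
# Tags this hypothetical controller presents with its own wording.
# Tor may add, drop, or reorder phases, so the table is advisory only.
KNOWN_TAGS = {
    "starting": "Starting Tor",
    "conn_dir": "Contacting the directory",
    "loading_descriptors": "Downloading relay information",
    "conn_or": "Connecting to the Tor network",
    "done": "Connected",
}


def describe_phase(args):
    """Return (percent, text) for a parsed bootstrap event's arguments."""
    try:
        percent = max(0, min(100, int(args.get("PROGRESS", "0"))))
    except ValueError:
        percent = 0  # malformed PROGRESS: show no progress rather than fail
    tag = args.get("TAG")
    text = KNOWN_TAGS.get(tag) if tag else None
    if text is None:
        # Unknown or missing tag: fall back to Tor's own summary string.
        text = args.get("SUMMARY", "")
    return percent, text
```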

4. Bootstrap problem events.

  When an OR connection fails, we send a "bootstrap problem" status event, which
  is like the standard bootstrap status event except with severity warn.
  We include the same progress, tag, and summary values as we would for
  a normal bootstrap event, but we also include "warning", "reason",
  "count", and "recommendation" key/value combos.

  The "reason" values are long-term-stable controller-facing tags to
  identify particular issues in a bootstrapping step.  The warning
  strings, on the other hand, are human-readable. Controllers SHOULD
  NOT rely on the format of any warning string. Currently the possible
  values for "recommendation" are either "ignore" or "warn" -- if ignore,
  the controller can accumulate the string in a pile of problems to show
  the user if the user asks; if warn, the controller should alert the
  user that Tor is pretty sure there's a bootstrapping problem.

  Currently Tor uses recommendation=ignore for the first nine bootstrap
  problem reports for a given phase, and then uses recommendation=warn
  for subsequent problems at that phase. Hopefully this is a good
  balance between tolerating occasional errors and reporting serious
  problems quickly.
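  The counting rule above, together with the controller's side of it,
  can be sketched in Python as follows. The threshold of nine comes from
  the text and describes current behavior rather than a protocol
  guarantee; both function names are hypothetical, and count is assumed
  to be 1-based like the COUNT argument.

```python
IGNORE_THRESHOLD = 9  # first nine problem reports at a phase are "ignore"


def recommendation_for(count):
    """Tor side: recommendation for the count-th problem at one phase."""
    if count < 1:
        raise ValueError("count is 1-based")
    return "ignore" if count <= IGNORE_THRESHOLD else "warn"


def should_alert(args):
    """Controller side: interrupt the user only when Tor recommends it."""
    return args.get("RECOMMENDATION", "").lower() == "warn"
```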

5. Suggested controller behavior.

  Controllers should start out with a yellow onion or the equivalent
  ("starting"), and then watch for either a bootstrap status event
  (meaning the Tor they're using is sufficiently new to produce them,
  and they should load up the progress bar or whatever they plan to use
  to indicate progress) or a circuit_established status event (meaning
  bootstrapping is finished).

  In addition to a progress bar in the display, controllers should also
  have some way to indicate progress even when no controller window is
  open. For example, folks using Tor Browser Bundle in hostile Internet
  cafes don't want a big splashy screen up. One way to keep the user
  informed of progress in a more subtle way is to change the task tray
  icon and/or tooltip string as more bootstrap events come in.

  Controllers should also have some mechanism to alert their user when
  bootstrapping problems are reported. Perhaps we should gather a set of
  help texts and the controller can send the user to the right anchor in a
  "bootstrapping problems" page in the controller's help subsystem?

6. Getting up to speed when the controller connects.

  There's a new "GETINFO status/bootstrap-phase" option, which returns
  the most recent bootstrap phase status event sent. Specifically,
  it returns a string starting with either "NOTICE BOOTSTRAP ..." or
  "WARN BOOTSTRAP ...".

  Controllers should use this getinfo when they connect or attach to
  Tor to learn its current state.

Filename: 138-remove-down-routers-from-consensus.txt
Title: Remove routers that are not Running from consensus documents
Author: Peter Palfrader
Created: 11-Jun-2008
Status: Closed
Implemented-In: 0.2.1.2-alpha

1. Overview.

  Every hour, Tor directory authorities vote and agree on a consensus
  document which lists all the routers on the network together with some
  of their basic properties, such as whether a router is an exit node,
  whether it is stable, or whether it is a version 2 directory mirror.

  One of the properties given with each router is the 'Running' flag.
  Clients do not use routers that are not listed as running.

  This proposal suggests that routers without the Running flag should not
  be listed at all.

2. Current status

  At a typical bootstrap a client downloads a 140KB consensus, about
  10KB of certificates to verify that consensus, and about 1.6MB of
  server descriptors, about 1/4 of which it requires before it will
  start building circuits.

  Another proposal deals with how to get that huge 1.6MB fraction to
  effectively zero (by downloading only individual descriptors, on
  demand).  If that is successfully implemented, the 140KB compressed
  consensus will be a large fraction of what a client needs to get in
  order to work.

  About one third of the routers listed in a consensus are not running
  and will therefore never be used by clients who use this consensus.
  Not listing those routers will save about 30% to 40% in size.

3. Proposed change

  Authority directory servers produce vote documents that include all
  the servers they know about, running or not, like they currently
  do.  In addition these vote documents also state that the authority
  supports a new consensus forming method (method number 4).

  If more than two thirds of the votes that an authority has received
  claim that they support method 4, then this new method will be used:
  the consensus document is formed as before, but a new last step
  removes all routers from the listing that are not marked as Running.
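  A simplified Python sketch of the proposed rule, assuming each
  received vote has already been reduced to the set of consensus methods
  it advertises and each router entry to a (nickname, flags) pair; these
  representations and the function names are hypothetical. Real
  authorities choose among several consensus methods; this only models
  the method-4 decision described here. The "more than two thirds" test
  uses integer arithmetic to avoid rounding.

```python
def uses_method_4(vote_methods):
    """vote_methods: one set of supported consensus methods per vote."""
    if not vote_methods:
        return False
    supporting = sum(1 for methods in vote_methods if 4 in methods)
    # supporting / total > 2/3, without floating point.
    return 3 * supporting > 2 * len(vote_methods)


def finalize_router_list(routers, vote_methods):
    """routers: (nickname, flags) pairs, formed exactly as before."""
    if uses_method_4(vote_methods):
        # The new last step: drop every router not marked Running.
        return [r for r in routers if "Running" in r[1]]
    return list(routers)
```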

Filename: 139-conditional-consensus-download.txt
Title: Download consensus documents only when it will be trusted
Author: Peter Palfrader
Created: 2008-04-13
Status: Closed
Implemented-In: 0.2.1.x

Overview:

  Servers only provide consensus documents to clients when it is known that
  the client will trust it.

Motivation:

  When clients[1] want a new network status consensus, they request it
  from a Tor server using the URL path /tor/status-vote/current/consensus.
  After downloading it, the client checks whether this consensus can be
  trusted.  Whether the client trusts the consensus depends on the
  authorities that the client trusts and how many of those
  authorities signed the consensus document.

  If the client cannot trust the consensus document it is disregarded
  and a new download is tried at a later time.  Several hundred
  kilobytes of server bandwidth were wasted by this single client's
  request.

  With hundreds of thousands of clients, this will have undesirable
  consequences when the list of authorities has changed so much that a
  large number of established clients can no longer trust any consensus
  document formed.

Objective:

  The objective of this proposal is to make clients not download
  consensuses they will not trust.

Proposal:

  The list of authorities that a client trusts is encoded in the URL it
  sends to the directory server when requesting a consensus document.

  The directory server then only sends back the consensus when more than
  half of the authorities listed in the request have signed the
  consensus.  If it is known that the consensus will not be trusted
  a 404 error code is sent back to the client.
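  On the directory server side, the decision above reduces to a
  majority test over the requested fingerprints. A minimal Python sketch
  with hypothetical helper names, assuming the server already knows which
  authority fingerprints signed its current consensus. How an empty
  fingerprint list should be treated is not specified here, so the
  sketch simply refuses it.

```python
def should_serve(requested_fprs, signer_fprs):
    """True if more than half of the requested authorities signed.

    Comparison is case-insensitive, since servers should also accept
    lower-case or mixed-case hexadecimal in requests.
    """
    requested = {f.upper() for f in requested_fprs}
    if not requested:
        return False  # assumption: behavior for an empty list is unspecified
    signed = requested & {f.upper() for f in signer_fprs}
    return 2 * len(signed) > len(requested)


def status_for(requested_fprs, signer_fprs):
    """HTTP status the server would send for this request."""
    return 200 if should_serve(requested_fprs, signer_fprs) else 404
```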

  This proposal does not require directory caches to keep more than one
  consensus document.  This proposal also does not require authorities
  to verify the signature on the consensus document of authorities they
  do not recognize.

  The new URL scheme to download a consensus is
  /tor/status-vote/current/consensus/<F> where F is a list of
  fingerprints, sorted in ascending order, and concatenated using a +
  sign.

  Fingerprints are uppercase hexadecimal encodings of the authority
  identity key's digest.  Servers should also accept requests that
  use lower case or mixed case hexadecimal encodings.

  A .z URL for the compressed version of the consensus will be provided,
  similar to existing resources; this is the URL that clients should
  usually use.
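  On the client side, building the request path means normalizing,
  sorting, and joining the trusted authorities' fingerprints. A minimal
  Python sketch; consensus_url is a hypothetical helper, and it only
  checks that each fingerprint is hexadecimal rather than enforcing a
  particular length.

```python
import string


def consensus_url(fingerprints, compressed=True):
    """Return the conditional-consensus path for the given authorities."""
    fprs = []
    for fpr in fingerprints:
        if not fpr or any(c not in string.hexdigits for c in fpr):
            raise ValueError(f"not a hex fingerprint: {fpr!r}")
        fprs.append(fpr.upper())  # fingerprints are sent as uppercase hex
    fprs.sort()  # ascending; equal-length hex strings sort numerically
    path = "/tor/status-vote/current/consensus/" + "+".join(fprs)
    return path + ".z" if compressed else path
```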

Migration:

  The old location of the consensus should continue to work
  indefinitely.  Not only is it used by old clients, but it is a useful
  resource for automated tools that do not particularly care which
  authorities have signed the consensus.

  Authorities that are known to the client a priori by being shipped
  with the Tor code are assumed to handle this format.

  When downloading a consensus document from caches that do not support
  this new format, clients fall back to the old download location.

  Caches support the new format starting with Tor version 0.2.1.1-alpha.

Anonymity Implications:

  By supplying the list of authorities a client trusts to the directory
  server, we leak information (like the likely version of the Tor client)
  to the directory server.  In the current system we also leak that we are
  very old, by re-downloading the consensus over and over again, but
  only when we are so old that we can no longer trust the consensus.



Footnotes:
 1. For the purpose of this proposal a client can be any Tor instance
    that downloads a consensus document.  This includes relays,
    directory caches, and end users.

Filename: 140-consensus-diffs.txt
Title: Provide diffs between consensuses
Author: Peter Palfrader
Created: 13-Jun-2008
Implemented-In: 0.3.1.1-alpha
Status: Closed
Ticket: https://bugs.torproject.org/13339

0. History

  22-May-2009: Restricted the ed format further for ease of
  implementation. -nickm

  25-May-2014: Adapted to the new dir-spec version 3 and made the diff urls
  backwards-compatible. -mvdan

  1-Mar-2017: Update to new stats, note newer proposals, note flavors,
  diffs, add parameters, restore diff-only URLs, say what "Digest"
  means. -nickm

  3-May-2017: Add a notion of "digest-as-signed" vs "full digest", since
  otherwise the fact that there are multiple encodings of the same valid
  consensus signatures would make clients identifiable by which encodings
  they had been given when they asked for diffs.

  4-May-2017: Remove support for truncated digest prefixes.

1. Overview.

  Tor clients and servers need a list of which relays are on the
  network.  This list, the consensus, is created by the authorities
  every hour, and clients fetch a copy of it hourly, with some delay.

  This proposal suggests that clients, once they have a consensus,
  download diffs between consensuses instead of downloading a full
  consensus every hour.

  This applies not only to ordinary directory consensuses, but also to
  the newer microdescriptor consensuses added in the third version of the
  directory specification.

2. Numbers

  After implementing proposal 138, which removed nodes that are not
  running from the list, a consensus document was about 92 kilobytes
  in size after compression... back in 2008 when this proposal was first
  written.

  But now in March 2017, that figure is more like 625 kilobytes.

  The diff between two consecutive consensuses, in ed format, is on
  average 37 kilobytes compressed.  So by making this change, we could
  save something like 94% of our consensus download bandwidth.

3. Proposal

3.0. Preliminaries.

  Unless otherwise specified, all digests in this document are SHA3-256
  digests, encoded in base64.  This document also uses "hash" as
  synonymous with "digest".

  A "full digest" of a consensus document covers the entire document,
  from the "network-status-version" through the newline after the final
  "-----END SIGNATURE-----".

  A "digest as signed" of a consensus document covers the same part that
  the signatures cover: the "network-status-version" through the space
  immediately after the "directory-signature" keyword on the first
  "directory-signature" line.
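  The two digest definitions can be sketched in Python with
  hashlib.sha3_256. This assumes the consensus is held as the exact bytes
  received, that it begins with "network-status-version", and that the
  first "directory-signature" at the start of a line begins the signature
  section. The base64 output keeps Python's standard "=" padding, since
  the text does not say whether padding is stripped; the helper names are
  hypothetical.

```python
import base64
import hashlib

SIG_KEYWORD = b"\ndirectory-signature "


def _b64_sha3(data):
    return base64.b64encode(hashlib.sha3_256(data).digest()).decode("ascii")


def full_digest(consensus):
    """Digest of the whole document, through the final newline."""
    return _b64_sha3(consensus)


def digest_as_signed(consensus):
    """Digest of network-status-version .. the space after the first
    'directory-signature' keyword, i.e. the part the signatures cover."""
    if not consensus.startswith(b"network-status-version"):
        raise ValueError("not a consensus document")
    idx = consensus.find(SIG_KEYWORD)
    if idx < 0:
        raise ValueError("no directory-signature line")
    end = idx + len(SIG_KEYWORD)  # include the keyword and the space after it
    return _b64_sha3(consensus[:end])
```

  Because the digest as signed stops before the signature bodies,
  re-encoding a signature changes the full digest but not the digest as
  signed.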

3.1 Clients

  If a client has a consensus that is recent enough it SHOULD
  try to download a diff to get the latest consensus rather than
  fetching a full one.

  [XXX: what is recent enough?

     Time delta (hours)   Size of compressed diff (bytes)
     ------------------   -------------------------------
              1                       38177
              2                       66955
              3                       93502
              4                      118959
              5                      143450
              6                      167136
             12                      291354
             18                      404008
             24                      416663
             30                      431240
             36                      443858
             42                      454849
             48                      464677
             54                      476716
             60                      487755
             66                      497502
             72                      506421

   The data suggests that diffs are very useful for the first few hours,
   saving at least 50% for the first 12 hours.  After that, returns seem to
   be more marginal.  But note the savings from proposals like 274-276, which
   make diffs smaller over a much longer timeframe. ]
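  To relate the measurements above to section 2, the following Python
  sketch converts each diff size into the fraction saved relative to a
  full download. It assumes the sizes are in bytes and that the "625
  kilobytes" figure from section 2 means 625 * 1024 compressed bytes for
  the same kind of consensus; both are assumptions, not measurements.

```python
# (time delta in hours, compressed diff size in bytes), from the table above.
DIFF_SIZES = [
    (1, 38177), (2, 66955), (3, 93502), (4, 118959), (5, 143450),
    (6, 167136), (12, 291354), (18, 404008), (24, 416663), (30, 431240),
    (36, 443858), (42, 454849), (48, 464677), (54, 476716), (60, 487755),
    (66, 497502), (72, 506421),
]

# Assumption: "625 kilobytes" (section 2) means 625 * 1024 compressed bytes.
FULL_CONSENSUS_BYTES = 625 * 1024


def savings(diff_bytes, full_bytes=FULL_CONSENSUS_BYTES):
    """Fraction of the full download avoided by fetching this diff instead."""
    return 1 - diff_bytes / full_bytes


for hours, size in DIFF_SIZES:
    print(f"{hours:3d}h  {size:7d} bytes  saves {savings(size):6.1%}")
```

  Under those assumptions, every delta up to 12 hours saves more than
  half of the download, consistent with the note above.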


3.2 Servers

  Directory authorities and servers need to keep a number of old consensus
  documents so they can build diffs.  (See section 5 below.)  They should
  offer a diff to the most recent consensus at the following request:

  HTTP/1.0 GET /tor/status-vote/current/consensus{-Flavor}/<FPRLIST>.z
  X-Or-Diff-From-Consensus: HASH1 HASH2...

  where the hashes are the digests-as-signed of the consensuses the client
  currently has, and FPRLIST is a list of (abbreviated) fingerprints of
  authorities the client trusts.

  Servers will only return a consensus if more than half of the requested
  authorities have signed the document. Otherwise, a 404 error will be sent
  back.

  The advantage of using the same URL that is currently used for
  consensuses is that the client doesn't need to know whether a server
  supports consensus diffs.  If it doesn't, it will simply ignore the
  extra header and return the full consensus.

  If a server cannot offer a diff from any of the consensuses identified
  by the hashes, but has a current consensus, it MUST return the full
  consensus.
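  The server-side choice described above can be sketched as follows.
  This minimal Python sketch assumes the server keeps a hypothetical map
  from digest-as-signed to a prepared diff ending at its current
  consensus, and tries the client's hashes in the order given. It leaves
  out the FPRLIST signature check and the still-open case where the
  client already has the latest consensus.

```python
def choose_response(header_value, diffs_by_base, current_consensus):
    """Pick a reply for a request carrying X-Or-Diff-From-Consensus.

    header_value: the raw header value (space-separated digests), or None.
    diffs_by_base: digest-as-signed -> diff to the current consensus.
    Returns (status, body).
    """
    hashes = header_value.split() if header_value else []
    for digest in hashes:
        diff = diffs_by_base.get(digest)
        if diff is not None:
            return 200, diff
    # No usable diff: fall back to the full consensus if we have one.
    if current_consensus is not None:
        return 200, current_consensus
    return 404, None
```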

  [XXX: what should we do when the client already has the latest
  consensus?  I can think of the following options:
    - send back 3xx not modified
    - send back 200 ok and an empty diff
    - send back 404 nothing newer here.

    I currently lean towards the empty diff.]

  Additionally, a specific diff for a given consensus digest-as-signed
  should be available at a URL of the form:

    /tor/status-vote/current/consensus{-Flavor}/diff/<HASH>/<FPRLIST>.z

  This differs from the previous request type in that it should never
  return a whole consensus: if a diff is not available, it should return
  404.

4. Diff Format

  Diffs start with the token "network-status-diff-version" followed by a
  space and the version number, currently "1".

  If a document does not start with "network-status-diff-version", it is
  assumed to be a full consensus download and would therefore currently
  start with "network-status-version 3".

  Following the network-status-diff line is another header line,
  starting with the token "hash" followed by the digest-as-signed of the
  consensus that this diff applies to, and the full digest that the
  resulting consensus should have.

  Following the network-status-diff header lines is a diff, or patch, in
  limited ed format.  We choose this format because it is easy to create
  and process with standard tools (patch, diff -e, ed).  This will help
  us in developing and testing this proposal and it should make future
  debugging easier.

  [ If at some point in the future we decide that the space benefits of
    a custom diff format outweigh these benefits, we can always
    introduce a new diff format and offer it at, for instance,
    ../diff2/... ]

  We support the following ed commands, each on a line by itself:
   - "<n1>d"          Delete line n1
   - "<n1>,<n2>d"     Delete lines n1 through n2, inclusive
   - "<n1>,$d"        Delete line n1 through the end of the file, inclusive.
   - "<n1>c"          Replace line n1 with the following block
   - "<n1>,<n2>c"     Replace lines n1 through n2, inclusive, with the
                      following block.
   - "<n1>a"          Append the following block after line n1.
   - "a"              Append the following block after the current line.

  Note that line numbers always apply to the file after all previous
  commands have already been applied.  Note also that line numbers
  are 1-indexed.

  The commands MUST apply to the file from back to front, such that
  lines are only ever referred to by their position in the original
  file.

  If there are any directory signatures on the original document, the
  first command MUST be a "<n1>,$d" form to remove all of the directory
  signatures.  Using this format ensures that the client will
  successfully apply the diff even if they have an unusual encoding for
  the signatures.

  The "current line" is either the first line of the file, if this is
  the first command; the last line of a block we added in an append or
  change command; or the line immediately following a set of lines we
  just deleted (or the last line of the file if there are no lines after
  that).

  The replace and append commands take blocks.  These blocks are simply
  appended to the diff after the line with the command.  A line with
  just a period (".") ends the block (and is not part of the lines
  to add).  Note that it is impossible to insert a line with just
  a single dot.
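  The limited ed dialect above is small enough to apply without
  shelling out to ed. The following minimal Python sketch parses the two
  header lines, then applies the commands to a list of lines (without
  trailing newlines), tracking the "current line" for the bare "a"
  command. It also rejects numbered commands that are not in strictly
  back-to-front order, which is one reading of the MUST above. It does
  not verify digests or enforce the signature-removal rule, and
  apply_diff is a hypothetical name.

```python
import re

# Numbered commands from the list above: "n1d", "n1,n2d", "n1,$d",
# "n1c", "n1,n2c", "n1a".  The bare "a" command is handled separately.
_CMD = re.compile(r"^(\d+)(?:,(\d+|\$))?([acd])$")


def apply_diff(original_lines, diff_text):
    """Apply a version-1 consensus diff; return (base, result, lines)."""
    lines = diff_text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # tolerate a trailing newline
    if len(lines) < 2 or lines[0] != "network-status-diff-version 1":
        raise ValueError("unsupported diff version")
    fields = lines[1].split()
    if len(fields) != 3 or fields[0] != "hash":
        raise ValueError("malformed hash line")

    out = list(original_lines)
    cur = 1             # the "current line" starts as the first line
    last_start = None   # for the back-to-front ordering check
    i = 2
    while i < len(lines):
        cmd = lines[i]
        i += 1
        if cmd == "a":
            start = end = cur
            op = "a"
        else:
            m = _CMD.match(cmd)
            if not m:
                raise ValueError(f"unsupported command {cmd!r}")
            op, start = m.group(3), int(m.group(1))
            if m.group(2) == "$":
                if op != "d":
                    raise ValueError("'$' is only allowed with 'd'")
                end = len(out)
            elif m.group(2):
                if op == "a":
                    raise ValueError("'a' takes a single line number")
                end = int(m.group(2))
            else:
                end = start
            # Numbered commands must move strictly from back to front.
            if last_start is not None and end >= last_start:
                raise ValueError("commands are not in back-to-front order")
            last_start = start

        block = []
        if op in "ac":
            while i < len(lines) and lines[i] != ".":
                block.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError("block not terminated by '.'")
            i += 1  # skip the terminating "."

        if op == "a":
            if not 0 <= start <= len(out):
                raise ValueError("append position out of range")
            out[start:start] = block
            cur = start + len(block)  # last line of the appended block
        else:
            if not 1 <= start <= end <= len(out):
                raise ValueError("line range out of range")
            out[start - 1:end] = block
            if op == "c":
                cur = start - 1 + len(block)  # last line of the new block
            else:
                cur = start if start <= len(out) else len(out)
    return fields[1], fields[2], out
```

  Checking that the resulting document's full digest matches the second
  value on the "hash" line is left to the caller.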

4.1. Concatenating multiple diffs

  Directory caches may, at their discretion, return the concatenation of
  multiple diffs using the format above.  Such diffs are to be applied from
  first to last.  This allows the caches to cache a smaller number of
  compressed diffs, at the expense of some loss in bandwidth efficiency.


5. Networkstatus parameters

  The following parameters govern how relays and clients use this protocol.

     min-consensuses-age-to-cache-for-diff
       (min 0, max 744, default 6)
     max-consensuses-age-to-cache-for-diff
       (min 0, max 8192, default 72)

       These two parameters determine how much consensus history (in
       hours) relays should try to cache in order to serve diffs.

     try-diff-for-consensus-newer-than
       (min 0, max 8192, default 72)

       This parameter determines how old a consensus can be (in hours)
       before a client should no longer try to find a diff for it.

Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Obsolete

1. Overview

  Downloading all server descriptors is the most expensive part
  of bootstrapping a Tor client.  These server descriptors currently
  amount to about 1.5 Megabytes of data, and this size will grow
  linearly with network size.

  Fetching all these server descriptors takes a long while for people
  behind slow network connections.  It is also a considerable load on
  our network of directory mirrors.

  This document describes proposed changes to the Tor network and
  directory protocol so that clients will no longer need to download
  all server descriptors.

  These changes consist of moving load balancing information into
  network status documents, implementing a means to download server
  descriptors on demand in an anonymity-preserving way, and dealing
  with exit node selection.

2. What is in a server descriptor

  When a Tor client starts, the first thing it will try to get is a
  current network status document: a consensus signed by a majority
  of directory authorities.  This document is currently about 100
  kilobytes in size, though it will grow linearly with network size.
  This document lists all servers currently running on the network.
  The Tor client will then try to get a server descriptor for each
  of the running servers.  All server descriptors currently amount
  to about 1.5 Megabytes of downloads.

  A Tor client learns several things about a server from its descriptor.
  Some of these it already learned from the network status document
  published by the authorities, but the server descriptor contains it
  again in a single statement signed by the server itself, not just by
  the directory authorities.

  Tor clients use the information from server descriptors for
  different purposes, which are considered in the following sections.

  #three ways:  One, to determine if a server will be able to handle
  #this client's request; two, to actually communicate or use the server;
  #three, for load balancing decisions.
  #
  #These three points are considered in the following subsections.

2.1 Load balancing

  The Tor load balancing mechanism is quite complex in its details, but
  it has a simple goal: The more traffic a server can handle the more
  traffic it should get.  That means the more traffic a server can
  handle the more likely a client will use it.

  For this purpose each server descriptor has bandwidth information
  which tries to convey a server's capacity to clients.

  Currently we weigh servers differently for different purposes.  There
  is a weight for when we use a server as a guard node (our entry to the
  Tor network), there is one weight we assign servers for exit duties,
  and a third for when we need intermediate (middle) nodes.

2.2 Exit information

  When a Tor client wants to exit to some resource on the Internet, it
  will build a circuit to an exit node that allows access to that
  resource's IP address and TCP port.

  When building that circuit the client can make sure that the circuit
  ends at a server that will be able to fulfill the request because the
  client already learned of all the servers' exit policies from their
  descriptors.

2.3 Capability information

  Server descriptors contain information about the specific version of
  the Tor protocol they understand [proposal 105].

  Furthermore the server descriptor also contains the exact version of
  the Tor software that the server is running and some decisions are
  made based on the server version number (for instance a Tor client
  will only make conditional consensus requests [proposal 139] when
  talking to Tor servers version 0.2.1.1-alpha or later).

2.4 Contact/key information

  A server descriptor lists a server's IP address and TCP ports on which
  it accepts onion and directory connections.  Furthermore it contains
  the onion key (a short lived RSA key to which clients encrypt CREATE
  cells).

2.5 Identity information

  A Tor client learns the digest of a server's key from the network
  status document.  Once it has a server descriptor this descriptor
  contains the full RSA identity key of the server.  Clients verify
  1) that the digest of the identity key matches the expected digest
  they got from the consensus, and 2) that the signature on the
  descriptor from that key is valid.


3. No longer require clients to have copies of all SDs

3.1 Load balancing info in consensus documents

  One of the reasons why clients download all server descriptors is to
  do proper load balancing as described in 2.1.  In order for clients
  not to require all server descriptors, this information will have to
  move into the network status document.

  Consensus documents will have a new line per router similar
  to the "r", "s", and "v" lines that already exist.  This line
  will convey weight information to clients.

   "w Bandwidth=193"

  The bandwidth number is the lesser of observed bandwidth and bandwidth
  rate limit from the server descriptor that the "r" line referenced by
  digest (1st and 3rd field of the bandwidth line in the descriptor).
  It is given in kilobytes per second so the byte value in the
  descriptor has to be divided by 1024 (and is then truncated, i.e.
  rounded down).

  Authorities will cap the bandwidth number at some arbitrary value,
  currently 10MB/sec.  If a router claims a larger bandwidth an
  authority's vote will still only show Bandwidth=10240.

  The consensus value for bandwidth is the median of all bandwidth
  numbers given in votes.  In case of an even number of votes we use
  the lower median.  (Using this procedure allows us to change the
  cap value more easily.)
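  The per-authority computation and the lower-median rule can be
  sketched in Python. The sketch assumes the descriptor's bandwidth line
  has the form "bandwidth <average> <burst> <observed>" with values in
  bytes per second, so the 1st and 3rd fields are the rate limit and the
  observed bandwidth; the helper names are hypothetical.

```python
CAP_KBPS = 10240  # "currently 10MB/sec", expressed in kilobytes per second


def vote_bandwidth(bandwidth_line):
    """One authority's 'w Bandwidth=' value for a descriptor."""
    keyword, rate, _burst, observed = bandwidth_line.split()
    if keyword != "bandwidth":
        raise ValueError("not a bandwidth line")
    kbps = min(int(rate), int(observed)) // 1024  # truncate, i.e. round down
    return min(kbps, CAP_KBPS)


def consensus_bandwidth(votes):
    """Median of the votes; the lower median for an even number of votes."""
    ordered = sorted(votes)
    return ordered[(len(ordered) - 1) // 2]
```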

  Clients should believe the bandwidth as presented in the consensus,
  not capping it again.

3.2 Fetching descriptors on demand

  As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
  and the onion key for a server.

  A client already knows the IP address and the ports from the consensus
  documents, but without the onion key it will not be able to send
  CREATE/EXTEND cells for that server.  Since the client needs the onion
  key it needs the descriptor.

  If a client only downloaded a few descriptors in an observable manner
  then that would leak which nodes it was going to use.

  This proposal suggests the following:

  1) when connecting to a guard node for which the client does not
     yet have a cached descriptor it requests the descriptor it
     expects by hash.  (The consensus document that the client holds
     has a hash for the descriptor of this server.  We want exactly
     that descriptor, not a different one.)

     It does that by sending a RELAY_REQUEST_SD cell.

     A client MAY cache the descriptor of the guard node so that it does
     not need to request it every single time it contacts the guard.

  2) when a client wants to extend a circuit that currently ends in
     server B to a new next server C, the client will send a
     RELAY_REQUEST_SD cell to server B.  This cell contains in its
     payload the hash of a server descriptor the client would like
     to obtain (C's server descriptor).  The server sends back the
     descriptor and the client can now form a valid EXTEND/CREATE cell
     encrypted to C's onion key.

     Clients MUST NOT cache such descriptors.  If they did they might
     leak that they already extended to that server at least once
     before.

  Replies to RELAY_REQUEST_SD requests need to be padded to some
  constant upper limit in order to conceal a client's destination
  from anybody who might be counting cells/bytes.

  RELAY_REQUEST_SD cells contain the following information:
    - hash of the server descriptor requested
    - hash of the identity digest of the server for which we want the SD
    - IP address and OR-port of the server for which we want the SD
    - padding factor - the number of cells we want the answer
      padded to.
      [XXX this just occurred to me and it might be smart.  or it might
       be stupid.  clients would learn the padding factor they want
       to use from the consensus document.  This allows us to grow
       the replies later on should SDs become larger.]
  [XXX: figure out a decent padding size]
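  One way to picture the padding requirement is the following Python
  sketch, which splits a descriptor reply into exactly padding_factor
  cell-sized chunks. It assumes each reply cell carries 498 bytes of
  relay data and that padding is zero bytes, neither of which this
  proposal specifies; because descriptors are text, a receiver could
  strip trailing NUL bytes. pad_descriptor_reply is hypothetical.

```python
RELAY_DATA_LEN = 498  # assumed relay data bytes per reply cell


def pad_descriptor_reply(descriptor, padding_factor):
    """Split descriptor bytes into exactly padding_factor cell payloads."""
    capacity = RELAY_DATA_LEN * padding_factor
    if len(descriptor) > capacity:
        raise ValueError("descriptor does not fit in the requested padding")
    # Zero-fill so every reply has the same size on the wire.
    padded = descriptor + b"\x00" * (capacity - len(descriptor))
    return [padded[i:i + RELAY_DATA_LEN]
            for i in range(0, capacity, RELAY_DATA_LEN)]
```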

3.3 Protocol versions

  Server descriptors contain optional information of supported
  link-level and circuit-level protocols in the form of
  "opt protocols Link 1 2 Circuit 1".  These are not currently needed
  and will probably eventually move into the "v" (version) line in
  the consensus.  This proposal does not deal with them.

  Similarly a server descriptor contains the version number of
  a Tor node.  This information is already present in the consensus
  and is thus available to all clients immediately.

3.4 Exit selection

  Currently finding an appropriate exit node for a user's request is
  easy for a client because it has complete knowledge of all the exit
  policies of all servers on the network.

  The consensus document will once again be extended to contain the
  information required by clients.  This information will be a summary
  of each node's exit policy.  The exit policy summary will only contain
  the list of ports to which a node exits to most destination IP
  addresses.

  A summary should claim a router exits to a specific TCP port if,
  ignoring private IP addresses, the exit policy indicates that the
  router would exit to this port for most IP addresses (it may reject
  at most the equivalent of two /8 netblocks: either two /8s, or one /8
  and a couple of /12s, or any other combination).
  The exact algorithm used is this:  Going through all exit policy items
   - ignore any accept that is not for all IP addresses ("*"),
   - ignore rejects for these netblocks (exactly, no subnetting):
     0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8,
     and 172.16.0.0/12,
   - for each reject count the number of IP addresses rejected against
     the affected ports,
   - once we hit an accept for all IP addresses ("*") add the ports in
     that policy item to the list of accepted ports, if they don't have
     more than 2^25 IP addresses (that's two /8 networks) counted
     against them (i.e. if the router exits to a port to everywhere but
     at most two /8 networks).
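  The port-counting algorithm above can be sketched as follows.  This is
  a minimal illustration, not Tor's implementation: it assumes IPv4 only
  and a hypothetical policy representation of ordered (action, target,
  lo_port, hi_port) tuples, where target is "*" or a netblock string.

```python
from ipaddress import IPv4Network

# Rejects for exactly these netblocks are ignored (no subnetting).
IGNORED_REJECTS = {IPv4Network(n) for n in (
    "0.0.0.0/8", "169.254.0.0/16", "127.0.0.0/8",
    "192.168.0.0/16", "10.0.0.0/8", "172.16.0.0/12")}

MAX_REJECTED = 2 ** 25  # two /8 networks' worth of addresses


def accepted_ports(policy):
    """Return the ports a summary should list as accepted.

    policy: ordered list of (action, target, lo_port, hi_port) tuples,
    where action is "accept" or "reject" and target is "*" or an IPv4
    network string such as "18.0.0.0/8" (hypothetical representation).
    """
    rejected = [0] * 65536  # addresses rejected so far, indexed by port
    accepted = set()
    for action, target, lo, hi in policy:
        ports = range(lo, hi + 1)
        if action == "accept":
            if target != "*":
                continue  # ignore accepts that are not for all addresses
            for port in ports:
                if rejected[port] <= MAX_REJECTED:
                    accepted.add(port)
        else:
            if target == "*":
                size = 2 ** 32
            else:
                net = IPv4Network(target, strict=False)
                if net in IGNORED_REJECTS:
                    continue  # private netblock: not counted
                size = net.num_addresses
            for port in ports:
                rejected[port] += size
    return accepted
```

  Exactly two rejected /8s (2^25 addresses) still leave a port accepted;
  a third /8 pushes the count past the limit.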

  An exit policy summary will be included in votes and consensus as a
  new line attached to each exit node.  The line will have the format
   "p" <space> "accept"|"reject" <portlist>
  where portlist is a comma separated list of single port numbers or
  portranges (e.g.  "22,80-88,1024-6000,6667").

  Whether the summary shows the list of accepted ports or the list of
  rejected ports depends on which list is shorter (has a shorter string
  representation).  In case of ties we choose the list of accepted
  ports.  As an exception to this rule an allow-all policy is
  represented as "accept 1-65535" instead of "reject " and a reject-all
  policy is similarly given as "reject 1-65535".

  Summary items are compressed, that is instead of "80-88,89-100" there
  only is a single item of "80-100", similarly instead of "20,21" a
  summary will say "20-21".

  Port lists are sorted in ascending order.

  The maximum allowed length of a policy summary (including the "accept "
  or "reject ") is 1000 characters.  If a summary exceeds that length, we
  use an accept-style summary and list as much of the port list as is
  possible within these 1000 characters.
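  The formatting rules above (range compression, ascending order, the
  shorter of accept/reject with ties going to accept, the allow-all and
  reject-all special cases, and the 1000-character cap) might be written
  like this.  Truncating only at a whole port-list item is an assumption;
  the text does not say where to cut.

```python
ALL_PORTS = set(range(1, 65536))
MAX_LEN = 1000


def to_ranges(ports):
    """Compress ports into sorted, merged [lo, hi] ranges."""
    ranges = []
    for p in sorted(ports):
        if ranges and p == ranges[-1][1] + 1:
            ranges[-1][1] = p  # extend the current run, e.g. 20,21 -> 20-21
        else:
            ranges.append([p, p])
    return ranges


def fmt(ranges):
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


def summarize(accepted):
    """Build the text after "p " for a set of accepted ports."""
    accepted = set(accepted) & ALL_PORTS
    if accepted == ALL_PORTS:
        return "accept 1-65535"
    if not accepted:
        return "reject 1-65535"
    acc = "accept " + fmt(to_ranges(accepted))
    rej = "reject " + fmt(to_ranges(ALL_PORTS - accepted))
    summary = acc if len(acc) <= len(rej) else rej  # ties favour accept
    if len(summary) <= MAX_LEN:
        return summary
    # Too long: accept-style summary, cut at an item boundary (assumption).
    out = "accept "
    for item in fmt(to_ranges(accepted)).split(","):
        sep = "" if out == "accept " else ","
        if len(out) + len(sep) + len(item) > MAX_LEN:
            break
        out += sep + item
    return out
```

  For example, a relay accepting ports 22, 80 to 88 and 6667 yields
  "accept 22,80-88,6667", while one rejecting only port 25 yields
  "reject 25".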

3.4.1 Consensus selection

  When building a consensus, authorities have to agree on a digest of
  the server descriptor to list in the router line for each router.
  This is documented in dir-spec section 3.4.

  All authorities that listed that agreed upon descriptor digest in
  their vote should also list the same exit policy summary - or list
  none at all if the authority has not been upgraded to list that
  information in their vote.

  If we have votes with a matching server descriptor digest, at least one
  of which has an exit policy summary, then we distinguish between two
  cases:
   a) all authorities agree (or abstained) on the policy summary, and we
      use the exit policy summary that they all listed in their vote,
   b) something went wrong (or some authority is playing foul) and we
      have different policy summaries.  In that case we pick the one
      that is most commonly listed in votes with the matching
      descriptor.  We break ties in favour of the lexicographically larger
      vote.

  If none of the votes with a matching server descriptor digest has
  an exit policy summary we use the most commonly listed one in all
  votes, breaking ties like in case b above.
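  A minimal sketch of this selection, assuming each vote is reduced to a
  hypothetical (descriptor_digest, summary_or_None) pair.  It interprets
  "lexicographically larger vote" as the lexicographically larger summary
  string; that reading is an assumption.  Case a) needs no special code,
  since a unanimous summary is also the most common one.

```python
from collections import Counter


def pick(summaries):
    """Most common summary; ties go to the lexicographically larger one."""
    counts = Counter(s for s in summaries if s is not None)
    if not counts:
        return None
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]


def consensus_summary(votes, agreed_digest):
    """votes: list of (descriptor_digest, summary or None) pairs."""
    matching = [s for d, s in votes if d == agreed_digest]
    if any(s is not None for s in matching):
        return pick(matching)           # cases a) and b)
    return pick(s for _, s in votes)    # no matching vote has a summary
```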

3.4.2 Client behaviour

  When choosing an exit node for a specific request, a Tor client will
  choose from the list of nodes that exit to the requested port as given
  by the consensus document.  If a client has additional knowledge (like
  cached full descriptors) indicating that the chosen exit node will
  reject the request, then it MAY use that knowledge (or not include such
  nodes in the selection to begin with).  However, clients MUST NOT use
  nodes that do not list the port as accepted in the summary, even if
  they know from other sources, like a cached descriptor, that the node
  would exit to that address.

  An exception to this is exit enclave behaviour: A client MAY use the
  node at a specific IP address to exit to any port on the same address
  even if that node is not listed as exiting to the port in the summary.
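  The client rule, including the exit enclave exception, can be
  illustrated with a short sketch.  The node tuples are a hypothetical
  representation; the optional use of cached descriptors to drop further
  candidates is not modelled.

```python
def parse_summary(line):
    """Parse 'accept 22,80-88' or 'reject 25' into (is_accept, ranges)."""
    action, portlist = line.split(" ", 1)
    ranges = []
    for item in portlist.split(","):
        lo, _, hi = item.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return action == "accept", ranges


def summary_allows(line, port):
    is_accept, ranges = parse_summary(line)
    listed = any(lo <= port <= hi for lo, hi in ranges)
    return listed if is_accept else not listed


def candidate_exits(nodes, dest_addr, dest_port):
    """nodes: iterable of (nickname, ip, summary line or None)."""
    out = []
    for name, ip, summary in nodes:
        if summary is not None and summary_allows(summary, dest_port):
            out.append(name)
        elif ip == dest_addr:
            out.append(name)  # exit enclave: exiting to its own address
    return out
```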

4. Migration

4.1 Consensus document changes.

  The consensus will need to include
    - bandwidth information (see 3.1)
    - exit policy summaries (3.4)

  A new consensus method (number TBD) will be chosen for this.

5. Future possibilities

  This proposal still requires that all servers have the descriptors of
  every other node in the network in order to answer RELAY_REQUEST_SD
  cells.  These cells are sent when a circuit is extended from ending at
  node B to a new node C.  In that case B would have to answer a
  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).

  In order to answer that request, B obviously needs a copy of C's server
  descriptor.  The RELAY_REQUEST_SD cell already has all the info that
  B needs to contact C, so B could fetch the descriptor from C before
  passing it back to the client.

Filename: 142-combine-intro-and-rend-points.txt
Title: Combine Introduction and Rendezvous Points
Author: Karsten Loesing, Christian Wilms
Created: 27-Jun-2008
Status: Dead

Change history:

  27-Jun-2008  Initial proposal for or-dev
  04-Jul-2008  Give first security property the new name "Responsibility"
               and change new cell formats according to rendezvous protocol
               version 3 draft.
  19-Jul-2008  Added comment by Nick (but no solution, yet) that sharing of
               circuits between multiple clients is not supported by Tor.

Overview:

  Establishing a connection to a hidden service currently involves two Tor
  relays, introduction and rendezvous point, and 10 more relays distributed
  over four circuits to connect to them. The introduction point is
  established in the mid-term by a hidden service to transfer introduction
  requests from client to the hidden service. The rendezvous point is set
  up by the client for a single hidden service request and actually
  transfers end-to-end encrypted application data between client and hidden
  service.

  There are some reasons for separating the two roles of introduction and
  rendezvous point: (1) Responsibility: A relay shall not be held
  responsible for relaying data for a certain hidden service; in the
  original design as described in [1] an introduction point relays no
  application data, and a rendezvous point neither knows the hidden
  service nor can it decrypt the data. (2) Scalability: The hidden service
  shall not have to maintain a number of open circuits proportional to the
  expected number of client requests. (3) Attack resistance: The effect of
  an attack on the only visible parts of a hidden service, its introduction
  points, shall be as small as possible.

  However, eliminating the separate rendezvous connection, as proposed by
  Øverlier and Syverson [2], is the most promising approach to reducing
  the delay in connection establishment. Of all substeps of connection
  establishment, extending a circuit by a single hop accounts for a
  major part of the delay. Reducing on-demand circuit extensions from two
  to one decreases the mean connection establishment time from 39 to 29
  seconds [3]. In particular, eliminating the delay on the hidden-service
  side allows the client to better observe the progress of connection
  establishment, thus allowing it to use smaller timeouts. Proposal 114
  introduced new introduction keys for introduction points and provides
  for user authorization data in hidden service descriptors; this
  proposal will show that introduction keys in combination with new
  introduction cookies provide the first security property,
  responsibility. Further, eliminating the need for a separate
  introduction connection benefits the overall network load by
  decreasing the number of circuit extensions. Finally, having only one
  connection between client and hidden service reduces the overall
  protocol complexity.

Design:

  1. Hidden Service Configuration

  Hidden services should be able to choose whether they would like to use
  this protocol. This might be opt-in for 0.2.1.x and opt-out for later
  major releases.

  2. Contact Point Establishment

  When preparing a hidden service, a Tor client selects a set of relays to
  act as contact points instead of introduction points. The contact point
  combines both roles of introduction and rendezvous point as proposed in
  [2]. The only requirement for a relay to be picked as contact point is
  its capability of performing this role. This can be determined from its
  Tor version number, which needs to be equal to or higher than the first
  version that implements this proposal.

  The easiest way to implement establishment of contact points is to
  introduce v2 ESTABLISH_INTRO cells. By convention, the relay recognizes
  version 2 ESTABLISH_INTRO cells as requests to establish a contact point
  rather than an introduction point.

     V      Format byte: set to 255               [1 octet]
     V      Version byte: set to 2                [1 octet]
     KLEN   Key length                           [2 octets]
     PK     Public introduction key           [KLEN octets]
     HS     Hash of session info                [20 octets]
     SIG    Signature of above information       [variable]
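  The v2 ESTABLISH_INTRO layout can be serialized and recognized as shown
  below.  This is a sketch only: network (big-endian) byte order for KLEN
  is assumed, and the signature is taken as input because computing it is
  outside the scope of this layout.

```python
import struct

FORMAT_BYTE = 255
VERSION = 2


def pack_establish_intro_v2(pk: bytes, session_hash: bytes, sig: bytes) -> bytes:
    """Serialize a v2 ESTABLISH_INTRO body; the caller supplies SIG."""
    if len(session_hash) != 20:
        raise ValueError("HS must be 20 octets")
    if len(pk) > 0xFFFF:
        raise ValueError("PK does not fit a 2-octet KLEN")
    # "!" = network byte order (assumed for KLEN).
    header = struct.pack("!BBH", FORMAT_BYTE, VERSION, len(pk))
    return header + pk + session_hash + sig


def parse_establish_intro(body: bytes):
    """Return (PK, HS, SIG) for a v2 cell, or None for any other cell."""
    if len(body) < 4 or body[0] != FORMAT_BYTE or body[1] != VERSION:
        return None  # treat as an ordinary introduction point request
    (klen,) = struct.unpack("!H", body[2:4])
    if len(body) < 4 + klen + 20:
        raise ValueError("truncated v2 ESTABLISH_INTRO cell")
    pk = body[4:4 + klen]
    hs = body[4 + klen:24 + klen]
    sig = body[24 + klen:]
    return pk, hs, sig
```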

  The hidden service does not create a fixed number of contact points, like
  3 in the current protocol. It uses a minimum of 3 contact points, but
  increases this number depending on the history of client requests within
  the last hour. The hidden service also increases this number depending on
  the frequency of failing contact points in order to defend against
  attacks on its contact points. When client authorization as described in
  proposal 121 is used, a hidden service can also use the number of
  authorized clients as a first estimate for the required number of
  contact points.

  3. Hidden Service Descriptor Creation

  A hidden service needs to issue a fresh introduction cookie for each
  established introduction point. By requiring clients to use this cookie
  in a later connection establishment, an introduction point cannot access
  the hidden service that it works for. Together with the fresh
  introduction key that was introduced in proposal 114, this reduces
  responsibility of a contact point for a specific hidden service.

  The v2 hidden service descriptor format contains an
  "intro-authentication" field that may contain introduction-point specific
  keys. The hidden service creates a random string, comparable to the
  rendezvous cookie, and includes it in the descriptor as introduction
  cookie for auth-type "1". By convention, clients treat the existence of
  auth-type 1 as an indication that they can connect to the hidden
  service via a contact point rather than an introduction point. Older
  clients that do not understand this new protocol simply ignore that
  cookie.

  4. Connection Establishment

  When establishing a connection to a hidden service a client learns about
  the capability of using the new protocol from the hidden service
  descriptor. It may choose whether to use this new protocol or not,
  whereas older clients cannot understand the new capability and can only
  use the current protocol. Clients using version 0.2.1.x should be able to
  opt-in for using the new protocol, which should change to opt-out for
  later major releases.

  When using the new capability the client creates a v2 INTRODUCE1 cell
  that extends an unversioned INTRODUCE1 cell by adding the content of an
  ESTABLISH_RENDEZVOUS cell. Further, the client sends this cell using the
  new cell type 41 RELAY_INTRODUCE1_VERSIONED to the introduction point,
  because unversioned and versioned INTRODUCE1 cells are indistinguishable:

  Cleartext
     V      Version byte: set to 2                [1 octet]
     PK_ID  Identifier for Bob's PK             [20 octets]
     RC     Rendezvous cookie                   [20 octets]
  Encrypted to introduction key:
     VER    Version byte: set to 3.               [1 octet]
     AUTHT  The auth type that is supported       [1 octet]
     AUTHL  Length of auth data                  [2 octets]
     AUTHD  Auth data                            [variable]
     RC     Rendezvous cookie                   [20 octets]
     g^x    Diffie-Hellman data, part 1        [128 octets]

  The cleartext part contains the rendezvous cookie that the contact point
  remembers just as a rendezvous point would do.

  The encrypted part contains the introduction cookie as auth data for the
  auth type 1. The rendezvous cookie is contained as before, but there is
  no further rendezvous point information, as there is no separate
  rendezvous point.
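  The v2 INTRODUCE1 layout can be assembled as follows.  This sketch
  assumes network byte order for AUTHL and takes a hypothetical,
  caller-supplied encrypt function for the part encrypted to the
  introduction key; it does not implement that encryption.

```python
import struct

AUTH_TYPE_INTRO_COOKIE = 1


def build_introduce1_v2(pk_id, rend_cookie, intro_cookie, g_x, encrypt):
    """Assemble a v2 INTRODUCE1 body (sent as RELAY_INTRODUCE1_VERSIONED).

    encrypt: hypothetical function encrypting bytes to the introduction key.
    """
    for name, value, size in (("PK_ID", pk_id, 20), ("RC", rend_cookie, 20),
                              ("g^x", g_x, 128)):
        if len(value) != size:
            raise ValueError(f"{name} must be {size} octets")
    # VER=3, AUTHT=1, AUTHL, AUTHD (the introduction cookie), RC, g^x
    inner = (struct.pack("!BBH", 3, AUTH_TYPE_INTRO_COOKIE, len(intro_cookie))
             + intro_cookie + rend_cookie + g_x)
    return bytes([2]) + pk_id + rend_cookie + encrypt(inner)


def cleartext_part(cell):
    """What the contact point can read: version, PK_ID, rendezvous cookie."""
    return cell[0], cell[1:21], cell[21:41]
```

  The rendezvous cookie appears twice: in cleartext for the contact point,
  and inside the encrypted part for the hidden service.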

  5. Rendezvous Establishment

  The contact point recognizes a v2 INTRODUCE1 cell with auth type 1 as a
  request to be used in the new protocol. It remembers the contained
  rendezvous cookie, replies to the client with an INTRODUCE_ACK cell
  (omitting the RENDEZVOUS_ESTABLISHED cell), and forwards the encrypted
  part of the INTRODUCE1 cell as INTRODUCE2 cell to the hidden service.

  6. Introduction at Hidden Service

  The hidden service recognizes an INTRODUCE2 cell containing an
  introduction cookie as authorization data. In this case, it does not
  extend a circuit to a rendezvous point, but sends a RENDEZVOUS1 cell
  directly back to its contact point as usual.

  7. Rendezvous at Contact Point

  The contact point processes a RENDEZVOUS1 cell just as a rendezvous point
  does. The only difference is that the hidden-service-side circuit is not
  exclusive for the client connection, but shared among multiple client
  connections.

  [Tor does not allow sharing of a single circuit among multiple client
   connections easily. We need to think about a smart and efficient way to
   implement this. Comment by Nick. -KL]
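  The contact point's bookkeeping in steps 5 to 7 can be sketched as a
  table from rendezvous cookie to client circuit.  Circuit handles and
  the action tuples are hypothetical.  The sketch assumes a RENDEZVOUS1
  payload starting with the 20-octet cookie, with the remainder relayed
  to the client as RENDEZVOUS2 as a rendezvous point would.  It does not
  solve the open problem of sharing one service-side circuit among
  several client connections.

```python
from dataclasses import dataclass, field


@dataclass
class ContactPoint:
    hs_circuit: str  # shared circuit from the hidden service (hypothetical handle)
    pending: dict = field(default_factory=dict)  # cookie -> client circuit
    joined: dict = field(default_factory=dict)   # cookie -> client circuit

    def on_introduce1_versioned(self, client_circ, cell):
        """Handle a v2 INTRODUCE1: remember the cookie, ACK, forward."""
        if cell[0] != 2:
            raise ValueError("not a v2 INTRODUCE1 cell")
        cookie = cell[21:41]
        self.pending[cookie] = client_circ
        # No RENDEZVOUS_ESTABLISHED: only INTRODUCE_ACK goes back.
        return [(client_circ, "INTRODUCE_ACK", b""),
                (self.hs_circuit, "INTRODUCE2", cell[41:])]

    def on_rendezvous1(self, payload):
        """Match the cookie and relay the rest to the waiting client."""
        cookie, rest = payload[:20], payload[20:]
        client_circ = self.pending.pop(cookie, None)
        if client_circ is None:
            return []  # unknown cookie: drop
        self.joined[cookie] = client_circ
        return [(client_circ, "RENDEZVOUS2", rest)]
```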

Security Implications:

  (1) Responsibility

  One of the original reasons for the separation of introduction and
  rendezvous points is that a relay shall not be held responsible for
  relaying data for a certain hidden service. In the current design an
  introduction point relays no application data and a rendezvous point
  neither knows the hidden service nor can it decrypt the data.

  This property is also fulfilled in this new design. A contact point only
  learns a fresh introduction key instead of the hidden service key, so
  that it cannot recognize a hidden service. Further, the introduction
  cookie, which is unknown to the contact point, prevents it from accessing
  the hidden service itself. The only way for a contact point to identify
  the hidden service it works for is to look it up in the descriptors of
  known hidden services. A contact point cannot directly be held
  responsible for the hidden service it is working for. In addition to
  that, it cannot learn the data that it transfers, because all
  communication between client and hidden service is end-to-end
  encrypted.

  (2) Scalability

  Another goal of the existing hidden service protocol is that a hidden
  service does not have to maintain a number of open circuits proportional
  to the expected number of client requests. The rationale behind this is
  better scalability.

  The new protocol eliminates the need for a hidden service to extend
  circuits on demand, which has a positive effect on circuit establishment
  times and overall network load. The solution presented here to establish
  a number of contact points proportional to the history of connection
  requests reduces the number of circuits to a minimum number that fits the
  hidden service's needs.

  (3) Attack resistance

  The third goal of separating introduction and rendezvous points is to
  limit the effect of an attack on the only visible parts of a hidden
  service which are the contact points in this protocol.

  In theory, the new protocol is more vulnerable to this attack. An
  attacker who can take down a contact point not only eliminates an
  access point to the hidden service, but also breaks current client
  connections to the hidden service using that contact point.

  Øverlier and Syverson proposed the concept of valet nodes as additional
  safeguard for introduction/contact points [4]. Unfortunately, this
  increases hidden service protocol complexity conceptually and from an
  implementation point of view. Therefore, it is not included in this
  proposal.

  However, in practice attacking a contact point (or introduction point) is
  not as rewarding as it might appear. The cost for a hidden service to set
  up a new contact point and publish a new hidden service descriptor is
  minimal compared to the efforts necessary for an attacker to take a Tor
  relay down. As a countermeasure to further frustrate this attack, the
  hidden service raises the number of contact points as a function of
  previous contact point failures.

  Further, the probability of breaking client connections due to attacking
  a contact point is minimal. It can be assumed that the probability of one
  of the other five involved relays in a hidden service connection failing
  or being shut down is higher than that of a successful attack on a
  contact point.

  (4) Resistance against Locating Attacks

  Clients are no longer able to force a hidden service to create or extend
  circuits. This further reduces an attacker's capabilities of locating a
  hidden server as described by Øverlier and Syverson [5].

Compatibility:

  The presented protocol does not raise compatibility issues with current
  Tor versions. New relay versions support both the existing and the
  proposed protocols as introduction/rendezvous/contact points. A contact
  point acts as an introduction point simultaneously. Hidden services and
  clients can opt in to use the new protocol, which might change to
  opt-out some time in the future.

References:

  [1] Roger Dingledine, Nick Mathewson, and Paul Syverson, Tor: The
  Second-Generation Onion Router. In the Proceedings of the 13th USENIX
  Security Symposium, August 2004.

  [2] Lasse Øverlier and Paul Syverson, Improving Efficiency and Simplicity
  of Tor Circuit Establishment and Hidden Services. In the Proceedings of
  the Seventh Workshop on Privacy Enhancing Technologies (PET 2007),
  Ottawa, Canada, June 2007.

  [3] Christian Wilms, Improving the Tor Hidden Service Protocol Aiming at
  Better Performance, diploma thesis, June 2008, University of Bamberg.

  [4] Lasse Øverlier and Paul Syverson, Valet Services: Improving Hidden
  Servers with a Personal Touch. In the Proceedings of the Sixth Workshop
  on Privacy Enhancing Technologies (PET 2006), Cambridge, UK, June 2006.

  [5] Lasse Øverlier and Paul Syverson, Locating Hidden Servers. In the
  Proceedings of the 2006 IEEE Symposium on Security and Privacy, May 2006.

Filename: 143-distributed-storage-improvements.txt
Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing
Created: 28-Jun-2008
Status: Superseded

Change history:

  28-Jun-2008  Initial proposal for or-dev

Overview:

  An evaluation of the distributed storage for Tor hidden service
  descriptors and subsequent discussions have brought up a few improvements
  to proposal 114. All improvements are backwards compatible with the
  implementation of proposal 114.

Design:

  1. Report Bad Directory Nodes

  Bad hidden service directory nodes could deny existence of previously
  stored descriptors. A bad directory node that does this with all stored
  descriptors causes harm to the distributed storage in general, but
  replication will cope with this problem in most cases. However, an
  adversary that attempts to make a specific hidden service unavailable by
  running relays that become responsible for all of a service's
  descriptors poses a more serious threat. The distributed storage needs to
  defend against this attack by detecting and removing bad directory nodes.

  As a countermeasure hidden services try to download their descriptors
  every hour at random times from the hidden service directories that are
  responsible for storing it. If a directory node replies with 404 (Not
  found), the hidden service reports the supposedly bad directory node to
  a random selection of half of the directory authorities (with version
  numbers equal to or higher than the first version that implements this
  proposal). The hidden service posts a complaint message using HTTP 'POST'
  to a URL "/tor/rendezvous/complain" with the following message format:

    "hidden-service-directory-complaint" identifier NL

      [At start, exactly once]

      The identifier of the hidden service directory node to be
      investigated.

    "rendezvous-service-descriptor" descriptor NL

      [At end, Exactly once]

      The hidden service descriptor that the supposedly bad directory node
      does not serve.
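  The complaint can be assembled with the standard library as sketched
  below.  The authority list format, the capability flag, and rounding
  "half" up are assumptions; the request is only built, not sent.

```python
import random
import urllib.request

COMPLAIN_PATH = "/tor/rendezvous/complain"


def build_complaint(hsdir_identifier: str, descriptor: str) -> str:
    """Complaint body: complaint line first, descriptor line last."""
    return (f"hidden-service-directory-complaint {hsdir_identifier}\n"
            f"rendezvous-service-descriptor {descriptor}\n")


def pick_authorities(authorities, rng=random):
    """authorities: list of (address, supports_complaints) pairs.

    Picks a random half (rounded up, an assumption) of the authorities
    whose version implements this proposal.
    """
    capable = [addr for addr, ok in authorities if ok]
    return rng.sample(capable, (len(capable) + 1) // 2)


def complaint_request(authority_addr: str, body: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST to one authority."""
    return urllib.request.Request(f"http://{authority_addr}{COMPLAIN_PATH}",
                                  data=body.encode("ascii"), method="POST")
```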

  The directory authority checks whether the descriptor is valid and
  whether the hidden service directory is responsible for storing it. It
  waits for a random time
  of up to 30 minutes before posting the descriptor to the hidden service
  directory. If the publication is acknowledged, the directory authority
  waits another random time of up to 30 minutes before attempting to
  request the descriptor that it has posted. If the directory node replies
  with 404 (Not found), it will be blacklisted for being a hidden service
  directory node for the next 48 hours.

  A blacklisted hidden service directory is assigned the new flag BadHSDir
  instead of the HSDir flag in the vote that a directory authority creates.
  In a consensus a relay is only assigned a HSDir flag if the majority of
  votes contains a HSDir flag and no more than one third of votes contains
  a BadHSDir flag. As a result, clients do not have to learn about the
  BadHSDir flag. A blacklisted directory node will simply not be assigned
  the HSDir flag in the consensus.
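  The consensus rule for the HSDir flag reduces to two counts.  In this
  sketch each vote is a hypothetical set of flags, and the denominator is
  assumed to be the number of votes that list the relay.

```python
def consensus_assigns_hsdir(votes):
    """votes: one set of flags per authority vote listing the relay."""
    n = len(votes)
    hsdir = sum(1 for flags in votes if "HSDir" in flags)
    bad = sum(1 for flags in votes if "BadHSDir" in flags)
    # Strict majority for HSDir, and at most one third for BadHSDir.
    return 2 * hsdir > n and 3 * bad <= n
```

  With seven votes, five HSDir and two BadHSDir still yield the flag;
  three BadHSDir votes withhold it.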

  In order to prevent an attacker from setting up new nodes as replacement
  for blacklisted directory nodes, all directory nodes in the same /24
  subnet are blacklisted, too. Furthermore, if two or more directory nodes
  are blacklisted in the same /16 subnet concurrently, all other directory
  nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
  most 48 hours.
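  The subnet rules can be sketched with the ipaddress module.  Addresses
  are assumed to be IPv4 strings, and only directly blacklisted nodes (those
  that failed the check) are assumed to count toward the /16 threshold;
  the 48-hour expiry is not modelled.

```python
from collections import Counter
from ipaddress import IPv4Network


def subnet(addr, prefix):
    return IPv4Network(f"{addr}/{prefix}", strict=False)


def expand_blacklist(directly_blacklisted, all_dirs):
    """Return every directory address that ends up blacklisted."""
    direct = set(directly_blacklisted)
    bad24 = {subnet(a, 24) for a in direct}
    per16 = Counter(subnet(a, 16) for a in direct)
    bad16 = {net for net, count in per16.items() if count >= 2}
    return {a for a in all_dirs
            if a in direct or subnet(a, 24) in bad24 or subnet(a, 16) in bad16}
```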

  2. Publish Fewer Replicas

  The evaluation has shown that the probability that a directory node
  serves a previously stored descriptor is 85.7% (more precisely, this is
  the 0.001-quantile of the empirical distribution with the rationale that
  it holds for 99.9% of all empirical cases). If descriptors are replicated
  to x directory nodes, the probability that at least one of the replicas
  is available for clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
  overall availability of 99.9%, x = 3.55 replicas need to be stored. From
  this it follows that 4 replicas are sufficient, rather than the 6
  replicas that are currently stored.
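  The 3.55 figure follows from solving 1 - (1 - p)^x >= 0.999 for x,
  that is x >= ln(0.001) / ln(1 - p).  A quick check of the arithmetic:

```python
import math


def replicas_needed(p_single: float, target: float) -> float:
    # 1 - (1 - p)^x >= target  <=>  x >= ln(1 - target) / ln(1 - p)
    return math.log(1 - target) / math.log(1 - p_single)


x = replicas_needed(0.857, 0.999)
print(round(x, 2), math.ceil(x))  # 3.55 4
```

  With 3 replicas the availability is about 99.71%, with 4 about 99.96%.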

  Further, the current design stores 2 sets of descriptors on 3 directory
  nodes with consecutive identities. Originally, this was meant to
  facilitate replication between directory nodes, which has not been and
  will not be implemented (the selection criterion of 24 hours uptime does
  not make it necessary). As a result, storing descriptors on directory
  nodes with consecutive identities is not required. In fact, it should be
  avoided, because it enables an attacker to create "black holes" in the
  identifier ring.

  Hidden services should store their descriptors on 4 non-consecutive
  directory nodes, and clients should request descriptors from these
  directory nodes only. For compatibility reasons, hidden services also
  store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
  clients will be able to retrieve 4 out of 6 descriptors, but will fail
  for the remaining 2 descriptors, which is sufficient for reliability. As
  soon as 0.2.0.x is deprecated, hidden services can stop publishing the
  additional 2 replicas.

  3. Change Default Value of Being Hidden Service Directory

  The requirements for becoming a hidden service directory node are an open
  directory port and an uptime of at least 24 hours. The evaluation has
  shown that there are 300 hidden service directory candidates on average,
  but only 6 of them are configured to act as hidden service directories.
  This is bad, because those 6 nodes need to serve a large share of all
  hidden service descriptors. Optimally, there should be hundreds of hidden
  service directories. Having a large number of 0.2.1.x directory nodes
  also has a positive effect on 0.2.0.x hidden services and clients.

  Therefore, the new default of HidServDirectoryV2 should be 1, so that a
  Tor relay that has an open directory port automatically accepts and
  serves v2 hidden service descriptors. A relay operator can still opt out
  of running a hidden service directory by changing HidServDirectoryV2 to 0.
  The additional bandwidth requirements for running a hidden service
  directory node in addition to being a directory cache are negligible.

  4. Make Descriptors Persistent on Directory Nodes

  Hidden service directories that are restarted by their operators or after
  a failure will not be selected as hidden service directories within the
  next 24 hours. However, some clients might still think that these nodes
  are responsible for certain descriptors, because they work on the basis
  of network consensuses that are up to three hours old. The directory
  nodes should be able to serve the previously received descriptors to
  these clients. Therefore, directory nodes make all received descriptors
  persistent and load previously received descriptors on startup.

  5. Store and Serve Descriptors Regardless of Responsibility

  Currently, directory nodes only accept descriptors for which they think
  they are responsible. This may lead to problems when a directory node
  uses an older or newer network consensus than hidden service or client
  or when a directory node has been restarted recently. In fact, there are
  no security issues in storing or serving descriptors for which a
  directory node thinks it is not responsible. To the contrary, doing so
  may improve reliability in edge cases. As a result, a directory node
  does not pay attention to responsibility when receiving a publication or
  fetch request, but stores or serves the requested descriptor. Likewise,
  the directory node does not remove descriptors when it thinks it is not
  responsible for them any more.

  6. Avoid Periodic Descriptor Re-Publication

  In the current implementation a hidden service re-publishes its
  descriptor either when its content changes or an hour elapses. However,
  the evaluation has shown that failures of hidden service directory nodes,
  i.e. of nodes that have not failed within the last 24 hours, are very
  rare. Together with making descriptors persistent on directory nodes,
  there is no necessity to re-publish descriptors hourly.

  The only two events leading to descriptor re-publication should be a
  change of the descriptor content and a new directory node becoming
  responsible for the descriptor. Hidden services should therefore consider
  re-publication every time they learn about a new network consensus
  instead of hourly.
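  A sketch of the check a hidden service might run on each new consensus.
  Uploading only to the newly responsible nodes when the content is
  unchanged is an assumption; the text only names the two triggering
  events.

```python
def republish_targets(old_desc, new_desc, old_responsible, new_responsible):
    """Directory nodes to upload to after learning a new consensus."""
    new_responsible = set(new_responsible)
    if new_desc != old_desc:
        return new_responsible  # content changed: publish everywhere
    # Content unchanged: only nodes that just became responsible need it.
    return new_responsible - set(old_responsible)
```

  An empty result means nothing needs to be published for this consensus.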

  7. Discard Expired Descriptors

  The current implementation lets directory nodes keep a descriptor for two
  days before discarding it. However, with the v2 design, descriptors are
  only valid for at most one day. Directory nodes should determine the
  validity of stored descriptors and discard them one hour after they have
  expired (to compensate for wrong clocks on clients).
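  The discard rule might look like this.  The sketch assumes a descriptor
  expires one day after its publication time and uses a hypothetical
  store mapping descriptor IDs to (published, text) pairs.

```python
from datetime import datetime, timedelta

VALIDITY = timedelta(days=1)  # assumption: lifetime counted from publication
GRACE = timedelta(hours=1)    # extra hour to tolerate wrong client clocks


def expired(published: datetime, now: datetime) -> bool:
    return now > published + VALIDITY + GRACE


def prune(store, now):
    """store: dict mapping descriptor id -> (published, text)."""
    for key in [k for k, (pub, _) in store.items() if expired(pub, now)]:
        del store[key]
```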

  8. Shorten Client-Side Descriptor Fetch History

  When clients try to download a hidden service descriptor, they memorize
  fetch requests to directory nodes for up to 15 minutes. This allows them
  to request all replicas of a descriptor to avoid bad or failing directory
  nodes, but without querying the same directory node twice.

  The downside is that a client that has requested a descriptor without
  success will not be able to find a hidden service that has been started
  during the 15 minutes following the client's last request.

  This can be improved by shortening the fetch history to only 5 minutes.
  This time should be sufficient to complete requests for all replicas of a
  descriptor, but without ending in an infinite request loop.
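  A fetch history with the shorter window can be sketched as below.  The
  key of (service ID, directory node) and the injectable clock are
  illustrative choices, not taken from the Tor code.

```python
import time

FETCH_HISTORY_SECONDS = 5 * 60


class FetchHistory:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = {}  # (service_id, hsdir) -> time of last request

    def should_query(self, service_id, hsdir):
        """True unless this directory was asked within the last 5 minutes."""
        t = self._last.get((service_id, hsdir))
        return t is None or self._clock() - t >= FETCH_HISTORY_SECONDS

    def record(self, service_id, hsdir):
        self._last[(service_id, hsdir)] = self._clock()
```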

Compatibility:

  All proposed improvements are compatible with the currently implemented
  design as described in proposal 114.

Filename: 144-enforce-distinct-providers.txt
Title: Increase the diversity of circuits by detecting nodes belonging to
   the same provider
Author: Mfr
Created: 2008-06-15
Status: Obsolete

Overview:

  Increase network security by reducing the ability of relay operators or
  ISPs, whether acting on their own or under legal requisition, to monitor
  a large part of Tor traffic in an attempt to break circuit privacy.  The
  aim is to increase the diversity of circuits without killing network
  performance.

Motivation:

  Since Roger and Nick's 2004 publication about diversity [1], very fast
  Tor relays have become concentrated among half a dozen providers, which
  control the traffic of several dozen routers [2].

  Similarly, the spread of cloneable virtual machines rented by the hour
  makes it possible to start, within a few minutes and at a small cost, a
  set of very high-speed relays that can attract a large amount of
  traffic within a few hours.  That traffic can then be analyzed,
  increasing the vulnerability of the network.

  Both ISPs and domU providers usually hold several class B IP blocks.
  As a result, the existing EnforceDistinctSubnets restriction, which
  excludes relays in the same class B subnet, is only partially
  effective.  By contrast, a restriction at the class A level would be too
  restrictive.

  Therefore it seems necessary to consider another approach.

Proposal:

  Add a provider check based on the AS number, which the router adds to
  its descriptor, which the directory authorities verify, and which is
  used like the declared family field when creating circuits.

Design:

Step 1 :

  Add provider information to the router descriptor, obtained by a
  request [4] made by the router itself.

         "provider" name NL

            'name' is the AS number of the router, formatted like this:
            'ASxxxxxx' where AS is fixed and xxxxxx is the AS number,
            left aligned (e.g. AS98304, AS4096, AS1), or, if the AS
            number is missing, the network class A number is used like
            this: 'ANxxx' where AN is fixed and xxx is the first octet
            of the IP address (e.g. AN1 for the IP 1.1.1.2), or an 'L'
            value is set if it's a local network IP.

            If two ORs list one another in their "provider" entries,
            then OPs should treat them as a single OR for the purpose
            of path selection.

            For example, if node A's descriptor contains "provider B",
            and node B's descriptor contains "provider A", then node A
            and node B should never be used on the same circuit.

    Add the corresponding configuration option to torrc:

            EnforceDistinctProviders is set to 1 by default.  If it
            is set to 0, Tor permits building circuits with several
            relays from the same provider.
            In line with proposal 135, if TestingTorNetwork is set,
            EnforceDistinctProviders must be unset.
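            A minimal sketch of the resulting path check, assuming the
            client already knows the "provider" value of each relay in a
            candidate path.  path_allowed() is a hypothetical helper, and
            treating TestingTorNetwork as forcing the option off follows
            the text above.

```python
from typing import Sequence


def path_allowed(providers: Sequence[str],
                 enforce_distinct_providers: bool = True,
                 testing_tor_network: bool = False) -> bool:
    """providers holds the "provider" value of each relay in a candidate path."""
    if testing_tor_network or not enforce_distinct_providers:
        return True                      # option unset: no provider restriction
    # Reject the path if any provider value appears more than once.
    return len(set(providers)) == len(providers)
```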

    Verification of AS numbers by the directory authorities

         The directory authorities check the AS number in each newly
         uploaded router descriptor.

            If the node runs an old version of Tor, this check is
            bypassed.

            If the AS number obtained by the authority's own request
            differs from the one in the descriptor, the testing
            authority marks the router as not Valid in its vote.
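            The authority-side check might look like the sketch below.
            lookup_asn stands in for a hypothetical IP-to-ASN query;
            using a missing "provider" line as the sign of an old Tor
            version, and not penalizing routers whose address the
            authority cannot resolve, are both assumptions.

```python
from typing import Callable, Optional


def authority_considers_valid(descriptor_provider: Optional[str],
                              router_ip: str,
                              lookup_asn: Callable[[str], Optional[int]]) -> bool:
    if descriptor_provider is None:
        # Assumed sign of an old Tor version: no "provider" line, check bypassed.
        return True
    asn = lookup_asn(router_ip)
    if asn is None:
        # Assumption: an unresolvable address is not held against the router.
        return True
    # A mismatch between the authority's answer and the descriptor means
    # the router is voted as not Valid.
    return descriptor_provider == f"AS{asn}"
```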

Step 2: When a 'significant number of nodes' among the valid routers are
generating descriptors with provider information:

        Add functionality for clients to obtain missing provider
        information by DNS request:

                During path computation for circuit building, the OP
                first applies the family check and the
                EnforceDistinctSubnets directive, for performance.
                Then, if provider information is needed and missing
                from a router descriptor, it tries to get the AS
                provider information by DNS request [4].  This
                information may be cached.  An AN (class A number)
                value is never generated during this process, to
                avoid problems when DNS is blocked.  If the DNS
                request fails, the OP ignores the failure and
                continues building the circuit.
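        The client-side lookup step can be sketched as below, assuming
        a hypothetical dns_lookup_asn callable that performs the IP-to-ASN
        DNS query and raises OSError (which includes socket timeouts) on
        failure.  Caching only successful answers is a design choice of
        this sketch, so that a failed lookup can be retried later.

```python
from typing import Callable, Dict, Optional


class ProviderResolver:
    """Fill in missing "provider" values for path selection (step 2)."""

    def __init__(self, dns_lookup_asn: Callable[[str], Optional[int]]) -> None:
        self._lookup = dns_lookup_asn        # hypothetical IP-to-ASN DNS query
        self._cache: Dict[str, str] = {}     # successful answers only

    def provider_for(self, ip: str, descriptor_provider: Optional[str]) -> Optional[str]:
        if descriptor_provider is not None:
            return descriptor_provider       # descriptor already has it
        if ip in self._cache:
            return self._cache[ip]
        try:
            asn = self._lookup(ip)
        except OSError:                      # includes socket timeouts
            return None                      # ignore the failure; keep building
        if asn is None:
            return None                      # never fall back to an "AN" value
        self._cache[ip] = f"AS{asn}"
        return self._cache[ip]
```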

Step 3: When the 'whole majority' of valid Tor clients support the DNS
request:

        Older versions are deprecated and marked as not Valid.

        EnforceDistinctProviders replaces the EnforceDistinctSubnets
        functionality.

        EnforceDistinctSubnets is removed.

        The functionality deployed in step 2 is removed.

Security implications:

      This provider measure increases the number of distinct providers
      an attacker must use in order to carry out traffic analysis.

Compatibility:

        The presented protocol does not raise compatibility issues
        with current Tor versions.  Compatibility is preserved by
        implementing this functionality in 3 steps, giving network
        users time to upgrade their clients and routers.

Performance and scalability notes:

        Requiring a different provider for every router in a circuit
        could slightly reduce performance if the circuit is too long.

        During step 2, getting missing provider information could
        increase path-building time, so the lookup should have a
        timeout.

Possible Attacks/Open Issues/Some thinking required:

        This proposal seems to be compatible with proposal 135,
        "Simplify Configuration of Private Tor Networks".

        This proposal does not address attacks by owners of multiple
        ASes, or traffic monitoring by top providers [5].

        Unresolved AS numbers are treated as a class A network.
        Perhaps they should be marked as invalid instead, but there
        were only five such items at the last check (see [2]).

        We still need to define what a 'significant number of nodes'
        and a 'whole majority' are ;-)

References:
[1] Location Diversity in Anonymity Networks by Nick Feamster and Roger
Dingledine.
In the Proceedings of the Workshop on Privacy in the Electronic Society
(WPES 2004), Washington, DC, USA, October 2004
http://freehaven.net/anonbib/#feamster:wpes2004
[2] http://as4jtw5gc6efb267.onion/IPListbyAS.txt
[3] see Goodell Tor Exit Page
http://cassandra.eecs.harvard.edu/cgi-bin/exit.py
[4] see the great IP to ASN DNS Tool
http://www.team-cymru.org/Services/ip-to-asn.html
[5] Sampled Traffic Analysis by Internet-Exchange-Level Adversaries by
Steven J. Murdoch and Piotr Zielinski.
In the Proceedings of the Seventh Workshop on Privacy Enhancing Technologies
(PET 2007), Ottawa, Canada, June 2007.
http://freehaven.net/anonbib/#murdoch-pet2007
[5] http://bugs.noreply.org/flyspray/index.php?do=details&id=690
Filename: 145-newguard-flag.txt
Title: Separate "suitable as a guard" from "suitable as a new guard"
Author: Nick Mathewson
Created: 1-Jul-2008
Status: Superseded

[This could be obsoleted by proposal 141, which could replace NewGuard
with a Guard weight.]

[This _is_ superseded by 236, which adds guard weights for real.]

Overview

   Right now, Tor has one flag that clients use both to tell which
   nodes should be kept as guards, and which nodes should be picked
   when choosing new guards.  This proposal separates this flag into
   two.

Motivation

   Balancing clients among guards is not done well by our current
   algorithm.  When a new guard appears, it is chosen by clients
   looking for a new guard with the same probability as all existing
   guards... but new guards are likelier to be under capacity, whereas
   old guards are likelier to be under more use.

Implementation

   We add a new flag, NewGuard.  Clients will change so that when they
   are choosing new guards, they only consider nodes with the NewGuard
   flag set.

   For now, authorities will always set NewGuard if they are setting
   the Guard flag.  Later, it will be easy to migrate authorities to
   set NewGuard for underused guards.
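   A minimal Python sketch of the split, with hypothetical names
   (RouterStatus, authority_flags, pick_new_guard).  It picks uniformly
   among NewGuard nodes for simplicity; real guard selection would also
   weight by bandwidth, which this proposal does not change.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RouterStatus:
    nickname: str
    flags: FrozenSet[str]


def authority_flags(assign_guard: bool) -> FrozenSet[str]:
    # Initial rollout: NewGuard is set exactly when Guard is set.
    return frozenset({"Guard", "NewGuard"}) if assign_guard else frozenset()


def keep_existing_guard(router: RouterStatus) -> bool:
    # Guards already in use are kept as long as they retain the Guard flag.
    return "Guard" in router.flags


def pick_new_guard(routers: List[RouterStatus],
                   rng: random.Random) -> Optional[RouterStatus]:
    # New guards are drawn only from nodes with the NewGuard flag.
    candidates = [r for r in routers if "NewGuard" in r.flags]
    return rng.choice(candidates) if candidates else None
```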

Alternatives

   We might instead have authorities list weights with which nodes
   should be picked as guards.
Filename: 146-long-term-stability.txt
Title: Add new flag to reflect long-term stability
Author: Nick Mathewson
Created: 19-Jun-2008
Status: Superseded
Superseded-by: 206

Status:

  The applications of this design are achieved by proposal 206 instead.
  Instead of having the authorities track long-term stability for nodes
  that might be useful as directories in a fallback consensus, we
  eliminated the idea of a fallback consensus, and just have a DirSource
  configuration option.  (Nov 2013)


Overview

  This document proposes a new flag to indicate that a router has
  existed at the same address for a long time, describes how to
  implement it, and explains what it's good for.

Motivation

  Tor has had three notions of "stability" for servers.  Older
  directory protocols based a server's stability on its
  (self-reported) uptime: a server that had been running for a day was
  more stable than a server that had been running for five minutes,
  regardless of their past history.  Current directory protocols track
  weighted mean time between failure (WMTBF) and weighted fractional
  uptime (WFU).  WFU is computed as the fraction of time for which the
  server is running, with measurements weighted to exponentially
  decay such that old days count less.  WMTBF is computed as the
  average length of intervals for which the server runs between
  downtime, with old intervals weighted to count less.
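  The two weighted averages can be sketched as follows.  This assumes
  daily granularity and a hypothetical decay factor DECAY = 0.95 per
  day; the actual weighting constants and bookkeeping used by the
  authorities are not specified here.

```python
from typing import Sequence, Tuple

DECAY = 0.95  # hypothetical per-day weight factor; older data counts less


def weighted_fractional_uptime(daily_uptime: Sequence[float]) -> float:
    """daily_uptime[i] is the fraction of day i the server was running;
    the last element is the most recent day."""
    n = len(daily_uptime)
    weights = [DECAY ** (n - 1 - i) for i in range(n)]
    return sum(w * u for w, u in zip(weights, daily_uptime)) / sum(weights)


def weighted_mtbf(intervals: Sequence[Tuple[float, float]], now: float) -> float:
    """intervals holds (end_day, length_in_days) for each run between downtimes."""
    weights = [DECAY ** (now - end) for end, _ in intervals]
    return sum(w * length for w, (_, length) in zip(weights, intervals)) / sum(weights)
```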

  WMTBF is useful in answering the question: "If a server is running
  now, how long is it likely to stay running?"  This makes it a good
  choice for picking servers for streams that need to be long-lived.
  WFU is useful in answering the question: "If I try connecting to
  this server at an arbitrary time, is it likely to be running?"  This
  makes it an important factor for picking guard nodes, since we want
  guard nodes to be usually-up.

  There are other questions that clients want to answer, however, for
  which the current flags aren't very useful.   The one that this
  proposal addresses is,

       "If I found this server in an old consensus, is it likely to
       still be running at the same address?"

  This one is useful when we're trying to find directory mirrors in a
  fallback-consensus file.  This property is equivalent to,

       "If I find this server in a current consensus, how long is it
       likely to exist on the network?"

  This one is useful if we're trying to pick introduction points or
  something and care more about churn rate than about whether every IP
  will be up all the time.

Implementation:

  I propose we add a new flag, called "Longterm."  Authorities should
  set this flag for routers if their Longevity is in the upper
  quartile of all routers.  A router's Longevity is computed as the
  total number of days in the last year or so[*] for which the router has
  been Running at least once at its current IP:orport pair.

  Clients should use directory servers from a fallback-consensus only
  if they have the Longterm flag set.

  Authority ops should be able to mark particular routers as not
  Longterm, regardless of history.  (For instance, it makes sense to
  remove the Longterm flag from a router whose op says that it will
  need to shut down in a month.)

  [*] This is deliberately vague, to permit efficient implementations.
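  A sketch of the Longevity computation and the upper-quartile rule,
  with hypothetical helper names.  It fixes "the last year or so" at 365
  days and uses Python's statistics.quantiles() (default "exclusive"
  method) for the quartile boundary; both are assumptions.

```python
import statistics
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, ORPort)


def longevity(running_days: Iterable[Tuple[date, Endpoint]],
              current: Endpoint, today: date, window_days: int = 365) -> int:
    """Count distinct days in the window on which the router was seen
    Running at least once at its current IP:ORPort."""
    return len({day for day, endpoint in running_days
                if endpoint == current and 0 <= (today - day).days < window_days})


def longterm_routers(longevities: Dict[str, int], not_longterm: Set[str]) -> Set[str]:
    """Flag routers whose Longevity is in the upper quartile, minus any that
    authority operators have explicitly marked as not Longterm."""
    values = list(longevities.values())
    if len(values) < 2:
        return set()                     # too little data to compute quartiles
    upper_quartile = statistics.quantiles(values, n=4)[2]
    return {fp for fp, days in longevities.items()
            if days >= upper_quartile and fp not in not_longterm}
```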

Compatibility and migration issues:

  The voting protocol already acts gracefully when new flags are
  added, so no change to the voting protocol is needed.

  Tor won't have collected this data, however.  It might be desirable
  to bootstrap it from historical consensuses.  Alternatively, we can
  just let the algorithm run for a month or two.

Issues and future possibilities:

  Longterm is a really awkward name.


Filename: 147-prevoting-opinions.txt
Title: Eliminate the need for v2 directories in generating v3 directories
Author: Nick Mathewson
Created: 2-Jul-2008
Status: Rejected
Target: 0.2.4.x

Overview

  We propose a new v3 vote document type to replace the role of v2
  networkstatus information in generating v3 consensuses.

Motivation

  When authorities vote on which descriptors are to be listed in the
  next consensus, it helps if they all know about the same descriptors
  as one another.  But a hostile, confused, or out-of-date server may
  upload a descriptor to only some authorities.  In the current v3
  directory design, the authorities don't have a good way to tell one
  another about the new descriptor until they exchange votes... but by
  the time this happens, they are already committed to their votes,
  and they can't add anybody they learn about from other authorities
  until the next voting cycle.  That's no good!

  The current Tor implementation avoids this problem by having
  authorities also look at v2 networkstatus documents, but we'd like
  in the long term to eliminate these, once 0.1.2.x is obsolete.

Design:

  We add a new value for vote-status in v3 consensus documents in
  addition to "consensus" and "vote": "opinion".  Authorities generate
  and sign an opinion document as if they were generating a vote,
  except that they generate opinions earlier than they generate votes.

  [This proposal doesn't say what lines must be contained in opinion
   documents.  It seems that an authority that parses an opinion
   document is only interested in a) relay fingerprint, b) descriptor
   publication time, and c) descriptor digest; unless there's more
   information that helps authorities decide whether "they might
   accept" a descriptor.  If not, opinion documents only need to
   contain a small subset of headers and all the "r" lines that would
   be contained in a later vote. -KL]
  [This seems okay.  It would however mean that we can't use the same
   parsing logic as we use for regular votes. -NM]

  [Authorities should use the same "valid-after", "fresh-until",
   and "valid-until" lines in opinion documents as they are going to
   use in their next vote. -KL]
  [Maybe these lines should just get ignored on opinions.  Or
   omitted. -NM]

  Authorities don't need to generate more than one opinion document
  per voting interval, but may.  They should send it to the other
  authorities they know about, at
     http://<hostname>/tor/post/opinion ,
  before the authorities begin voting, so that enough time remains for
  the authorities to fetch new descriptors.

  Additionally, authorities make their opinions available at
     http://<hostname>/tor/status-vote/next/opinion.z
  and download opinions from authorities they haven't heard from in a
  while.

  Authorities SHOULD send their opinion document to all other
  authorities OpinionSeconds seconds before voting and request
  missing opinion documents OpinionSeconds/2 seconds before voting.
  OpinionSeconds SHOULD be defined as part of "voting-delay" lines
  and otherwise default to the same number of seconds as VoteSeconds.
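  The timing rule works out as in the sketch below; opinion_schedule()
  is a hypothetical helper, and how the voting start time is obtained
  is left out.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def opinion_schedule(voting_starts: datetime, vote_seconds: int,
                     opinion_seconds: Optional[int] = None) -> Tuple[datetime, datetime]:
    """Return (when to send our opinion, when to request missing opinions)."""
    if opinion_seconds is None:
        # No OpinionSeconds in the "voting-delay" line: default to VoteSeconds.
        opinion_seconds = vote_seconds
    send_at = voting_starts - timedelta(seconds=opinion_seconds)
    fetch_missing_at = voting_starts - timedelta(seconds=opinion_seconds / 2)
    return send_at, fetch_missing_at
```

  For example, with VoteSeconds = 300 and voting at 12:00:00, an
  authority would send its opinion at 11:55:00 and request missing
  opinions at 11:57:30.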

  Authorities MAY generate opinions on demand.

  Upon receiving an opinion document, authorities scan it for any
  descriptors that:
     - They might accept.
     - Are for routers they don't know about, or are published more
       recently than any descriptor they have for that router.
  Authorities then begin downloading such descriptors from authorities
  that claim to have them.

  Authorities also download corresponding extra-info descriptors for
  any router descriptor they learned from parsing an opinion document.
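  The scan might be sketched as follows, assuming each opinion entry
  carries the three fields KL lists above (relay fingerprint,
  publication time, descriptor digest).  OpinionEntry and might_accept
  are hypothetical; the proposal does not define the acceptance test.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OpinionEntry:
    fingerprint: str
    published: datetime
    digest: str


def descriptors_to_fetch(opinion: List[OpinionEntry],
                         known_published: Dict[str, datetime],
                         might_accept: Callable[[OpinionEntry], bool]) -> List[OpinionEntry]:
    """Entries we might accept that are unknown or newer than what we have."""
    wanted = []
    for entry in opinion:
        if not might_accept(entry):
            continue
        have = known_published.get(entry.fingerprint)
        if have is None or entry.published > have:
            wanted.append(entry)     # fetch it (and its extra-info) from the sender
    return wanted
```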

  Authorities MAY cache opinion documents, but don't need to.

Reasons for rejection:

  1. Authorities learn about new relays from each other's vote documents.

  See git commits 2e692bd8 and eaf5487d, which went into 0.2.2.12-alpha:
  o Major bugfixes:
    - Many relays have been falling out of the consensus lately because
      not enough authorities know about their descriptor for them to get
      a majority of votes. When we deprecated the v2 directory protocol,
      we got rid of the only way that v3 authorities can hear from each
      other about other descriptors. Now authorities examine every v3
      vote for new descriptors, and fetch them from that authority. Bugfix
      on 0.2.1.23.

  2. Authorities don't serve version 2 statuses anymore.

  Since January 2013, there was only a single version 3 directory
  authority left that served version 2 statuses: dizum.  moria1 and tor26
  have been rejecting version 2 requests for a long time, and it was
  mostly an oversight that dizum still served them.  As of January 2014,
  dizum does not serve version 2 statuses anymore.  The other six
  authorities have never generated version 2 statuses for others to be
  used as pre-voting opinions.

  3. Vote documents indicate that pre-voting opinions wouldn't help much.

  From January 1 to 7, 2014, only 0.4 relays on average were not included
  in a consensus because they were listed in fewer than 5 votes.  These 0.4
  relays could probably have been included with pre-voting opinions.

  (Here's how to find out: extract the votes-2014-01.tar.bz2 tarball and
  run

    grep -R "^r " 0[1-7] | cut -c 4-22,112- | cut -d" " -f1,3 | sort \
      | uniq -c | sort | grep " [1-4] " | wc -l

  The result is 63; dividing by the 7*24 published consensuses gives
  0.375 as the end result.)

Filename: 148-uniform-client-end-reason.txt
Title: Stream end reasons from the client side should be uniform
Author: Roger Dingledine
Created: 2-Jul-2008
Status: Closed
Implemented-In: 0.2.1.9-alpha

Overview

  When a stream closes before it's finished, the end relay cell that's
  sent includes an "end stream reason" to tell the other end why it
  closed. It's useful for the exit relay to send a reason to the client,
  so the client can choose a different circuit, inform the user, etc. But
  there's no reason to include it from the client to the exit relay,
  and in some cases it can even harm anonymity.

  We should pick a single reason for the client-to-exit-relay direction
  and always just send that.

Motivation

  Back when I first deployed the Tor network, it was useful to have
  the Tor relays learn why a stream closed, so I could debug both ends
  of the stream at once. Now that streams have worked for many years,
  there's no need to continue telling the exit relay whether the client
  gave up on a stream because of "timeout" or "misc" or what.

  Then in Tor 0.2.0.28-rc, I fixed this bug:
    - Fix a bug where, when we were choosing the 'end stream reason' to
      put in our relay end cell that we send to the exit relay, Tor
      clients on Windows were sometimes sending the wrong 'reason'. The
      anonymity problem is that exit relays may be able to guess whether
      the client is running Windows, thus helping partition the anonymity
      set. Down the road we should stop sending reasons to exit relays,
      or otherwise prevent future versions of this bug.

  It turned out that non-Windows clients were choosing their reason
  correctly, whereas Windows clients were potentially looking at errno
  wrong and so always choosing 'misc'.

  I fixed that particular bug, but I think we should prevent future
  versions of the bug too.

  (We already fixed it so *circuit* end reasons don't get sent from
  the client to the exit relay. But we appear to have skipped over
  stream end reasons thus far.)

Design:

  One option would be to no longer include any 'reason' field in end
  relay cells. But that would introduce a partitioning attack ("users
  running the old version" vs "users running the new version").

  Instead I suggest that clients all switch to sending the "misc" reason,
  like most of the Windows clients currently do and like the non-Windows
  clients already do sometimes.
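  A sketch of the client-side rule, building a simplified RELAY_END
  payload that holds only the reason byte.  The reason codes are the
  tor-spec values for REASON_MISC and REASON_TIMEOUT; the helper name is
  hypothetical, and the address that exits append for REASON_EXITPOLICY
  is omitted.

```python
# Stream end reason codes from tor-spec.txt (only the two used here).
REASON_MISC = 1
REASON_TIMEOUT = 7


def relay_end_payload(reason: int, sent_by_client: bool) -> bytes:
    """Build a simplified RELAY_END payload: a single reason byte."""
    if sent_by_client:
        reason = REASON_MISC   # clients never reveal why they closed the stream
    return bytes([reason])
```

  Exit relays keep sending the real reason, so clients can still react
  to it.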

Filename: 149-using-netinfo-data.txt
Title: Using data from NETINFO cells
Author: Nick Mathewson
Created: 2-Jul-2008
Status: Superseded
Target: 0.2.1.x

[Partially done: we do the anti-MITM part.  Not entirely done: we don't do
the time part.]

Overview

   Current Tor versions send signed IP and timestamp information in
   NETINFO cells, but don't use them to their fullest.  This proposal
   describes how they should start using this info in 0.2.1.x.

Motivation

   Our directory system relies on clients and routers having
   reasonably accurate clocks to detect replayed directory info, and
   to set accurate timestamps on directory info they publish
   themselves.  NETINFO cells contain timestamps.

   Also, the directory system relies on routers having a reasonable
   idea of their own IP addresses, so they can publish correct
   descriptors.  This is also in NETINFO cells.

Learning the time and IP address

   We need to think about attackers here.  Just because a router tells
   us that we have a given IP or a given clock skew doesn't mean that
   it's true.  We believe this information only if we've heard it from
   a majority of the routers we've connected to recently, including at
   least 3 routers.  Routers only believe this information if the
   majority includes at least one authority.
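   One reading of this rule is sketched below: the most common reported
   value is believed only if it comes from a strict majority of recent
   peers, that majority has at least 3 routers, and, for routers, it
   includes an authority.  Report and believed_value() are hypothetical,
   and clock skews would need rounding into buckets before they could be
   compared this way.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, List, Optional


@dataclass(frozen=True)
class Report:
    value: Hashable          # e.g. our address, or a rounded clock skew, as a peer saw it
    from_authority: bool     # whether the reporting peer is a directory authority


def believed_value(reports: List[Report], we_are_router: bool) -> Optional[Hashable]:
    """Return the value to believe, or None if the evidence is insufficient."""
    if not reports:
        return None
    value, count = Counter(r.value for r in reports).most_common(1)[0]
    if count * 2 <= len(reports) or count < 3:
        return None                      # need a strict majority of at least 3 routers
    if we_are_router and not any(r.from_authority for r in reports if r.value == value):
        return None                      # routers also need an authority in the majority
    return value
```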

Avoiding MITM attacks

   Current Tors use the IP addresses published in the other router's
   NETINFO cells to see whether the connection is "canonical".  Right
   now, we prefer to extend circuits over "canonical" connections.  In
   0.2.1.x, we should refuse to extend circuits over non-canonical
   connections without first trying to build a canonical one.


Filename: 150-exclude-exit-nodes.txt
Title: Exclude Exit Nodes from a circuit
Author: Mfr
Created: 2008-06-15
Status: Closed
Implemented-In: 0.2.1.3-alpha

Overview

   Right now, Tor users can manually exclude a node from all positions
   in their circuits created using the directive ExcludeNodes.
   This proposal makes this exclusion less restrictive, allowing users to
   exclude a node only from the exit part of a circuit.

Motivation

   This feature would help integrate, into Vidalia (tor exit branch) or
   other tools, features for excluding a country from the exit position
   without reducing circuit possibilities or privacy.  It could help
   people in a country where many sites are blocked to exclude that
   country for browsing, giving them more stable navigation.  It could
   also let users exclude a currently used exit node.

Implementation

   ExcludeExitNodes is similar to ExcludeNodes, except that the listed
   nodes are excluded only from the exit position when building
   circuits.

   Tor doesn't warn if a node in this list is not an exit node.
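   The difference between the two options can be sketched as candidate
   filtering per hop.  Relay and the helper names are hypothetical; real
   path selection also applies flags, weights, and exit policies.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Relay:
    fingerprint: str
    is_exit: bool


def exit_candidates(relays: List[Relay], exclude_nodes: Set[str],
                    exclude_exit_nodes: Set[str]) -> List[Relay]:
    # ExcludeNodes applies everywhere; ExcludeExitNodes only to the exit hop.
    return [r for r in relays
            if r.is_exit
            and r.fingerprint not in exclude_nodes
            and r.fingerprint not in exclude_exit_nodes]


def non_exit_hop_candidates(relays: List[Relay], exclude_nodes: Set[str]) -> List[Relay]:
    # Nodes listed only in ExcludeExitNodes remain usable as guard or middle hops.
    return [r for r in relays if r.fingerprint not in exclude_nodes]
```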

Security implications:

   This also opens possibilities for future user reporting of bad exits.

Risks:

   Use of this option can make users partitionable under certain attack
   assumptions.  However, ExitNodes already creates this possibility,
   so there isn't much increased risk in ExcludeExitNodes.

   We should still encourage people who exclude an exit node because
   of bad behavior to report it instead of just adding it to their
   ExcludeExitNodes list.  It would be unfortunate if we didn't find out
   about broken exits because of this option.  This issue can probably
   be addressed sufficiently with documentation.

Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Closed
In-Spec: path-spec.txt
Implemented-In: 0.2.2.2-alpha

Overview

  The performance of paths selected can be improved by adjusting the
  CircuitBuildTimeout and avoiding failing guard nodes. This proposal
  describes a method of tracking buildtime statistics at the client, and
  using those statistics to adjust the CircuitBuildTimeout.

Motivation

  Tor's performance can be improved by excluding those circuits that
  have long buildtimes (and by extension, high latency). For those Tor
  users who require better performance and have lower requirements for
  anonymity, this would be a very useful option to have.

Implementation

  Gathering Build Times

    Circuit build times are stored in the circular array
    'circuit_build_times' consisting of uint32_t elements as milliseconds.
    The total size of this array is based on the number of circuits
    it takes to converge on a good fit of the long term distribution of
    the circuit builds for a fixed link. We do not want this value to be
    too large, because it will make it difficult for clients to adapt to
    moving between different links.

    From our observations, the minimum value for a reasonable fit appears
    to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
    a good fit over the long term, we store 5000 most recent circuits in
    the array (NCIRCUITS_TO_OBSERVE).
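    A Python sketch of the circular array, using the constants named
    above.  The class and method names are hypothetical; array("L")
    stands in for uint32_t (it is unsigned and at least 32 bits wide).

```python
from array import array
from typing import List

NCIRCUITS_TO_OBSERVE = 5000
MIN_CIRCUITS_TO_OBSERVE = 500


class CircuitBuildTimes:
    """Circular array of the most recent circuit build times, in milliseconds."""

    def __init__(self, size: int = NCIRCUITS_TO_OBSERVE) -> None:
        self.times = array("L", [0] * size)  # unsigned, at least 32 bits
        self.next_idx = 0                    # slot that the next sample overwrites
        self.count = 0                       # number of valid samples stored

    def add(self, build_ms: int) -> None:
        self.times[self.next_idx] = build_ms
        self.next_idx = (self.next_idx + 1) % len(self.times)
        self.count = min(self.count + 1, len(self.times))

    def have_enough(self) -> bool:
        return self.count >= MIN_CIRCUITS_TO_OBSERVE

    def samples(self) -> List[int]:
        # Until the array wraps, valid samples occupy slots 0..count-1.
        return list(self.times[:self.count])
```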

    The Tor client will build test circuits at a rate of one per
    minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
    MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
    a CircuitBuildTimeout estimated within roughly 8 hours (500
    circuits at one per minute) after install, upgrade, or network
    change (see below).

  Long Term Storage

    The long-term storage representation is implemented by storing a
    histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
    writing out the statistics to disk. The format this takes in the
    state file is 'CircuitBuildTimeBin <bin-ms> <count>', with the total
    specified as 'TotalBuildTimes <total>'.
    Example:

    TotalBuildTimes 100
    CircuitBuildTimeBin 25 50
    CircuitBuildTimeBin 75 25
    CircuitBuildTimeBin 125 13
    ...

    Reading the histogram in will entail inserting <count> values
    into the circuit_build_times array each with the value of
    <bin-ms> milliseconds. In order to evenly distribute the values
    in the circular array, the Fisher-Yates shuffle will be performed
    after reading values from the bins.
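    A sketch of writing and reading this histogram.  Labeling each bin
    by its midpoint is an assumption inferred from the 25/75/125 values
    in the example above; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List

BUILDTIME_BIN_WIDTH = 50  # milliseconds


def bin_ms(build_ms: int) -> int:
    # Assumption: each bin is labeled by its midpoint (25, 75, 125, ...).
    return (build_ms // BUILDTIME_BIN_WIDTH) * BUILDTIME_BIN_WIDTH + BUILDTIME_BIN_WIDTH // 2


def write_state(times: List[int]) -> List[str]:
    counts = Counter(bin_ms(t) for t in times)
    lines = [f"TotalBuildTimes {len(times)}"]
    lines += [f"CircuitBuildTimeBin {b} {c}" for b, c in sorted(counts.items())]
    return lines


def read_state(lines: List[str], rng: random.Random) -> List[int]:
    times: List[int] = []
    for line in lines:
        if not line.strip():
            continue
        key, *args = line.split()
        if key == "CircuitBuildTimeBin":
            bin_value, count = int(args[0]), int(args[1])
            times.extend([bin_value] * count)   # <count> copies of <bin-ms>
    # Fisher-Yates shuffle so that values from one bin are spread out.
    for i in range(len(times) - 1, 0, -1):
        j = rng.randint(0, i)                  # 0 <= j <= i
        times[i], times[j] = times[j], times[i]
    return times
```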

  Learning the CircuitBuildTimeout

    Based on studies of build times, we found that the distribution of
    circuit buildtimes appears to be a Frechet distribution. However,
    estimators and quantile functions of the Frechet distribution are
    difficult to work with and slow to converge. So instead, since we
    are only interested in the accuracy of the tail, we approximate
    the tail of the distribution with a Pareto curve starting at
    the mode of the circuit build time sample set.

    We will calculate the parameters for a Pareto distribution
    fitting the data using the estimators at
    http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.

    The timeout itself is calculated by using the quantile function (the
    inverted CDF) to give us the value on the CDF such that
    BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
    below the timeout value.

    Thus, we expect that the Tor client will accept the fastest 80% of
    the total number of paths on the network.
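    A sketch of the fit and cutoff.  It uses the standard Pareto
    maximum-likelihood estimator alpha = n / sum(ln(x_i / Xm)), with Xm
    taken as the midpoint of the most populated bin, and the quantile
    function Xm / (1 - p)^(1/alpha).  Clamping samples below the mode to
    Xm is one simple choice of this sketch; the function names are
    hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

BUILDTIME_BIN_WIDTH = 50          # milliseconds
BUILDTIME_PERCENT_CUTOFF = 0.80


def mode_ms(times: Sequence[int]) -> float:
    """Midpoint of the most populated histogram bin."""
    top_bin, _ = Counter(t // BUILDTIME_BIN_WIDTH for t in times).most_common(1)[0]
    return top_bin * BUILDTIME_BIN_WIDTH + BUILDTIME_BIN_WIDTH / 2


def pareto_alpha(times: Sequence[int], xm: float) -> float:
    """Maximum-likelihood shape estimate: n / sum(ln(x_i / xm)).
    Samples below xm (left of the mode) are clamped to xm."""
    log_sum = sum(math.log(max(t, xm) / xm) for t in times)
    if log_sum == 0:
        raise ValueError("no samples above xm; cannot estimate alpha")
    return len(times) / log_sum


def pareto_quantile(xm: float, alpha: float, p: float) -> float:
    """Inverse of the Pareto CDF F(x) = 1 - (xm / x) ** alpha."""
    return xm / (1.0 - p) ** (1.0 / alpha)


def build_timeout_ms(times: Sequence[int]) -> float:
    xm = mode_ms(times)
    return pareto_quantile(xm, pareto_alpha(times, xm), BUILDTIME_PERCENT_CUTOFF)
```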

  Detecting Changing Network Conditions

    We attempt to detect both network connectivity loss and drastic
    changes in the timeout characteristics.

    We assume that we've had network connectivity loss if 3 circuits
    time out and we've received no cells or TLS handshakes since those
    circuits began. We then set the timeout to 60 seconds and stop
    counting timeouts.

    If 3 more circuits time out and the network still has not been
    live within this new 60 second timeout window, we then discard
    the previous timeouts during this period from our history.

    To detect changing network conditions, we keep a history of
    the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
    circuits that successfully completed at least one hop. If more than
    75% of these circuits time out, we discard all buildtimes history,
    reset the timeout to 60 seconds, and then begin recomputing the
    timeout.
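    A sketch of the RECENT_CIRCUITS rule only (the connectivity-loss
    rule is not modeled).  Waiting until the window is full before
    judging, and starting a fresh window after a reset, are assumptions
    of this sketch; the class name is hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

RECENT_CIRCUITS = 20
RESET_TIMEOUT_S = 60


class NetworkChangeDetector:
    def __init__(self) -> None:
        # True = timed out, False = completed; only circuits past their first hop.
        self.recent: Deque[bool] = deque(maxlen=RECENT_CIRCUITS)
        self.build_times_ms: List[int] = []
        self.timeout_s: Optional[float] = None   # None = use the learned value

    def record(self, completed_first_hop: bool, timed_out: bool,
               build_ms: Optional[int] = None) -> None:
        if not completed_first_hop:
            return
        self.recent.append(timed_out)
        if not timed_out and build_ms is not None:
            self.build_times_ms.append(build_ms)
        # Assumption: only judge once the window of RECENT_CIRCUITS is full.
        if len(self.recent) == RECENT_CIRCUITS and sum(self.recent) > 0.75 * RECENT_CIRCUITS:
            self.build_times_ms.clear()    # discard all build time history
            self.recent.clear()            # assumption: start a fresh window
            self.timeout_s = RESET_TIMEOUT_S
```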

  Testing

    After circuit build times, storage, and learning are implemented,
    the resulting histogram should be checked for consistency by
    verifying it persists across successive Tor invocations where
    no circuits are built. In addition, we can also use the existing
    buildtime scripts to record build times, and verify that the histogram
    the python produces matches that which is output to the state file in Tor,
    and verify that the Pareto parameters and cutoff points also match.

    We will also verify that there are no unexpected large deviations from
    node selection, such as nodes from distant geographical locations being
    completely excluded.

  Dealing with Timeouts

    Timeouts should be counted as the expectation of the region
    of the Pareto distribution beyond the cutoff. This is done by
    generating a random sample for each timeout at points on the
    curve beyond the current timeout cutoff.
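    One way to generate such a sample is inverse-transform sampling of
    the tail: conditioned on exceeding the cutoff c, a Pareto variable
    with shape alpha is itself Pareto with minimum c and the same alpha.
    The function below is a hypothetical sketch of that idea.

```python
import random


def synthetic_timeout_sample(cutoff_ms: float, alpha: float, rng: random.Random) -> float:
    """Draw one build time from the Pareto tail beyond the cutoff.

    Conditioned on X > cutoff, a Pareto(xm, alpha) variable is Pareto(cutoff,
    alpha), so inverse-transform sampling gives cutoff / U ** (1 / alpha)."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
    return cutoff_ms / u ** (1.0 / alpha)
```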

  Future Work

    At some point, it may be desirable to change the cutoff from a
    single hard cutoff that destroys the circuit to a soft cutoff and
    a hard cutoff, where the soft cutoff merely triggers the building
    of a new circuit, and the hard cutoff triggers destruction of the
    circuit.

    It may also be beneficial to learn separate timeouts for each
    guard node, as they will have slightly different distributions.
    This will take longer to generate initial values though.

Issues

  Impact on anonymity

    Since this follows a Pareto distribution, large reductions in the
    timeout can be achieved without cutting off a great number of the
    total paths. This will eliminate a great deal of the performance
    variation of Tor usage.
Filename: 152-single-hop-circuits.txt
Title: Optionally allow exit from single-hop circuits 
Author: Geoff Goodell
Created: 13-Jul-2008
Status: Closed
Implemented-In: 0.2.1.6-alpha

Overview

    Provide a special configuration option that adds a line to descriptors
    indicating that a router can be used as an exit for one-hop circuits,
    and allow clients to attach streams to one-hop circuits provided
    that the descriptor for the router in the circuit includes this
    configuration option.

Motivation

    At some point, code was added to restrict the attachment of streams
    to one-hop circuits.

    The idea seems to be that we can use the cost of forking and
    maintaining a patch as a lever to prevent people from writing
    controllers that jeopardize the operational security of routers
    and the anonymity properties of the Tor network by creating and
    using one-hop circuits rather than the standard three-hop circuits.
    It may be, for example, that some users do not actually seek true
    anonymity but simply reachability through network perspectives
    afforded by the Tor network, and since anonymity is stronger in
    numbers, forcing users to contribute to anonymity and decrease the
    risk to server operators by using full-length paths may be reasonable.

    As presently implemented, the sweeping restriction of one-hop circuits
    for all routers limits the usefulness of Tor as a general-purpose
    technology for building circuits.  In particular, we should allow
    for controllers, such as Blossom, that create and use single-hop
    circuits involving routers that are not part of the Tor network.

Design

    Introduce a configuration option for Tor servers that, when set,
    indicates that a router is willing to provide exit from one-hop
    circuits.  Routers with this policy will not require that a circuit
    has at least two hops when it is used as an exit.

    In addition, routers for which this configuration option
    has been set will have a line in their descriptors, "opt
    exit-from-single-hop-circuits".  Clients will keep track of which
    routers have this option and allow streams to be attached to
    single-hop circuits that include such routers.

Security Considerations

    This approach seems to eliminate the worry about operational router
    security, since server operators will not set the configuration
    option unless they are willing to take on such risk.

    To reduce the impact on anonymity of the network resulting
    from including such "risky" routers in regular Tor path
    selection, clients may systematically exclude routers with "opt
    exit-from-single-hop-circuits" when choosing random paths through
    the Tor network.
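    The client-side rules can be sketched as follows.  Descriptor and the
    helper names are hypothetical; the parser only strips the optional
    "opt" prefix before matching the keyword, and excluding opted-in
    routers from random paths reflects the optional policy described
    above.

```python
from dataclasses import dataclass, field
from typing import List

KEYWORD = "exit-from-single-hop-circuits"


@dataclass
class Descriptor:
    nickname: str
    lines: List[str] = field(default_factory=list)

    def allows_single_hop_exit(self) -> bool:
        for line in self.lines:
            words = line.split()
            if words and words[0] == "opt":   # "opt" marks an optional keyword
                words = words[1:]
            if words and words[0] == KEYWORD:
                return True
        return False


def may_attach_stream(circuit: List[Descriptor]) -> bool:
    """One-hop circuits may carry streams only if their single router opts in."""
    if len(circuit) == 1:
        return circuit[0].allows_single_hop_exit()
    return len(circuit) > 1


def random_path_candidates(routers: List[Descriptor]) -> List[Descriptor]:
    # Optional client policy: keep "risky" single-hop exits out of normal paths.
    return [r for r in routers if not r.allows_single_hop_exit()]
```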

Filename: 153-automatic-software-update-protocol.txt
Title: Automatic software update protocol
Author: Jacob Appelbaum 
Created: 14-July-2008
Status: Superseded

[Superseded by thandy-spec.txt]


                      Automatic Software Update Protocol Proposal

0.0 Introduction

The Tor project and its users require a robust method to update shipped
software bundles. The software bundles often includes Vidalia, Privoxy, Polipo,
Torbutton and of course Tor itself. It is not inconcievable that an update
could include all of the Tor Browser Bundle. It seems reasonable to make this 
a standalone program that can be called in shell scripts, cronjobs or by
various Tor controllers.

0.1 Minimal Tasks To Implement Automatic Updating

At a minimum, an update must be able to do the following:

    0 - Detect the current Tor version, and note the working status of Tor.
    1 - Detect the latest Tor version.
    2 - Fetch the latest version in the form of platform-specific package(s).
    3 - Verify the integrity of the downloaded package(s).
    4 - Install the verified package(s).
    5 - Test that the new package(s) work properly.

0.2 Specific Enumeration Of Minimal Tasks

To implement requirement 0, we need to detect the current Tor version of both
the updater and the currently running Tor. The update program itself should be
versioned internally. This requirement should also test connecting through Tor
itself and note if such connections are possible.

To implement requirement 1, we need to learn the consensus from the directory
authorities or fall back to a known good URL with cryptographically signed
content.

To implement requirement 2, we need to download Tor - hopefully over Tor.

To implement requirement 3, we need to verify the package signature.

To implement requirement 4, we need to use a platform-specific method of
installation. The Tor controller performing the update performs these
platform-specific methods.

To implement requirement 5, we need to be able to extend circuits and reach 
the internet through Tor.

0.x Implementation Goals

The update system will be cross-platform and rely on as little external code
as possible. Any external code the update system uses must itself be updated
by the update system. It will consist only of free software and will not rely
on any non-free components until the actual installation phase. If a package
manager is in use, it will be platform specific and thus only invoked by the
update system implementing the update protocol.

The update system itself will attempt to perform update related network 
activity over Tor. Possibly it will attempt to use a hidden service first.
It will attempt to use novel and not so novel caching 
when possible, it will always verify cryptographic signatures before any 
remotely fetched code is executed. In the event of an unusable Tor system, 
it will be able to attempt to fetch updates without Tor. This should be user 
configurable, some users will be unwilling to update without the protection of 
using Tor - others will simply be unable because of blocking of the main Tor 
website.

The update system will track current version numbers of Tor and supporting
software. The update system will also track known working versions to assist
with automatic... The update system itself will be a standalone library. It
will be strongly versioned internally to match the Tor bundle it was shipped
with. The update system will keep track of the given platform, CPU
architecture, lsb_release, package management functionality and any other
platform-specific metadata.

We have examined two popular automatic update systems. Although neither fits
our needs, both are useful as examples of what others are doing in the same
area.

The first is Sparkle [0], but it is sadly only available for Cocoa
environments and is written in Objective-C. This doesn't meet our requirements
because it is tied directly to Apple's proprietary frameworks.

The second is the Mozilla Automatic Update System[1]. It is possibly useful 
as an idea of how other free software projects automatically update. It is 
however not useful in its currently documented form.


    [0] http://sparkle.andymatuschak.org/documentation/
    [1] http://wiki.mozilla.org/AUS:Manual

0.x Previous methods of updating Tor and related software

Previously, Tor users updated their Tor-related software by hand. There has
been no fully automatic method for any user to update. In addition, there
has been no specific way to find out the most current stable version of Tor
or related software as voted on by the directory authorities in the consensus.

0.x Changes to the directory specification

We will want to supplement client-versions and server-versions in the
consensus voting with another version identifier known as
'auto-update-versions'. This will keep track of the current consensus on
which specific versions are best per platform and per architecture. Note
that while the Mac OS X universal binary may be the best choice for x86
processors running Tiger, it may not be the best for PPC users on Panther.
The same applies to all package updates: we want to prevent updates that
cause Tor to break, even if the updating program can recover gracefully.

x.x Assumptions About Operating System Package Management

It is assumed that users will use their package manager unless they are on 
Microsoft Windows (any version) or Mac OS X (any version). Microsoft Windows 
users will have integration with the normal "add/remove program" functionality 
that said users would expect.

x.x Package Update System Failure Modes

The package update system will try to ensure that, at the very least, a user
always has a working Tor. It will keep state to remember versions of Tor that
were able to bootstrap properly and reach the rest of the Tor network. It will
also keep note of which versions broke. It will select the best Tor that works
for the user. It will also allow for anonymized bug reporting on the packages
available and tested by the auto-update system.
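The "select the best Tor that works" rule above could look roughly like the
following minimal sketch, assuming the updater's saved state is a list of
installed versions, each reduced to a comparable integer and flagged as having
bootstrapped or broken. The tor_version_state_t type and select_best_working()
are hypothetical names, not part of any Tor codebase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kept in the updater's state file for each
 * version of Tor it has installed and tested. */
typedef struct {
  long version;      /* version reduced to a comparable integer */
  int bootstrapped;  /* 1 if this version reached the Tor network */
  int broke;         /* 1 if this version was ever observed to fail */
} tor_version_state_t;

/* Return the index of the newest version that has bootstrapped and has
 * never been observed to break, or -1 if no such version exists. */
int
select_best_working(const tor_version_state_t *states, size_t n)
{
  int best = -1;
  assert(states != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!states[i].bootstrapped || states[i].broke)
      continue;
    if (best < 0 || states[i].version > states[best].version)
      best = (int)i;
  }
  return best;
}
```

A version that broke once is never preferred again in this sketch; a real
updater might instead retry it after some time.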

x.x Package Signature Verification

The update system will be aware of replay attacks against the update signature
system itself. It will not accept package update signatures that are radically
out of date. It will be a multi-key system to prevent any single party from
forging an update. The keys will be updated regularly. This is similar to the
usage of authority keys (see proposal 103).
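One way to combine the replay and multi-key requirements is sketched below. It
assumes each signature has already been checked cryptographically and reduced
to a key id, a validity flag and a signing time. The threshold of 2 distinct
keys and the 60-day age limit are assumptions (the proposal fixes neither), and
update_sig_t and update_signatures_acceptable() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical policy values; the proposal does not fix them. */
#define MAX_SIGNATURE_AGE (60L * 24 * 60 * 60) /* 60 days, in seconds */
#define REQUIRED_SIGNERS 2
#define MAX_KEY_ID 64

/* Hypothetical result of checking one signature on an update. */
typedef struct {
  int key_id;        /* which trusted key produced the signature */
  int valid;         /* 1 if the signature verified against that key */
  time_t signed_at;  /* signing time claimed inside the signature */
} update_sig_t;

/* Accept an update only if at least REQUIRED_SIGNERS distinct trusted
 * keys produced valid signatures that are not radically out of date. */
int
update_signatures_acceptable(const update_sig_t *sigs, size_t n, time_t now)
{
  int seen[MAX_KEY_ID] = {0};
  int distinct = 0;
  for (size_t i = 0; i < n; i++) {
    const update_sig_t *s = &sigs[i];
    assert(s->key_id >= 0 && s->key_id < MAX_KEY_ID);
    if (!s->valid)
      continue;
    if (s->signed_at > now || now - s->signed_at > MAX_SIGNATURE_AGE)
      continue;  /* from the future, or stale: possible replay */
    if (!seen[s->key_id]) {
      seen[s->key_id] = 1;
      distinct++;
    }
  }
  return distinct >= REQUIRED_SIGNERS;
}
```

Counting distinct keys, rather than signatures, is what stops a single party
from satisfying the threshold alone.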

x.x Package Caching

The update system will iterate over different update methods. Whichever method
is picked will have caching functionality. Each Tor server should be
able to serve cached update files. This will be an option that friendly server
administrators can turn on should they wish to support caching. In addition,
it is possible to cache the full contents of a package in an
authoritative DNS zone. Users can then query the DNS zone for their package.
If we wish to further distribute the update load, we can also offer packages
over encrypted BitTorrent. Clients who wish to share the updates but do not
wish to be a server can then help distribute Tor updates. This can be tied
together with the DNS caching [2][3] if needed.

    [2] http://www.netrogenic.com/dnstorrent/
    [3] http://www.doxpara.com/ozymandns_src_0.1.tgz

x.x Helping Our Users Spread Tor

There should be a way for a user to participate in the package caching as
described in section x.x. This option should be presented by the Tor
controller.

x.x Simple HTTP Proxy To The Tor Project Website

It has been suggested that we should provide a simple proxy that allows a user 
to visit the main Tor website to download packages. This was part of a 
previous proposal and has not been closely examined.

x.x Package Installation

Platform-specific methods for proper package installation will be left to the
controller that is calling for an update. Because each platform is different,
the installation options and user interface will be specific to the controller
in question.

x.x Other Things

Other things should be added to this proposal. What are they?
Filename: 154-automatic-updates.txt
Title: Automatic Software Update Protocol
Author: Matt Edman
Created: 30-July-2008
Status: Superseded
Target: 0.2.1.x

Superseded by thandy-spec.txt

Scope

  This proposal specifies the method by which an automatic update client can
  determine the most recent recommended Tor installation package for the
  user's platform, download the package, and then verify that the package was
  downloaded successfully. While this proposal focuses on only the Tor
  software, the protocol defined is sufficiently extensible such that other
  components of the Tor bundles, like Vidalia, Polipo, and Torbutton, can be
  managed and updated by the automatic update client as well.

  The initial target platform for the automatic update framework is Windows,
  given that it is the platform used by a majority of our users and that,
  unlike many Linux distributions, it lacks a sane package management system.
  Our second target platform will be Mac OS X, and so the protocol will be
  designed with this near-future direction in mind.

  Other client-side aspects of the automatic update process, such as user
  interaction, the interface presented, and actual package installation
  procedure, are outside the scope of this proposal.


Motivation

  Tor releases new versions frequently, often with important security,
  anonymity, and stability fixes. Thus, it is important for users to be able
  to promptly recognize when new versions are available and to easily
  download, authenticate, and install updated Tor and Tor-related software
  packages.

  Tor's control protocol [2] provides a method by which controllers can
  identify when the user's Tor software is obsolete or otherwise no longer
  recommended. Currently, however, no mechanism exists for clients to
  automatically download and install updated Tor and Tor-related software for
  the user.


Design Overview

  The core of the automatic update framework is a well-defined file called a
  "recommended-packages" file. The recommended-packages file is accessible via
  HTTP[S] at one or more well-defined URLs. An example recommended-packages
  URL may be:

    https://updates.torproject.org/recommended-packages

  The recommended-packages document is formatted according to Section 1.2
  below and specifies the most recent recommended installation package
  versions for Tor or Tor-related software, as well as URLs at which the
  packages and their signatures can be downloaded.

  An automatic update client process runs on the Tor user's computer and
  periodically retrieves the recommended-packages file according to the method
  described in Section 2. As described further in Section 1.2, the
  recommended-packages file is signed and can be verified by the automatic
  update client with one or more public keys included in the client software.
  Since it is signed, the recommended-packages file can be mirrored by
  multiple hosts (e.g., Tor directory authorities), whose URLs are included in
  the automatic update client's configuration.

  After retrieving and verifying the recommended-packages file, the automatic
  update client compares the versions of the recommended software packages
  listed in the file with those currently installed on the end-user's
  computer. If one or more of the installed packages is determined to be out
  of date, an updated package and its signature will be downloaded from one of
  the package URLs listed in the recommended-packages file as described in
  Section 2.2.

  The automatic update system uses a multilevel signing key scheme for package
  signatures. There are a small number of entities we call "packaging
  authorities" that each have their own signing key. A packaging authority is
  responsible for signing and publishing the recommended-packages file.
  Additionally, each individual packager responsible for producing an
  installation package for one or more platforms has their own signing key.
  Every packager's signing key must be signed by at least one of the packaging
  authority keys.


Specification

  1. recommended-packages Specification

  In this section we formally specify the format of the published
  recommended-packages file.

  1.1. Document Meta-format

  The recommended-packages document follows the lightweight extensible
  information format defined in Tor's directory protocol specification [1]. In
  the interest of self-containment, we have reproduced the relevant portions
  of that format's specification in this Section. (Credits to Nick Mathewson
  for much of the original format definition language.)

  The highest level object is a Document, which consists of one or more
  Items.  Every Item begins with a KeywordLine, followed by zero or more
  Objects. A KeywordLine begins with a Keyword, optionally followed by
  whitespace and more non-newline characters, and ends with a newline.  A
  Keyword is a sequence of one or more characters in the set [A-Za-z0-9-].
  An Object is a block of encoded data in pseudo-Open-PGP-style
  armor. (cf. RFC 2440)

  More formally:

    Document     ::= (Item | NL)+
    Item         ::= KeywordLine Object*
    KeywordLine  ::= Keyword NL | Keyword WS ArgumentChar+ NL
    Keyword      ::= KeywordChar+
    KeywordChar  ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
    ArgumentChar ::= any printing ASCII character except NL.
    WS           ::= (SP | TAB)+
    Object       ::= BeginLine Base-64-encoded-data EndLine
    BeginLine    ::= "-----BEGIN " Keyword "-----" NL
    EndLine      ::= "-----END " Keyword "-----" NL

    The BeginLine and EndLine of an Object must use the same keyword.
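  The KeywordLine rules above can be checked with a small parser. The
  following is a minimal sketch; parse_keyword_line() is a hypothetical name,
  and it is slightly stricter than the grammar in that it rejects a line
  whose arguments consist only of whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* KeywordChar ::= 'A'...'Z' | 'a'...'z' | '0'...'9' | '-'
 * Tested by range rather than isalnum() so the result does not depend
 * on the current locale. */
static int
is_keyword_char(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '-';
}

/* Parse one KeywordLine (without its trailing NL). On success, return
 * the keyword length and set *args to the argument text, or to an empty
 * string if there is none. Return 0 if the line is not a KeywordLine. */
size_t
parse_keyword_line(const char *line, const char **args)
{
  size_t klen = 0;
  assert(line != NULL && args != NULL);
  while (is_keyword_char(line[klen]))
    klen++;
  if (klen == 0)
    return 0;                     /* Keyword ::= KeywordChar+ */
  const char *p = line + klen;
  if (*p == '\0') {               /* Keyword NL */
    *args = p;
    return klen;
  }
  if (*p != ' ' && *p != '\t')
    return 0;                     /* keyword must be followed by WS */
  while (*p == ' ' || *p == '\t')
    p++;                          /* WS ::= (SP | TAB)+ */
  if (*p == '\0')
    return 0;                     /* ArgumentChar+ needs at least one */
  for (const char *q = p; *q != '\0'; q++)
    if ((unsigned char)*q < 0x20 || (unsigned char)*q > 0x7e)
      return 0;                   /* only printing ASCII is allowed */
  *args = p;
  return klen;
}
```

  A caller would then compare the first klen bytes against the keywords it
  knows and, per Section 1.2, skip any line whose keyword it does not
  recognize.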

  In our Document description below, we also tag Items with a multiplicity in
  brackets. Possible tags are:

    "At start, exactly once": These items MUST occur in every instance of the
    document type, and MUST appear exactly once, and MUST be the first item in
    their documents.

    "Exactly once": These items MUST occur exactly one time in every
    instance of the document type.

    "Once or more": These items MUST occur at least once in any instance
    of the document type, and MAY occur more than once.

    "At end, exactly once": These items MUST occur in every instance of
    the document type, and MUST appear exactly once, and MUST be the
    last item in their documents.

  1.2. recommended-packages Document Format

  When interpreting a recommended-packages Document, software MUST ignore
  any KeywordLine that starts with a keyword it doesn't recognize; future
  implementations MUST NOT require current automatic update clients to
  understand any KeywordLine not currently described.

  In lines that take multiple arguments, extra arguments SHOULD be
  accepted and ignored.

  The currently defined Items contained in a recommended-packages document
  are:

    "recommended-packages-format" SP number NL

      [Exactly once]

      This Item specifies the version of the recommended-packages format that
      is contained in the subsequent document. The version defined in this
      proposal is version "1". Subsequent iterations of this protocol MUST
      increment this value if they introduce incompatible changes to the
      document format and MAY increment this value if they only introduce
      additional Keywords.

    "published" SP YYYY-MM-DD SP HH:MM:SS NL

      [Exactly once]

      The time, in GMT, when this recommended-packages document was generated.
      Automatic update clients SHOULD ignore Documents over 60 days old.
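      As a minimal sketch of the 60-day rule, assuming bare "\n" line
      handling has already extracted the "published" arguments: the helper
      names are hypothetical, the date conversion follows Howard Hinnant's
      days_from_civil algorithm so that the non-standard timegm() is not
      needed, and day-of-month validation is deliberately loose.

```c
#include <assert.h>
#include <stdio.h>

#define SECONDS_PER_DAY 86400LL
#define MAX_DOCUMENT_AGE_DAYS 60

/* Days since 1970-01-01 for a proleptic Gregorian date. */
static long long
days_from_civil(long long y, unsigned m, unsigned d)
{
  y -= m <= 2;
  long long era = (y >= 0 ? y : y - 399) / 400;
  unsigned yoe = (unsigned)(y - era * 400);                       /* [0, 399] */
  unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* [0, 365] */
  unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* [0, 146096] */
  return era * 146097 + (long long)doe - 719468;
}

/* Parse "YYYY-MM-DD HH:MM:SS" (GMT) into seconds since the epoch.
 * Returns 0 on success, -1 if the timestamp is malformed. */
int
parse_published(const char *s, long long *out)
{
  int y, mo, d, h, mi, sec;
  char extra;
  assert(s != NULL && out != NULL);
  if (sscanf(s, "%4d-%2d-%2d %2d:%2d:%2d%c",
             &y, &mo, &d, &h, &mi, &sec, &extra) != 6)
    return -1;  /* wrong shape, or trailing characters */
  if (mo < 1 || mo > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
      mi < 0 || mi > 59 || sec < 0 || sec > 60)
    return -1;
  *out = days_from_civil(y, (unsigned)mo, (unsigned)d) * SECONDS_PER_DAY
         + h * 3600LL + mi * 60LL + sec;
  return 0;
}

/* Clients SHOULD ignore documents more than 60 days old. */
int
document_too_old(long long published, long long now)
{
  return now - published > MAX_DOCUMENT_AGE_DAYS * SECONDS_PER_DAY;
}
```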

    "tor-stable-win32-version" SP TorVersion NL

      [Exactly once]

      This keyword specifies the latest recommended release of Tor's "stable"
      branch for the Windows platform that has an installation package
      available. Note that this version does not necessarily correspond to the
      most recently tagged stable Tor version, since that version may not yet
      have an installer package available, or may have known issues on
      Windows.

      The TorVersion field is formatted according to Section 2 of Tor's
      version specification [3].

    "tor-stable-win32-package" SP Url NL

      [Once or more]

      This Item specifies the location from which the most recent
      recommended Windows installation package for Tor's stable branch can be
      downloaded.

      When this Item appears multiple times within the Document, automatic
      update clients SHOULD select randomly from the available package
      mirrors.

    "tor-dev-win32-version" SP TorVersion NL

      [Exactly once]

      This Item specifies the latest recommended release of Tor's
      "development" branch for the Windows platform that has an installation
      package available. The same caveats from the description of
      "tor-stable-win32-version" also apply to this keyword.

      The TorVersion field is formatted according to Section 2 of Tor's
      version specification [3].

    "tor-dev-win32-package" SP Url NL

      [Once or more]

      This Item specifies the location from which the most recent recommended
      Windows installation package and its signature for Tor's development
      branch can be downloaded.

      When this Item appears multiple times within the Document, automatic
      update clients SHOULD select randomly from the available package
      mirrors.

    "signature" NL SIGNATURE NL

      [At end, exactly once]

      The "SIGNATURE" Object contains a PGP signature (using a packaging
      authority signing key) of the entire document, taken from the beginning
      of the "recommended-packages-format" keyword, through the newline after
      the "signature" Keyword.
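      The signed region can be located before it is handed to a PGP
      verifier (not shown). This is a minimal sketch assuming "\n" line
      endings; find_signed_span() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locate the bytes covered by the packaging authority's signature: from
 * the start of the "recommended-packages-format" line through the newline
 * that ends the "signature" line. Returns 0 and sets *start and *len, or
 * -1 if either delimiter is missing. */
int
find_signed_span(const char *doc, const char **start, size_t *len)
{
  static const char first_kw[] = "recommended-packages-format";
  static const char sig_line[] = "\nsignature\n";
  const size_t kw_len = sizeof(first_kw) - 1;
  const char *line = doc, *begin = NULL, *sig;

  assert(doc != NULL && start != NULL && len != NULL);

  /* Walk line by line so the keyword only matches at a line start,
   * followed by the SP that precedes its number argument. */
  while (line != NULL && *line != '\0') {
    if (strncmp(line, first_kw, kw_len) == 0 && line[kw_len] == ' ') {
      begin = line;
      break;
    }
    line = strchr(line, '\n');
    if (line != NULL)
      line++;
  }
  if (begin == NULL)
    return -1;

  sig = strstr(begin, sig_line);
  if (sig == NULL)
    return -1;
  *start = begin;
  *len = (size_t)(sig + sizeof(sig_line) - 1 - begin);
  return 0;
}
```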


  2. Automatic Update Client Behavior

  The client-side component of the automatic update framework is an
  application that runs on the end-user's machine. It is responsible for
  fetching and verifying a recommended-packages document, as well as
  downloading, verifying, and subsequently installing any necessary updated
  software packages.

  2.1. Download and verify a recommended-packages document

  The first step in the automatic update process is for the client to download
  a copy of the recommended-packages file. The automatic update client
  contains a (hardcoded and/or user-configurable) list of URLs from which it
  will attempt to retrieve a recommended-packages file.

  Connections to each of the recommended-packages URLs SHOULD be attempted in
  the following order:

    1) HTTPS over Tor
    2) HTTP over Tor
    3) Direct HTTPS
    4) Direct HTTP
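  The proposal does not say whether every method should be tried against one
  URL before moving to the next URL, or each method across all URLs. The
  sketch below assumes the latter, so that no direct connection is attempted
  while any Tor-based attempt remains untried. The enum and fetch_attempt()
  are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Connection methods, in the order listed above. */
typedef enum {
  FETCH_HTTPS_OVER_TOR,
  FETCH_HTTP_OVER_TOR,
  FETCH_DIRECT_HTTPS,
  FETCH_DIRECT_HTTP,
  FETCH_N_METHODS
} fetch_method_t;

/* Map attempt number k (0-based) to the URL index and method to use.
 * Returns 0, or -1 once all n_urls * FETCH_N_METHODS attempts are used. */
int
fetch_attempt(size_t k, size_t n_urls, size_t *url_index,
              fetch_method_t *method)
{
  assert(url_index != NULL && method != NULL);
  if (n_urls == 0 || k >= n_urls * FETCH_N_METHODS)
    return -1;
  *method = (fetch_method_t)(k / n_urls);  /* more private methods first */
  *url_index = k % n_urls;                 /* then cycle through URLs */
  return 0;
}
```

  A caller would loop k = 0, 1, 2, ... until a fetch succeeds or
  fetch_attempt() returns -1, at which point the retry schedule below
  applies.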

  If the client fails to retrieve a recommended-packages document via any of
  the above connection methods from any of the configured URLs, the client
  SHOULD retry its download attempts following an exponential back-off
  algorithm. After the first failed attempt, the client SHOULD delay one hour
  before attempting again, up to a maximum of 24 hours delay between retry
  attempts.
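  The back-off schedule can be computed as follows. This is a minimal
  sketch: the proposal fixes only the one-hour starting delay and the
  24-hour cap, so doubling after each failed round is an assumption, and
  retry_delay() is a hypothetical name.

```c
#include <assert.h>

#define RETRY_INITIAL_DELAY (60L * 60)       /* one hour, in seconds */
#define RETRY_MAX_DELAY     (24L * 60 * 60)  /* 24 hours, in seconds */

/* Delay before the next round after `failures` failed rounds
 * (1 = after the first failure), doubling each time, capped at 24h. */
long
retry_delay(unsigned failures)
{
  long delay = RETRY_INITIAL_DELAY;
  assert(failures >= 1);
  for (unsigned i = 1; i < failures && delay < RETRY_MAX_DELAY; i++)
    delay *= 2;
  return delay < RETRY_MAX_DELAY ? delay : RETRY_MAX_DELAY;
}
```

  Under this assumption the delays run 1, 2, 4, 8 and 16 hours, then stay
  at 24 hours.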

  After successfully downloading a recommended-packages file, the automatic
  update client will verify the signature using one of the public keys
  distributed with the client software. If more than one recommended-packages
  file is downloaded and verified, the file with the most recent "published"
  date that is verified will be retained and the rest discarded.
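  Because the "published" timestamp is fixed-width and ordered from the most
  to the least significant field, plain string comparison orders it
  chronologically. A minimal sketch, assuming the timestamps were already
  validated; fetched_doc_t and pick_latest_verified() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one downloaded recommended-packages file. */
typedef struct {
  const char *published;  /* "YYYY-MM-DD HH:MM:SS", already validated */
  int verified;           /* 1 if its signature checked out */
} fetched_doc_t;

/* Return the index of the verified document with the latest "published"
 * time, or -1 if none verified. */
int
pick_latest_verified(const fetched_doc_t *docs, size_t n)
{
  int best = -1;
  assert(docs != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!docs[i].verified)
      continue;
    if (best < 0 || strcmp(docs[i].published, docs[best].published) > 0)
      best = (int)i;
  }
  return best;
}
```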

  2.2. Download and verify the updated packages

  The automatic update client next compares the latest recommended package
  version from the recommended-packages document with the currently installed
  Tor version. If the user currently has installed a Tor version from Tor's
  "development" branch, then the version specified in "tor-dev-*-version" Item
  is used for comparison. Similarly, if the user currently has installed a Tor
  version from Tor's "stable" branch, then the version specified in the
  "tor-stable-*-version" Item is used for comparison. Version comparisons are
  done according to Tor's version specification [3].
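  The branch-dependent comparison might be sketched as below. This sketch
  compares only the numeric MAJOR.MINOR.MICRO[.PATCHLEVEL] fields, treats a
  missing PATCHLEVEL as 0 and ignores any status tag; these are
  simplifications, and [3] remains the authoritative ordering. All function
  names are hypothetical; how the client learns which branch is installed is
  left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Numeric part of a TorVersion: MAJOR.MINOR.MICRO[.PATCHLEVEL]. */
typedef struct { int v[4]; } tor_version_t;

int
parse_tor_version(const char *s, tor_version_t *out)
{
  assert(s != NULL && out != NULL);
  out->v[3] = 0;  /* missing PATCHLEVEL treated as 0 */
  int n = sscanf(s, "%d.%d.%d.%d",
                 &out->v[0], &out->v[1], &out->v[2], &out->v[3]);
  return n >= 3 ? 0 : -1;  /* any "-tag" suffix is simply not read */
}

/* Returns <0, 0 or >0 as a is older than, equal to, or newer than b. */
int
tor_version_cmp(const tor_version_t *a, const tor_version_t *b)
{
  for (int i = 0; i < 4; i++)
    if (a->v[i] != b->v[i])
      return a->v[i] < b->v[i] ? -1 : 1;
  return 0;
}

/* Compare the installed version against the recommended version for its
 * branch (the "tor-dev-*-version" or "tor-stable-*-version" value). */
int
update_available(const char *installed, int installed_is_dev,
                 const char *stable_rec, const char *dev_rec)
{
  tor_version_t cur, rec;
  const char *target = installed_is_dev ? dev_rec : stable_rec;
  if (parse_tor_version(installed, &cur) < 0 ||
      parse_tor_version(target, &rec) < 0)
    return 0;  /* unparseable: do not update */
  return tor_version_cmp(&rec, &cur) > 0;
}
```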

  If the automatic update client determines an installation package newer than
  the user's currently installed version is available, it will attempt to
  download a package appropriate for the user's platform and Tor branch from a
  URL specified by a "tor-[branch]-[platform]-package" Item. If more than one
  mirror for the selected package is available, a mirror will be chosen at
  random from all those available.

  The automatic update client must also download a ".asc" signature file for
  the retrieved package. The URL for the package signature is the same as that
  for the package itself, except with the extension ".asc" appended to the
  package URL.

  Connections to download the updated package and its signature SHOULD be
  attempted in the same order described in Section 2.1.

  After completing the steps described in Sections 2.1 and 2.2, the automatic
  update client will have downloaded and verified a copy of the latest Tor
  installation package. It can then take whatever subsequent platform-specific
  steps are necessary to install the downloaded software updates.

  2.3. Periodic checking for updates

  The automatic update client SHOULD maintain a local state file in which it
  records (at a minimum) the timestamp at which it last retrieved a
  recommended-packages file and the timestamp at which the client last
  successfully downloaded and installed a software update.

  Automatic update clients SHOULD check for an updated recommended-packages
  document at most once per day but at least once every 30 days.
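  Using the "last retrieved" timestamp from the state file, the check
  frequency rule reduces to clamping whatever interval the client is
  configured with into the range [1 day, 30 days]. A minimal sketch with
  hypothetical names; treating a clock that went backwards as "check now" is
  an assumption.

```c
#include <assert.h>
#include <time.h>

#define MIN_CHECK_INTERVAL (24L * 60 * 60)       /* at most once per day */
#define MAX_CHECK_INTERVAL (30L * 24 * 60 * 60)  /* at least every 30 days */

/* Clamp a configured check interval (seconds) into the permitted range. */
long
clamp_check_interval(long requested)
{
  assert(requested > 0);
  if (requested < MIN_CHECK_INTERVAL)
    return MIN_CHECK_INTERVAL;
  if (requested > MAX_CHECK_INTERVAL)
    return MAX_CHECK_INTERVAL;
  return requested;
}

/* last_fetch == 0 means no recommended-packages file was ever retrieved. */
int
update_check_due(time_t last_fetch, time_t now, long requested_interval)
{
  if (last_fetch == 0 || last_fetch > now)
    return 1;  /* never fetched, or the clock went backwards */
  return (long)(now - last_fetch) >= clamp_check_interval(requested_interval);
}
```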


  3. Future Extensions

  There are several possible areas for future extensions of this framework.
  The extensions below are merely suggestions and should be the subject of
  their own proposal before being implemented.

  3.1. Additional Software Updates

  There are several software packages often included in Tor bundles besides
  Tor, such as Vidalia, Privoxy or Polipo, and Torbutton. The versions and
  download locations of updated installation packages for these bundle
  components can be easily added to the recommended-packages document
  specification above.

  3.2. Including ChangeLog Information

  It may be useful for automatic update clients to be able to display for
  users a summary of the changes made in the latest Tor or Tor-related
  software release, before the user chooses to install the update. In the
  future, we can add keywords to the specification in Section 1.2 that specify
  the location of a ChangeLog file for the latest recommended package
  versions. It may also be desirable to allow localized ChangeLog information,
  so that the automatic update client can fetch release notes in the
  end-user's preferred language.

  3.3. Weighted Package Mirror Selection

  We defined in Section 1.2 a method by which automatic update clients can
  select from multiple available package mirrors. We may want to add a Weight
  argument to the "*-package" Items that allows the recommended-packages file
  to suggest to clients the probability with which a package mirror should be
  chosen. This will allow clients to more appropriately distribute package
  downloads across available mirrors proportional to their approximate
  bandwidth.
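  Weighted selection could work as in this minimal sketch, assuming a
  hypothetical integer Weight per "*-package" Item. The random value r is
  passed in so the selection itself is deterministic; in practice it would
  come from a cryptographically strong source.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a mirror index with probability proportional to its weight.
 * `r` must be uniform in [0, sum of weights). Mirrors with weight 0 are
 * never chosen. Returns -1 if r is out of range. */
int
pick_weighted_mirror(const unsigned *weights, size_t n, unsigned long r)
{
  unsigned long cumulative = 0;
  assert(weights != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    cumulative += weights[i];
    if (r < cumulative)
      return (int)i;
  }
  return -1;
}
```

  With all weights equal to 1, this reduces to the uniform random choice
  that Section 1.2 already requires.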


Implementation

  Implementation of this proposal will consist of two separate components.

  The first component is a small "au-publish" tool that takes as input a
  configuration file specifying the information described in Section 1.2 and a
  private key. The tool is run by a "packaging authority" (someone responsible
  for publishing updated installation packages), who will be prompted to enter
  the passphrase for the private key used to sign the recommended-packages
  document. The output of the tool is a document formatted according to
  Section 1.2, with a signature appended at the end. The resulting document
  can then be published to any of the update mirrors.

  The second component is an "au-client" tool that is run on the end-user's
  machine. It periodically checks for updated installation packages according
  to Section 2 and fetches the packages if necessary. The public keys used
  to sign the recommended-packages file and any of the published packages are
  included in the "au-client" tool.


References

  [1] Tor directory protocol (version 3),
  https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/dir-spec.txt

  [2] Tor control protocol (version 2),
  https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/control-spec.txt

  [3] Tor version specification,
  https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/version-spec.txt

Filename: 155-four-hidden-service-improvements.txt
Title: Four Improvements of Hidden Service Performance
Author: Karsten Loesing, Christian Wilms
Created: 25-Sep-2008
Status: Closed
Implemented-In: 0.2.1.x

Change history:

  25-Sep-2008  Initial proposal for or-dev

Overview:

  A performance analysis of hidden services [1] has brought up a few
  possible design changes to reduce advertisement time of a hidden service
  in the network as well as connection establishment time. Some of these
  design changes have side-effects on anonymity or overall network load
  which had to be weighed up against individual performance gains. A
  discussion of seven possible design changes [2] has led to a selection
  of four changes [3] that are proposed to be implemented here.

Design:

  1. Shorter Circuit Extension Timeout

  When establishing a connection to a hidden service, a client cannibalizes
  an existing circuit and extends it by one hop to one of the service's
  introduction points. In most cases this can be accomplished within a few
  seconds. Therefore, the current timeout of 60 seconds for extending a
  circuit is far too high.

  Assuming that the timeout would be reduced to a lower value, for example
  30 seconds, a second (or third) attempt to cannibalize and extend would
  be started earlier. With the current timeout of 60 seconds, 93.42% of all
  circuits can be established, whereas with a timeout of 30 seconds this
  fraction would have been only 0.87 percentage points smaller, at 92.55%.

  For a timeout of 30 seconds, the performance gain would be approximately 2
  seconds on average compared to the current timeout of 60 seconds. At the
  same time, a smaller timeout leads to discarding an increasing number of
  circuits that might have been completed within the current timeout of 60
  seconds.

  Measurements with simulated low-bandwidth connectivity have shown that
  there is no significant effect of client connectivity on circuit
  extension times. The reason for this might be that extension messages are
  small and thereby independent of the client bandwidth. Further, the
  connection between client and entry node only constitutes a single hop of
  a circuit, so that its influence on the whole circuit is limited.

  The exact value of the new timeout does not necessarily have to be 30
  seconds, but might also depend on the results of circuit build timeout
  measurements as described in proposal 151.

  2. Parallel Connections to Introduction Points

  An additional approach to accelerate extension of introduction circuits
  is to extend a second circuit in parallel to a different introduction
  point. Such parallel extension attempts should be started after a short
  delay of, e.g., 15 seconds in order to prevent unnecessary circuit
  extensions and thereby save network resources. Whichever circuit
  extension succeeds first is used for introduction, while the other
  attempt is aborted.

  An evaluation has been performed for the more resource-intensive approach
  of starting two parallel circuits immediately instead of waiting for a
  short delay. The result was a reduction of connection establishment times
  from 27.4 seconds in the original protocol to 22.5 seconds.

  While the effect of the proposed approach of delayed parallelization on
  mean connection establishment times is expected to be smaller,
  variability of connection attempt times can be reduced significantly.
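  In an idealized model that ignores timeouts and failed extensions, the
  delayed parallel attempt yields the following time until an introduction
  circuit is ready. This is a sketch only; intro_ready_time() is a
  hypothetical name and the 15-second delay is the example value from the
  text.

```c
#include <assert.h>

/* t_first and t_second are how long each extension would take on its
 * own; the second one is launched `delay` seconds after the first.
 * Whichever finishes first wins and the other is aborted. */
double
intro_ready_time(double t_first, double t_second, double delay)
{
  assert(t_first >= 0 && t_second >= 0 && delay >= 0);
  if (t_first <= delay)
    return t_first;  /* done before the parallel attempt is launched */
  double second_done = delay + t_second;
  return t_first < second_done ? t_first : second_done;
}
```

  Fast extensions (under the delay) cost no extra circuit at all, which is
  why the delayed variant saves network resources compared to launching
  both circuits immediately.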

  3. Increase Count of Internal Circuits

  Hidden services need to create or cannibalize and extend a circuit to a
  rendezvous point for every client request. Very popular hidden services
  require more than two internal circuits in the pool to answer multiple
  client requests at the same time. This scenario has not yet been analyzed,
  but it will probably exhibit worse performance than measured in the
  previous analysis. The number of preemptively built internal circuits
  should be a function of past connection requests, so that it adapts to
  changing needs. Furthermore, an increased number of internal circuits on
  the client side would allow clients to establish connections to more than
  one hidden service at a time.

  Under the assumption that a popular hidden service cannot make use of
  cannibalization for connecting to rendezvous points, the circuit creation
  time needs to be added to the current results. On average, the
  connection establishment time to a popular hidden service would increase
  by 4.7 seconds.

  4. Build More Introduction Circuits

  When establishing introduction points, a hidden service should launch 5
  instead of 3 introduction circuits at the same time and use only the
  first 3 that could be established. The remaining two circuits could still
  be used for other purposes afterwards.

  The effect has been simulated using previously measured data, too. To
  this end, circuit establishment times were derived from log files and
  written to an array. Afterwards, a simulation with 10,000 runs was
  performed picking 5 (4, 6) random values and using the 3 lowest values in
  contrast to picking only 3 values at random. The result is that the mean
  time of the 3-out-of-3 approach is 8.1 seconds, while the mean time of
  the 3-out-of-5 approach is 4.4 seconds.
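  The simulation can be sketched as follows. Two points are assumptions
  rather than statements of the original method: the "time" of a run is
  taken to be when the third circuit completes (the third-smallest sampled
  value), and values are drawn with replacement. rand() with a modulo is
  used for brevity only. The measured data is not reproduced here, so any
  array passed in is illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LAUNCH 16

/* Time until `need` circuits are established when `launch` circuits are
 * started at once: the need-th smallest of the sampled build times. */
double
time_until_established(const double *samples, int launch, int need)
{
  double sorted[MAX_LAUNCH];
  assert(need >= 1 && need <= launch && launch <= MAX_LAUNCH);
  memcpy(sorted, samples, (size_t)launch * sizeof(double));
  for (int i = 1; i < launch; i++) {  /* insertion sort: launch is tiny */
    double x = sorted[i];
    int j = i - 1;
    while (j >= 0 && sorted[j] > x) {
      sorted[j + 1] = sorted[j];
      j--;
    }
    sorted[j + 1] = x;
  }
  return sorted[need - 1];
}

/* Monte Carlo mean of the above, drawing build times at random (with
 * replacement) from previously measured values. */
double
simulate_mean(const double *measured, int n_measured, int launch, int need,
              int runs, unsigned seed)
{
  double total = 0.0, draw[MAX_LAUNCH];
  assert(n_measured > 0 && runs > 0 && launch <= MAX_LAUNCH);
  srand(seed);
  for (int r = 0; r < runs; r++) {
    for (int i = 0; i < launch; i++)
      draw[i] = measured[rand() % n_measured];
    total += time_until_established(draw, launch, need);
  }
  return total / runs;
}
```

  Comparing simulate_mean(data, n, 5, 3, 10000, seed) against
  simulate_mean(data, n, 3, 3, 10000, seed) corresponds to the 3-out-of-5
  versus 3-out-of-3 comparison above.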

  The effect on network load is minimal, because the hidden service can
  reuse the slower internal circuits for other purposes, e.g., rendezvous
  circuits. The only change is that a hidden service starts establishing
  more circuits at once instead of one after another.

References:

  [1] http://freehaven.net/~karsten/hidserv/perfanalysis-2008-06-15.pdf

  [2] http://freehaven.net/~karsten/hidserv/discussion-2008-07-15.pdf

  [3] http://freehaven.net/~karsten/hidserv/design-2008-08-15.pdf

Filename: 156-tracking-blocked-ports.txt
Title: Tracking blocked ports on the client side
Author: Robert Hogan
Created: 14-Oct-2008
Status: Superseded

[Superseded by 156, which recognizes the security issues here.]


Motivation:
Tor clients that are behind extremely restrictive firewalls can end up
waiting a while for their first successful OR connection to a node on the
network.  Worse, the more restrictive their firewall, the more susceptible
they are to an attacker guessing their entry nodes. Tor routers that
are behind extremely restrictive firewalls can only offer a limited,
'partitioned' service to other routers and clients on the network. Exit
nodes behind extremely restrictive firewalls may advertise ports that they
are actually not able to connect to, wasting network resources in circuit
constructions that are doomed to fail at the last hop on first use.

Proposal:

When a client attempts to connect to an entry guard, it should avoid
further attempts on ports that fail once until it has connected to at
least one entry guard successfully. (Maybe it should wait for more than
one failure to reduce the skew on the first node selection.) Thereafter
it should select entry guards regardless of port, and warn the user
whenever the number of failed connections to a given port, counted either
without any success or since the last success, reaches a multiple of 5.

Tor should warn the operators of exit, middleman and entry nodes if it
observes that connections to a given port have failed a multiple of 5
times without any success, or since the last success. If attempts on a
port fail 20 or more times without any success, or since the last success,
Tor should add the port to a 'blocked-ports' entry in its descriptor's
extra-info. Some thought needs to be given to what the authorities might
do with this information.

Related TODO item:
    "- Automatically determine what ports are reachable and start using
      those, if circuits aren't working and it's a pattern we
      recognize ("port 443 worked once and port 9001 keeps not
      working")."


I've had a go at implementing all of this in the attached patch.

Addendum:
Just a note on the patch, storing the digest of each router that uses the port
is a bit of a memory hog, and its only real purpose is to provide a count of
routers using that port when warning the user. That could be achieved when
warning the user by iterating through the routerlist instead.

Index: src/or/connection_or.c
===================================================================
--- src/or/connection_or.c	(revision 17104)
+++ src/or/connection_or.c	(working copy)
@@ -502,6 +502,9 @@
 connection_or_connect_failed(or_connection_t *conn,
                              int reason, const char *msg)
 {
+  if ((reason == END_OR_CONN_REASON_NO_ROUTE) ||
+      (reason == END_OR_CONN_REASON_REFUSED))
+    or_port_hist_failure(conn->identity_digest,TO_CONN(conn)->port);
   control_event_or_conn_status(conn, OR_CONN_EVENT_FAILED, reason);
   if (!authdir_mode_tests_reachability(get_options()))
     control_event_bootstrap_problem(msg, reason);
@@ -580,6 +583,7 @@
     /* already marked for close */
     return NULL;
   }
+
   return conn;
 }
 
@@ -909,6 +913,7 @@
   control_event_or_conn_status(conn, OR_CONN_EVENT_CONNECTED, 0);
 
   if (started_here) {
+    or_port_hist_success(TO_CONN(conn)->port);
     rep_hist_note_connect_succeeded(conn->identity_digest, now);
     if (entry_guard_register_connect_status(conn->identity_digest,
                                             1, now) < 0) {
Index: src/or/rephist.c
===================================================================
--- src/or/rephist.c	(revision 17104)
+++ src/or/rephist.c	(working copy)
@@ -18,6 +18,7 @@
 static void bw_arrays_init(void);
 static void predicted_ports_init(void);
 static void hs_usage_init(void);
+static void or_port_hist_init(void);
 
 /** Total number of bytes currently allocated in fields used by rephist.c. */
 uint64_t rephist_total_alloc=0;
@@ -89,6 +90,25 @@
   digestmap_t *link_history_map;
 } or_history_t;
 
+/** or_port_hist_t contains our router/client's knowledge of
+    all OR ports offered on the network, and how many servers with each port we
+    have succeeded or failed to connect to. */
+typedef struct {
+  /** The port this entry is tracking. */
+  uint16_t or_port;
+  /** Have we ever connected to this port on another OR? */
+  unsigned int success:1;
+  /** The ORs using this port. */
+  digestmap_t *ids;
+  /** The ORs using this port we have failed to connect to. */
+  digestmap_t *failure_ids;
+  /** Are we excluding ORs with this port during entry selection?*/
+  unsigned int excluded;
+} or_port_hist_t;
+
+static unsigned int still_searching = 0;
+static smartlist_t *or_port_hists;
+
 /** When did we last multiply all routers' weighted_run_length and
  * total_run_weights by STABILITY_ALPHA? */
 static time_t stability_last_downrated = 0;
@@ -164,6 +184,16 @@
   tor_free(hist);
 }
 
+/** Helper: free storage held by a single OR port history entry. */
+static void
+or_port_hist_free(or_port_hist_t *p)
+{
+  tor_assert(p);
+  digestmap_free(p->ids,NULL);
+  digestmap_free(p->failure_ids,NULL);
+  tor_free(p);
+}
+
 /** Update an or_history_t object <b>hist</b> so that its uptime/downtime
  * count is up-to-date as of <b>when</b>.
  */
@@ -1639,7 +1669,7 @@
     tmp_time = smartlist_get(predicted_ports_times, i);
     if (*tmp_time + PREDICTED_CIRCS_RELEVANCE_TIME < now) {
       tmp_port = smartlist_get(predicted_ports_list, i);
-      log_debug(LD_CIRC, "Expiring predicted port %d", *tmp_port);
+      log_debug(LD_HIST, "Expiring predicted port %d", *tmp_port);
       smartlist_del(predicted_ports_list, i);
       smartlist_del(predicted_ports_times, i);
       rephist_total_alloc -= sizeof(uint16_t)+sizeof(time_t);
@@ -1821,6 +1851,12 @@
   tor_free(last_stability_doc);
   built_last_stability_doc_at = 0;
   predicted_ports_free();
+  if (or_port_hists) {
+    SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, p,
+                      or_port_hist_free(p));
+    smartlist_free(or_port_hists);
+    or_port_hists = NULL;
+  }
 }
 
 /****************** hidden service usage statistics ******************/
@@ -2356,3 +2392,225 @@
   tor_free(fname);
 }
 
+/** Create a new entry in the port tracking cache for the or_port in
+  * <b>ri</b>. */
+void
+or_port_hist_new(const routerinfo_t *ri)
+{
+  or_port_hist_t *result;
+  const char *id=ri->cache_info.identity_digest;
+
+  if (!or_port_hists)
+    or_port_hist_init();
+
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      /* Cope with routers that change their advertised OR port or are
+         dropped from the networkstatus. We don't discard the failures of
+         dropped routers because they are still valid when counting
+         consecutive failures on a port.*/
+      if (digestmap_get(tp->ids, id) && (tp->or_port != ri->or_port)) {
+        digestmap_remove(tp->ids, id);
+      }
+      if (tp->or_port == ri->or_port) {
+        if (!(digestmap_get(tp->ids, id)))
+          digestmap_set(tp->ids, id, (void*)1);
+        return;
+      }
+    });
+
+  result = tor_malloc_zero(sizeof(or_port_hist_t));
+  result->or_port=ri->or_port;
+  result->success=0;
+  result->ids=digestmap_new();
+  digestmap_set(result->ids, id, (void*)1);
+  result->failure_ids=digestmap_new();
+  result->excluded=0;
+  smartlist_add(or_port_hists, result);
+}
+
+/** Create the port tracking cache. */
+/*XXX: need to call this when we rebuild/update our network status */
+static void
+or_port_hist_init(void)
+{
+  routerlist_t *rl = router_get_routerlist();
+
+  if (!or_port_hists)
+    or_port_hists=smartlist_create();
+
+  if (rl && rl->routers) {
+    SMARTLIST_FOREACH(rl->routers, routerinfo_t *, ri,
+    {
+      or_port_hist_new(ri);
+    });
+  }
+}
+
+#define NOT_BLOCKED 0
+#define FAILURES_OBSERVED 1
+#define POSSIBLY_BLOCKED 5
+#define PROBABLY_BLOCKED 10
+/** Return the list of blocked ports for our router's extra-info.*/
+char *
+or_port_hist_get_blocked_ports(void)
+{
+  char blocked_ports[2048];
+  char *bp;
+  
+  tor_snprintf(blocked_ports,sizeof(blocked_ports),"blocked-ports");
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      if (digestmap_size(tp->failure_ids) >= PROBABLY_BLOCKED)
+        tor_snprintf(blocked_ports+strlen(blocked_ports),
+                     sizeof(blocked_ports)-strlen(blocked_ports)," %u,",tp->or_port);
+    });
+  if (strlen(blocked_ports) == strlen("blocked-ports"))
+    return NULL;
+  bp=tor_strdup(blocked_ports);
+  bp[strlen(bp)-1]='\n';
+  bp[strlen(bp)]='\0';
+  return bp;
+}
+
+/** Warn the user, and suggest disabling our server, if we have seen too many
+  * failures on a port or range of ports.*/
+static void
+or_port_hist_report_block(unsigned int min_severity)
+{
+  or_options_t *options=get_options();
+  char failures_observed[2048],possibly_blocked[2048],probably_blocked[2048];
+  char port[1024];
+
+  memset(failures_observed,0,sizeof(failures_observed));
+  memset(possibly_blocked,0,sizeof(possibly_blocked));
+  memset(probably_blocked,0,sizeof(probably_blocked));
+
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      unsigned int failures = digestmap_size(tp->failure_ids);
+      if (failures >= min_severity) {
+        tor_snprintf(port, sizeof(port), " %u (%u failures %s out of %u on the"
+                     " network)",tp->or_port,failures,
+                     (!tp->success)?"and no successes": "since last success",
+                     digestmap_size(tp->ids));
+        if (failures >= PROBABLY_BLOCKED) {
+          strlcat(probably_blocked, port, sizeof(probably_blocked));
+        } else if (failures >= POSSIBLY_BLOCKED)
+          strlcat(possibly_blocked, port, sizeof(possibly_blocked));
+        else if (failures >= FAILURES_OBSERVED)
+          strlcat(failures_observed, port, sizeof(failures_observed));
+      }
+    });
+
+  log_warn(LD_HIST,"%s%s%s%s%s%s%s%s",
+           server_mode(options) &&
+           ((min_severity==FAILURES_OBSERVED) || strlen(probably_blocked))?
+           "You should consider disabling your Tor server. ":"",
+           (min_severity==FAILURES_OBSERVED)?
+           "Tor appears to be blocked from connecting to a range of ports "
+           "with the result that it cannot connect to one tenth of the Tor "
+           "network. ":"",
+           strlen(failures_observed)?
+           "Tor has observed failures on the following ports: ":"",
+           failures_observed,
+           strlen(possibly_blocked)?
+           "Tor is possibly blocked on the following ports: ":"",
+           possibly_blocked,
+           strlen(probably_blocked)?
+           "Tor is almost certainly blocked on the following ports: ":"",
+           probably_blocked);
+
+}
+
+/** Record a successful connection to an OR listening on
+  * <b>or_port</b>. */
+void
+or_port_hist_success(uint16_t or_port)
+{
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      if (tp->or_port != or_port)
+        continue;
+      /*Reset our failure stats so we can notice if this port ever gets
+        blocked again.*/
+      tp->success=1;
+      if (digestmap_size(tp->failure_ids)) {
+        digestmap_free(tp->failure_ids,NULL);
+        tp->failure_ids=digestmap_new();
+      }
+      if (still_searching) {
+        still_searching=0;
+        SMARTLIST_FOREACH(or_port_hists,or_port_hist_t *,t,t->excluded=0;);
+      }
+      return;
+    });
+}
+/** Record the failure of our connection to <b>digest</b>'s
+  * OR port. Warn, exclude the port from future entry guard selection, or
+  * add port to blocked-ports in our server's extra-info as appropriate. */
+void
+or_port_hist_failure(const char *digest, uint16_t or_port)
+{
+  int total_failures=0, ports_excluded=0, report_block=0;
+  int total_routers=smartlist_len(router_get_routerlist()->routers);
+
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      ports_excluded += tp->excluded;
+      total_failures+=digestmap_size(tp->failure_ids);
+      if (tp->or_port != or_port)
+        continue;
+      /* We're only interested in unique failures */
+      if (digestmap_get(tp->failure_ids, digest))
+        return;
+
+      total_failures++;
+      digestmap_set(tp->failure_ids, digest, (void*)1);
+      if (still_searching && !tp->success) {
+        tp->excluded=1;
+        ports_excluded++;
+      }
+      if ((digestmap_size(tp->ids) >= POSSIBLY_BLOCKED) &&
+         !(digestmap_size(tp->failure_ids) % POSSIBLY_BLOCKED))
+        report_block=POSSIBLY_BLOCKED;
+    });
+
+  if (total_failures >= (int)(total_routers/10))
+    or_port_hist_report_block(FAILURES_OBSERVED);
+  else if (report_block)
+    or_port_hist_report_block(report_block);
+
+  if (ports_excluded >= smartlist_len(or_port_hists)) {
+    log_warn(LD_HIST,"During entry node selection Tor tried every port "
+             "offered on the network on at least one server "
+             "and didn't manage a single "
+             "successful connection. This suggests you are behind an "
+             "extremely restrictive firewall. Tor will keep trying to find "
+             "a reachable entry node.");
+    SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp, tp->excluded=0;);
+  }
+}
+
+/** Add any ports marked as excluded in or_port_hist_t to <b>rt</b> */
+void
+or_port_hist_exclude(routerset_t *rt)
+{
+  SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+    {
+      char portpolicy[9];
+      if (tp->excluded) {
+        tor_snprintf(portpolicy,sizeof(portpolicy),"*:%u", tp->or_port);
+        log_warn(LD_HIST,"Port %u may be blocked, excluding it temporarily "
+                          "from entry guard selection.", tp->or_port);
+        routerset_parse(rt, portpolicy, "Ports");
+      }
+    });
+}
+
+/** Allow the exclusion of ports during our search for an entry node. */
+void
+or_port_hist_search_again(void)
+{
+    still_searching=1;
+}
Index: src/or/or.h
===================================================================
--- src/or/or.h	(revision 17104)
+++ src/or/or.h	(working copy)
@@ -3864,6 +3864,13 @@
 int any_predicted_circuits(time_t now);
 int rep_hist_circbuilding_dormant(time_t now);
 
+void or_port_hist_failure(const char *digest, uint16_t or_port);
+void or_port_hist_success(uint16_t or_port);
+void or_port_hist_new(const routerinfo_t *ri);
+void or_port_hist_exclude(routerset_t *rt);
+void or_port_hist_search_again(void);
+char *or_port_hist_get_blocked_ports(void);
+
 /** Possible public/private key operations in Tor: used to keep track of where
  * we're spending our time. */
 typedef enum {
Index: src/or/routerparse.c
===================================================================
--- src/or/routerparse.c	(revision 17104)
+++ src/or/routerparse.c	(working copy)
@@ -1401,6 +1401,8 @@
     goto err;
   }
 
+  or_port_hist_new(router);
+
   if (!router->platform) {
     router->platform = tor_strdup("<unknown>");
   }
Index: src/or/router.c
===================================================================
--- src/or/router.c	(revision 17104)
+++ src/or/router.c	(working copy)
@@ -1818,6 +1818,7 @@
   char published[ISO_TIME_LEN+1];
   char digest[DIGEST_LEN];
   char *bandwidth_usage;
+  char *blocked_ports;
   int result;
   size_t len;
 
@@ -1825,7 +1826,6 @@
                 extrainfo->cache_info.identity_digest, DIGEST_LEN);
   format_iso_time(published, extrainfo->cache_info.published_on);
   bandwidth_usage = rep_hist_get_bandwidth_lines(1);
-
   result = tor_snprintf(s, maxlen,
                         "extra-info %s %s\n"
                         "published %s\n%s",
@@ -1835,6 +1835,16 @@
   if (result<0)
     return -1;
 
+  blocked_ports = or_port_hist_get_blocked_ports();
+  if (blocked_ports) {
+      result = tor_snprintf(s+strlen(s), maxlen-strlen(s),
+                            "%s",
+                            blocked_ports);
+      tor_free(blocked_ports);
+      if (result<0)
+        return -1;
+  }
+
   if (should_record_bridge_info(options)) {
     static time_t last_purged_at = 0;
     char *geoip_summary;
Index: src/or/circuitbuild.c
===================================================================
--- src/or/circuitbuild.c	(revision 17104)
+++ src/or/circuitbuild.c	(working copy)
@@ -62,6 +62,7 @@
 
 static void entry_guards_changed(void);
 static time_t start_of_month(time_t when);
+static int num_live_entry_guards(void);
 
 /** Iterate over values of circ_id, starting from conn-\>next_circ_id,
  * and with the high bit specified by conn-\>circ_id_type, until we get
@@ -1627,12 +1628,14 @@
   smartlist_t *excluded;
   or_options_t *options = get_options();
   router_crn_flags_t flags = 0;
+  routerset_t *_ExcludeNodes;
 
   if (state && options->UseEntryGuards &&
       (purpose != CIRCUIT_PURPOSE_TESTING || options->BridgeRelay)) {
     return choose_random_entry(state);
   }
 
+  _ExcludeNodes = routerset_new();
   excluded = smartlist_create();
 
   if (state && (r = build_state_get_exit_router(state))) {
@@ -1670,12 +1673,18 @@
   if (options->_AllowInvalid & ALLOW_INVALID_ENTRY)
     flags |= CRN_ALLOW_INVALID;
 
+  if (options->ExcludeNodes)
+    routerset_union(_ExcludeNodes,options->ExcludeNodes);
+
+  or_port_hist_exclude(_ExcludeNodes);
+
   choice = router_choose_random_node(
            NULL,
            excluded,
-           options->ExcludeNodes,
+           _ExcludeNodes,
            flags);
   smartlist_free(excluded);
+  routerset_free(_ExcludeNodes);
   return choice;
 }
 
@@ -2727,6 +2736,7 @@
 entry_guards_update_state(or_state_t *state)
 {
   config_line_t **next, *line;
+  unsigned int have_reachable_entry=0;
   if (! entry_guards_dirty)
     return;
 
@@ -2740,6 +2750,7 @@
       char dbuf[HEX_DIGEST_LEN+1];
       if (!e->made_contact)
         continue; /* don't write this one to disk */
+      have_reachable_entry=1;
       *next = line = tor_malloc_zero(sizeof(config_line_t));
       line->key = tor_strdup("EntryGuard");
       line->value = tor_malloc(HEX_DIGEST_LEN+MAX_NICKNAME_LEN+2);
@@ -2785,6 +2796,11 @@
   if (!get_options()->AvoidDiskWrites)
     or_state_mark_dirty(get_or_state(), 0);
   entry_guards_dirty = 0;
+
+  /* XXX: Is this the place to decide that we no longer have any reachable
+    guards? */
+  if (!have_reachable_entry)
+    or_port_hist_search_again();
 }
 
 /** If <b>question</b> is the string "entry-guards", then dump

Filename: 157-specific-cert-download.txt
Title: Make certificate downloads specific
Author: Nick Mathewson
Created: 2-Dec-2008
Status: Closed
Target: 0.2.4.x

History:

  2008 Dec 2, 22:34
     Changed name of cross certification field to match the other authority
     certificate fields.

Status:

  As of 0.2.1.9-alpha:
    Cross-certification is implemented for new certificates, but not yet
    required.  Directories support the tor/keys/fp-sk urls.

Overview:

  Tor's directory specification gives two ways to download a certificate:
  by its identity fingerprint, or by the digest of its signing key.  Both
  are error-prone.  We propose a new download mechanism to make sure that
  clients get the certificates they want.

Motivation:

  When a client wants a certificate to verify a consensus, it currently has
  two choices:
     - Download by identity key fingerprint.  In this case, the client risks
       getting a certificate for the same authority, but with a different
       signing key than the one used to sign the consensus.

     - Download by signing key fingerprint.  In this case, the client risks
       getting a forged certificate that contains the right signing key
       signed with the wrong identity key.  (Since caches are willing to
       cache certs from authorities they do not themselves recognize, the
       attacker wouldn't need to compromise an authority's key to do this.)

Current solution:

  Clients fetch by identity keys, and re-fetch with backoff if they don't get
  certs with the signing key they want.

Proposed solution:

  Phase 1: Add a URL type for clients to download certs by identity _and_
  signing key fingerprint.  Unless both fields match, the client doesn't
  accept the certificate(s).  Clients begin using this method when their
  randomly chosen directory cache supports it.

  Phase 1A: Simultaneously, add a cross-certification element to
  certificates.

  Phase 2: Once many directory caches support phase 1, clients should prefer
  to fetch certificates using that protocol when available.

  Phase 2A: Once all authorities are generating cross-certified certificates
  as in phase 1A, require cross-certification.
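The Phase 1 acceptance rule amounts to checking that both fingerprints of a returned certificate match the ones the client asked for. The sketch below assumes both fingerprints are available as 40-character hex strings and compares them case-insensitively; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define HEX_FP_LEN 40  /* hex length of a SHA-1 key fingerprint */

/* Case-insensitive comparison of two hex fingerprints of exactly
 * HEX_FP_LEN characters. */
static int
hex_fp_equal(const char *a, const char *b)
{
  for (size_t i = 0; i < HEX_FP_LEN; i++) {
    if (a[i] == '\0' || b[i] == '\0')
      return 0;
    if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
      return 0;
  }
  return a[HEX_FP_LEN] == '\0' && b[HEX_FP_LEN] == '\0';
}

/* Accept a downloaded certificate only if BOTH its identity-key and its
 * signing-key fingerprints match the ones the client requested. */
static int
cert_matches_request(const char *cert_id_fp, const char *cert_sk_fp,
                     const char *want_id_fp, const char *want_sk_fp)
{
  assert(cert_id_fp && cert_sk_fp && want_id_fp && want_sk_fp);
  return hex_fp_equal(cert_id_fp, want_id_fp) &&
         hex_fp_equal(cert_sk_fp, want_sk_fp);
}
```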

Specification additions:

  The key certificate whose identity key fingerprint is <F> and whose signing
  key fingerprint is <S> should be available at:

      http://<hostname>/tor/keys/fp-sk/<F>-<S>.z

  As usual, clients may request multiple certificates using:

      http://<hostname>/tor/keys/fp-sk/<F1>-<S1>+<F2>-<S2>.z

  Clients SHOULD use this format whenever they know both key fingerprints for
  a desired certificate.
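A client that knows both fingerprints for several certificates can assemble the batched path from the pattern above. This sketch builds only the path portion (`/tor/keys/fp-sk/...`), takes hypothetical parallel arrays of hex fingerprints, and returns -1 if the buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "/tor/keys/fp-sk/<F1>-<S1>+<F2>-<S2>...z" into buf.
 * Returns the path length, or -1 if buf is too small. */
static int
build_fp_sk_path(char *buf, size_t buflen,
                 const char *const *id_fps, const char *const *sk_fps,
                 size_t n)
{
  size_t used;
  int r;

  assert(buf && id_fps && sk_fps && n > 0);
  r = snprintf(buf, buflen, "/tor/keys/fp-sk/");
  if (r < 0 || (size_t)r >= buflen)
    return -1;
  used = (size_t)r;
  for (size_t i = 0; i < n; i++) {
    /* Pairs are joined with '+'; within a pair, '-' separates F and S. */
    r = snprintf(buf + used, buflen - used, "%s%s-%s",
                 i ? "+" : "", id_fps[i], sk_fps[i]);
    if (r < 0 || (size_t)r >= buflen - used)
      return -1;
    used += (size_t)r;
  }
  /* Append the ".z" suffix plus its terminating NUL. */
  if (buflen - used < 3)
    return -1;
  memcpy(buf + used, ".z", 3);
  return (int)(used + 2);
}
```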


  Certificates SHOULD contain the following field (at most once):

  "dir-key-crosscert" NL CrossSignature NL

  where CrossSignature is a signature, made using the certificate's signing
  key, of the digest of the PKCS1-padded hash of the certificate's identity
  key.  For backward compatibility with broken versions of the parser, we
  wrap the base64-encoded signature in -----BEGIN ID SIGNATURE----- and
  -----END ID SIGNATURE----- tags.  (See bug 880.) Implementations MUST allow
  the "ID " portion to be omitted, however.
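A parser honoring this backward-compatibility rule would accept the object's opening tag with or without the "ID " portion. A minimal sketch of that check follows; the function name is hypothetical, and it inspects only the BEGIN line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Accept both "-----BEGIN ID SIGNATURE-----" and
 * "-----BEGIN SIGNATURE-----" as the start of a dir-key-crosscert
 * object; the "ID " portion is optional. */
static int
is_crosscert_begin_line(const char *line)
{
  static const char *const accepted[] = {
    "-----BEGIN ID SIGNATURE-----",
    "-----BEGIN SIGNATURE-----",
  };
  assert(line != NULL);
  for (size_t i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++) {
    size_t len = strlen(accepted[i]);
    /* Match the tag exactly, allowing only an optional trailing newline. */
    if (strncmp(line, accepted[i], len) == 0 &&
        (line[len] == '\0' || (line[len] == '\n' && line[len + 1] == '\0')))
      return 1;
  }
  return 0;
}
```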

  When encountering a certificate with a dir-key-crosscert entry,
  implementations MUST verify that the signature is a correct signature of
  the hash of the identity key using the signing key.

  (In a future version of this specification, dir-key-crosscert entries will
  be required.)

Why cross-certify too?

  Cross-certification protects clients who haven't updated yet, by reducing
  the number of caches that are willing to hold and serve bogus certificates.

References:

  This is related to part 2 of bug 854.

Filename: 158-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Closed
Implemented-In: 0.2.3.1-alpha

0. History

  15 May 2009: Substantially revised based on discussions on or-dev
  from late January.  Removed the notion of voting on how to choose
  microdescriptors; made it just a function of the consensus method.
  (This lets us avoid the possibility of "desynchronization.")
  Added suggestion to use a new consensus flavor.  Specified use of
  SHA256 for new hashes. -nickm

  15 June 2009: Cleaned up based on comments from Roger. -nickm

1. Overview

  This proposal replaces section 3.2 of proposal 141, which was
  called "Fetching descriptors on demand". Rather than modifying the
  circuit-building protocol to fetch a server descriptor inline at each
  circuit extend, we instead put all of the information that clients need
  either into the consensus itself, or into a new set of data about each
  relay called a microdescriptor.

  Descriptor elements that are small and frequently changing should go
  in the consensus itself, and descriptor elements that are small and
  relatively static should go in the microdescriptor. If we ever end up
  with descriptor elements that aren't small yet clients need to know
  them, we'll need to resume considering some design like the one in
  proposal 141.

  Note also that any descriptor element which clients need to use to
  decide which servers to fetch info about, or which servers to fetch
  info from, needs to stay in the consensus.

2. Motivation

  See
  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
  for a discussion of the options and why this is currently the best
  approach.

3. Design

  There are three pieces to the proposal. First, authorities will list in
  their votes (and thus in the consensus) the expected hash of the
  microdescriptor for each relay. Second, authorities will serve
  microdescriptors, and directory mirrors will cache and serve
  them. Third, clients will ask for them and cache them.

3.1. Consensus changes

  If the authorities choose a consensus method of a given version or
  later, a microdescriptor format is implicit in that version.
  A microdescriptor should in every case be a pure function of the
  router descriptor and the consensus method.

  In votes, we need to include the hash of each expected microdescriptor
  in the routerstatus section. I suggest a new "m" line for each stanza,
  with the base64 of the SHA256 hash of the router's microdescriptor.

  For every consensus method that an authority supports, it includes a
  separate "m" line in each router section of its vote, containing:
    "m" SP methods 1*(SP AlgorithmName "=" digest) NL
  where methods is a comma-separated list of the consensus methods
  that the authority believes will produce "digest".

  (As with base64 encoding of SHA1 hashes in consensuses, let's
  omit the trailing =s)
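As a sketch of the vote-side format, the helper below writes one "m" line for a single digest, base64-encoding it and omitting the trailing `=` padding as suggested above. The function names are hypothetical, the caller supplies the comma-separated methods list, and the algorithm name "sha256" in the usage is an assumption based on the SHA256 choice above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char B64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode len bytes into out WITHOUT trailing '=' padding.
 * out must hold at least 4*((len+2)/3)+1 bytes. Returns output length. */
static size_t
b64_encode_nopad(const unsigned char *in, size_t len, char *out)
{
  size_t o = 0;
  for (size_t i = 0; i < len; i += 3) {
    unsigned v = (unsigned)in[i] << 16;
    if (i + 1 < len) v |= (unsigned)in[i + 1] << 8;
    if (i + 2 < len) v |= in[i + 2];
    out[o++] = B64[(v >> 18) & 63];
    out[o++] = B64[(v >> 12) & 63];
    if (i + 1 < len) out[o++] = B64[(v >> 6) & 63];
    if (i + 2 < len) out[o++] = B64[v & 63];
  }
  out[o] = '\0';
  return o;
}

/* Write: "m" SP methods SP AlgorithmName "=" digest NL */
static int
format_m_line(char *buf, size_t buflen, const char *methods,
              const char *alg, const unsigned char *digest, size_t dlen)
{
  char b64[64];
  assert(dlen <= 45);  /* 4*15 = 60 chars + NUL fits in b64 */
  b64_encode_nopad(digest, dlen, b64);
  return snprintf(buf, buflen, "m %s %s=%s\n", methods, alg, b64);
}
```

For a 32-byte SHA256 digest this yields 43 base64 characters instead of 44.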

  The consensus microdescriptor-elements and "m" lines are then computed
  as described in Section 3.1.2 below.

  (This means we need a new consensus-method that knows
  how to compute the microdescriptor-elements and add "m" lines.)

  The microdescriptor consensus uses the directory-signature format from
  proposal 162, with the "sha256" algorithm.


3.1.1. Descriptor elements to include for now

  In the first version, the microdescriptor should contain the
  onion-key element, and the family element from the router descriptor,
  and the exit policy summary as currently specified in dir-spec.txt.

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

  When we are generating a consensus, we use whichever m line
  unambiguously corresponds to the descriptor digest that will be
  included in the consensus.

  (If different votes have different microdescriptor digests for a
  single <descriptor-digest, consensus-method> pair, then at least one
  of the authorities is broken.  If this happens, the consensus should
  contain whichever microdescriptor digest is most common.  If there is
  no winner, we break ties in the favor of the lexically earliest.
  Either way, we should log a warning: there is definitely a bug.)
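The tie-breaking rule above can be sketched as a simple vote count. This version is quadratic in the number of votes, which should be fine for a handful of authorities; the function name and the `disagreement` out-parameter (set so the caller can log the warning) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given the microdescriptor digests that different votes list for one
 * <descriptor-digest, consensus-method> pair, return the most common one,
 * breaking ties in favor of the lexically earliest. Sets *disagreement
 * when the votes differ, so the caller can log a warning. */
static const char *
choose_md_digest(const char *const *digests, size_t n, int *disagreement)
{
  const char *best = NULL;
  size_t best_count = 0;

  assert(digests && n > 0 && disagreement);
  *disagreement = 0;
  for (size_t i = 0; i < n; i++) {
    size_t count = 0;
    for (size_t j = 0; j < n; j++) {
      if (strcmp(digests[i], digests[j]) == 0)
        count++;
    }
    if (count < n)
      *disagreement = 1;
    if (count > best_count ||
        (count == best_count && strcmp(digests[i], best) < 0)) {
      best = digests[i];
      best_count = count;
    }
  }
  return best;
}
```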

  The "m" lines in a consensus contain only the digest, not a list of
  consensus methods.

3.1.3. A new flavor of consensus

  Rather than inserting "m" lines in the current consensus format,
  they should be included in a new consensus flavor (see proposal
  162).

  This flavor can safely omit descriptor digests.

  When we implement this voting method, we can remove the exit policy
  summary from the current "ns" flavor of consensus, since no current
  clients use them, and they take up about 5% of the compressed
  consensus.

  This new consensus flavor should be signed with the sha256 signature
  format as documented in proposal 162.

3.2. Directory mirrors fetch, cache, and serve microdescriptors

  Directory mirrors should fetch, cache, and serve each microdescriptor
  from the authorities.  (They need to continue to serve normal relay
  descriptors too, to handle old clients.)

  The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be
  available at:
    http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z
  (We use base64 for size and for consistency with the consensus
  format. We use -s instead of +s to separate these items, since
  the + character is used in base64 encoding.)
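On the mirror side, splitting the request path on '-' is safe precisely because '-' never appears in base64 text, while '+' and '/' can. A minimal sketch of that split follows; the function name is hypothetical, and it assumes unpadded SHA256 digests of at most 43 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MICRO_D_PREFIX "/tor/micro/d/"
#define MAX_DIGEST_B64 43  /* unpadded base64 of a 32-byte SHA-256 digest */

/* Split "/tor/micro/d/<D1>-<D2>-...-<Dn>.z" into its digests.
 * Returns the number of digests, or -1 on a malformed path. */
static int
parse_micro_d_path(const char *path, char out[][MAX_DIGEST_B64 + 1],
                   size_t max_out)
{
  size_t prefix_len = strlen(MICRO_D_PREFIX);
  const char *p, *end;
  size_t n = 0;

  assert(path && out);
  if (strncmp(path, MICRO_D_PREFIX, prefix_len) != 0)
    return -1;
  p = path + prefix_len;
  end = strstr(p, ".z");  /* '.' cannot occur inside base64 digests */
  if (!end || end[2] != '\0')
    return -1;
  for (;;) {
    const char *dash = memchr(p, '-', (size_t)(end - p));
    const char *stop = dash ? dash : end;
    size_t len = (size_t)(stop - p);
    /* Reject empty items (e.g. a trailing '-') and oversized digests. */
    if (len == 0 || len > MAX_DIGEST_B64 || n >= max_out)
      return -1;
    memcpy(out[n], p, len);
    out[n][len] = '\0';
    n++;
    if (!dash)
      break;
    p = dash + 1;
  }
  return (int)n;
}
```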

  All the microdescriptors from the current consensus should also be
  available at:
    http://<hostname>/tor/micro/all.z
  so a client that's bootstrapping doesn't need to send a 70KB URL just
  to name every microdescriptor it's looking for.

  Microdescriptors have no header or footer.
  The hash of the microdescriptor is simply the hash of the concatenated
  elements.

  Directory mirrors should check to make sure that the microdescriptors
  they're about to serve match the right hashes (either the hashes from
  the fetch URL or the hashes from the consensus, respectively).

  We will probably want to consider some sort of smart data structure to
  be able to quickly convert microdescriptor hashes into the appropriate
  microdescriptor. Clients will want this anyway when they load their
  microdescriptor cache and want to match it up with the consensus to
  see what's missing.
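One simple option for that data structure, sketched here, is a sorted array searched with the C library's `bsearch`; a hash table would serve equally well. The entry type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: unpadded base64 SHA-256 digest plus body. */
typedef struct {
  const char *digest_b64;
  const char *body;
} md_entry_t;

static int
md_entry_cmp(const void *a, const void *b)
{
  const md_entry_t *x = a, *y = b;
  return strcmp(x->digest_b64, y->digest_b64);
}

/* Sort the cache once, e.g. after loading it from disk. */
static void
md_cache_sort(md_entry_t *entries, size_t n)
{
  qsort(entries, n, sizeof(md_entry_t), md_entry_cmp);
}

/* Find the microdescriptor with the given digest in O(log n); NULL means
 * the client still needs to fetch it. */
static const md_entry_t *
md_cache_lookup(const md_entry_t *entries, size_t n, const char *digest_b64)
{
  md_entry_t key = { digest_b64, NULL };
  assert(digest_b64 != NULL);
  return bsearch(&key, entries, n, sizeof(md_entry_t), md_entry_cmp);
}
```

Matching a consensus against the cache is then one lookup per "m" line, and every NULL result is a digest to fetch.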

3.3. Clients fetch them and cache them

  When a client gets a new consensus, it looks to see if there are any
  microdescriptors it needs to learn. If it needs to learn more than
  some threshold of the microdescriptors (half?), it requests 'all',
  else it requests only the missing ones.  Clients MAY try to
  determine whether the upload bandwidth for listing the
  microdescriptors they want is more or less than the download
  bandwidth for the microdescriptors they do not want.
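The client's choice between requesting 'all' and requesting only the missing microdescriptors might be sketched as follows. The one-half threshold comes from the tentative "(half?)" above; the byte-based variant is the optional comparison, with the per-digest and per-microdescriptor sizes as hypothetical caller-supplied estimates.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if the client should request tor/micro/all.z rather than
 * listing the missing digests: more than half are missing. */
static int
should_fetch_all(size_t n_missing, size_t n_total)
{
  assert(n_missing <= n_total);
  return n_total > 0 && 2 * n_missing > n_total;
}

/* Optional refinement: compare the upload cost of listing the wanted
 * digests with the download cost of the unwanted microdescriptors that
 * fetching 'all' would also deliver. */
static int
should_fetch_all_by_bytes(size_t n_missing, size_t n_total,
                          size_t bytes_per_digest, size_t bytes_per_md)
{
  size_t upload = n_missing * bytes_per_digest;
  size_t wasted_download = (n_total - n_missing) * bytes_per_md;
  assert(n_missing <= n_total);
  return upload > wasted_download;
}
```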

  Clients maintain a cache of microdescriptors along with metadata like
  when it was last referenced by a consensus, and which identity key
  it corresponds to.  They keep a microdescriptor
  until it hasn't been mentioned in any consensus for a week. Future
  clients might cache them for longer or shorter times.
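The one-week retention rule reduces to a timestamp comparison on the per-entry metadata. In this sketch the `md_meta_t` layout and function name are hypothetical; only the one-week lifetime comes from the text above.

```c
#include <assert.h>
#include <time.h>

#define MD_CACHE_LIFETIME (7 * 24 * 60 * 60)  /* one week, in seconds */

/* Hypothetical metadata kept alongside each cached microdescriptor. */
typedef struct {
  time_t last_listed;   /* when a consensus last referenced it */
  char identity[41];    /* hex identity fingerprint it corresponds to */
} md_meta_t;

/* A microdescriptor may be dropped once no consensus has mentioned it
 * for more than a week. */
static int
md_is_expired(const md_meta_t *meta, time_t now)
{
  assert(meta != NULL);
  return now - meta->last_listed > MD_CACHE_LIFETIME;
}
```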

3.3.1. Information leaks from clients

  If a client asks you for a set of microdescs, then you know she didn't
  have them cached before. How much does that leak? What about when
  we're all using our entry guards as directory guards, and we've seen
  that user make a bunch of circuits already?

  Fetching "all" when you need at least half is a good first order fix,
  but might not be all there is to it.

  Another future option would be to fetch some of the microdescriptors
  anonymously (via a Tor circuit).

  Another crazy option (Roger's phrasing) is to do decoy fetches as
  well.

4. Transition and deployment

  In phase one, the directory authorities should start voting on
  microdescriptors and putting them in the consensus.

  In phase two, directory mirrors should learn how to serve them, and
  how to read the consensus to find out what they should be serving.

  In phase three, clients should start fetching and caching them
  instead of normal descriptors.

Filename: 159-exit-scanning.txt
Title: Exit Scanning
Author: Mike Perry
Created: 13-Feb-2009
Status: Informational

Overview:

This proposal describes the implementation and integration of an
automated exit node scanner for scanning the Tor network for malicious,
misconfigured, firewalled or filtered nodes.

Motivation:

Tor exit nodes can be run by anyone with an Internet connection. Often,
these users aren't fully aware of limitations of their networking
setup.  Content filters, antivirus software, advertisements injected by
their service providers, malicious upstream providers, and the resource
limitations of their computer or networking equipment have all been
observed on the current Tor network.

It is also possible that some nodes exist purely for malicious
purposes.  In the past, there have been intermittent instances of
nodes spoofing SSH keys, as well as nodes being used for purposes of
plaintext surveillance.

While it is not realistic to expect to catch extremely targeted or
completely passive malicious adversaries, the goal is to prevent
malicious adversaries from deploying dragnet attacks against large
segments of the Tor userbase.


Scanning methodology:

The first scans to be implemented are HTTP, HTML, Javascript, and
SSL scans.

The HTTP scan scrapes Google for URLs of common file types such as exe,
msi, doc, dmg, etc. It then fetches these URLs both with and without
Tor, and compares the SHA1 hashes of the resulting content.

The SSL scan downloads certificates for all IPs a domain will locally
resolve to and compares these certificates to those seen over Tor. The
scanner notes if a domain had rotated certificates locally in the
results for each scan.

The HTML scan checks HTML, Javascript, and plugin content for
modifications. Because so much of the web is dynamic, the scanner has
a number of built-in mechanisms for filtering out false positives,
which it applies whenever it notices a difference between the Tor and
non-Tor fetches.

All tests also share a URL-based false positive filter that
automatically removes results retroactively if the number of failures
exceeds a certain percentage of nodes tested with the URL.


Deployment Stages:

To avoid instances where bugs cause us to mark exit nodes as BadExit
improperly, it is proposed that we begin use of the scanner in stages.

1. Manual Review:

  In the first stage, basic scans will be run by a small number of
  people while we stabilize the scanner. The scanner has the ability
  to resume crashed scans, and to rescan nodes that fail various
  tests.

2. Human Review:

  In the second stage, results will be automatically mailed to
  an email list of interested parties for review. We will also begin
  classifying failure types into three to four different severity
  levels, based on both the reliability of the test and the nature of
  the failure.

3. Automatic BadExit Marking:

  In the final stage, the scanner will begin marking exits depending
  on the failure severity level in one of three different ways: by
  node idhex, by node IP, or by node IP mask. A potential fourth, less
  severe category of results may still be delivered via email only for
  review.

  BadExit markings will be delivered in batches upon completion
  of whole-network scans, so that the final false positive
  filter has an opportunity to filter out URLs that exhibit
  dynamic content beyond what we can filter.


Specification of Exit Marking:

Technically, BadExit could be marked via SETCONF AuthDirBadExit over
the control port, but this would allow full access to the directory
authority configuration and operation.

The approved-routers file could also be used, but currently it only
supports fingerprints, and it also contains other data unrelated to
exit scanning that would be difficult to coordinate.

Instead, we propose a new badexit-routers file that accepts the
following keywords:

  BadExitNet 1*[exitpattern from 2.3 in dir-spec.txt]
  BadExitFP 1*[hexdigest from 2.3 in dir-spec.txt]

BadExitNet lines would follow the codepaths used by AuthDirBadExit to
set authdir_badexit_policy, and BadExitFP lines would follow the
codepaths used for !badexit lines in the approved-routers file.
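A reader for the proposed badexit-routers file would mainly need to recognize the two keywords and hand each argument list to the existing codepaths. The sketch below only classifies lines. It assumes a hexdigest is 40 hex digits and that blank lines may be ignored; exitpattern parsing is left to the existing exit-policy code, and all names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum {
  BADEXIT_LINE_BLANK,    /* whitespace-only line (assumed ignorable) */
  BADEXIT_LINE_NET,      /* BadExitNet <exitpattern>... */
  BADEXIT_LINE_FP,       /* BadExitFP <hexdigest>... */
  BADEXIT_LINE_INVALID
} badexit_line_t;

/* Nonzero if the n characters at s are all hex digits. */
static int
is_hex_run(const char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!isxdigit((unsigned char)s[i]))
      return 0;
  return 1;
}

static badexit_line_t
classify_badexit_line(const char *line)
{
  assert(line != NULL);
  while (*line == ' ' || *line == '\t')
    line++;
  if (*line == '\0' || *line == '\n')
    return BADEXIT_LINE_BLANK;
  if (strncmp(line, "BadExitNet ", 11) == 0)
    return BADEXIT_LINE_NET;  /* patterns checked by the policy parser */
  if (strncmp(line, "BadExitFP ", 10) == 0) {
    const char *p = line + 10;
    int n_fps = 0;
    /* 1*hexdigest: one or more space-separated 40-digit fingerprints. */
    while (*p) {
      size_t len = strcspn(p, " \n");
      if (len > 0) {
        if (len != 40 || !is_hex_run(p, len))
          return BADEXIT_LINE_INVALID;
        n_fps++;
      }
      p += len;
      if (*p)
        p++;
    }
    return n_fps ? BADEXIT_LINE_FP : BADEXIT_LINE_INVALID;
  }
  return BADEXIT_LINE_INVALID;
}
```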

The scanner would have exclusive ability to write, append, rewrite,
and modify this file. Prior to building a new consensus vote, a
participating Tor authority would read in a fresh copy.


Security Implications:

Aside from evading the scanner's detection, there are two additional
high-level security considerations:

1. Ensure nodes cannot be marked BadExit by an adversary at will

Individual website owners may be able to target certain Tor nodes, but
once they try to make more than the URL filter percentage of exits
fail, their sites will be discarded automatically.

Failing specific nodes is possible, but scanned results are fully
reproducible, and BadExits should be rare enough that humans are never
fully removed from the loop.

State (cookies, cache, etc.) does not persist in the scanner between
exit nodes, so one exit node cannot bias the results of a later one.

2. Ensure that scanner compromise does not yield authority compromise

Having a separate file that is under the exclusive control of the
scanner allows us to heavily isolate the scanner from the Tor
authority, potentially even running them on separate machines.

Filename: 160-bandwidth-offset.txt
Title: Authorities vote for bandwidth offsets in consensus
Author: Roger Dingledine
Created: 4-May-2009
Status: Closed
Target: 0.2.1.x

1. Motivation

  As part of proposal 141, we moved the bandwidth value for each relay
  into the consensus. Now clients can know how they should load balance
  even before they've fetched the corresponding relay descriptors.

  Putting the bandwidth in the consensus also lets the directory
  authorities choose more accurate numbers to advertise, if we come up
  with a better algorithm for deciding weightings.

  Our original plan was to teach directory authorities how to measure
  bandwidth themselves; then every authority would vote for the bandwidth
  it prefers, and we'd take the median of votes as usual.

  The problem comes when we have 7 authorities, and only a few of them
  have smarter bandwidth allocation algorithms. So long as the majority
  of them are voting for the number in the relay descriptor, the minority
  that have better numbers will be ignored.

2. Options

  One fix would be to demand that every authority also run the
  new bandwidth measurement algorithms: in that case, part of the
  responsibility of being an authority operator is that you need to run
  this code too. But in practice we can't really require all current
  authority operators to do that; and if we want to expand the set of
  authority operators even further, it will become even more impractical.
  Also, bandwidth testing adds load to the network, so we don't really
  want to require that the number of concurrent bandwidth tests match
  the number of authorities we have.

  The better fix is to allow certain authorities to specify that they are
  voting on bandwidth measurements: more accurate bandwidth values that
  have actually been evaluated. In this way, authorities can vote on 
  the median measured value if sufficient measured votes exist for a router,
  and otherwise fall back to the median value taken from the published router
  descriptors.

3. Security implications

  If only some authorities choose to vote on an offset, then a majority of
  those voting authorities can arbitrarily change the bandwidth weighting
  for the relay. At the extreme, if there's only one offset-voting
  authority, then that authority can dictate which relays clients will
  find attractive.

  This problem isn't entirely new: we already have the worry wrt
  the subset of authorities that vote for BadExit.

  To make it not so bad, we should deploy at least three offset-voting
  authorities.

  Also, authorities that know how to vote for offsets should vote for
  an offset of zero for new nodes, rather than choosing not to vote on
  any offset in those cases.

4. Design

  First, we need a new consensus method to support this new calculation.

  Now v3 votes can have an additional value on the "w" line:
    "w Bandwidth=X Measured=" INT.

  Once we're using the new consensus method, the new way to compute the
  Bandwidth weight is by checking if there are at least 3 "Measured"
  votes. If so, the median of these is taken. Otherwise, the median
  of the "Bandwidth=" values is taken, as described in Proposal 141.

  Then the actual consensus looks just the same as it did before,
  so clients never have to know that this additional calculation is
  happening.
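
  The consensus-side rule can be sketched in Python as below. This is a
  hypothetical helper, not Tor's C code: each vote is reduced to a
  (Bandwidth, Measured) pair, and taking the lower median for an even
  number of values is an assumption of the sketch.

```python
from statistics import median_low

def parse_w_line(line):
    """Parse a vote's 'w Bandwidth=X [Measured=Y]' line into (X, Y or None)."""
    fields = line.split()
    if not fields or fields[0] != "w":
        raise ValueError(f"not a w line: {line!r}")
    kv = dict(item.split("=", 1) for item in fields[1:])
    measured = int(kv["Measured"]) if "Measured" in kv else None
    return int(kv["Bandwidth"]), measured

def consensus_bandwidth(votes, min_measured=3):
    """Median of Measured= values if at least `min_measured` votes carry one,
    otherwise the median of Bandwidth= values (Proposal 141 behaviour)."""
    measured = [m for _, m in votes if m is not None]
    if len(measured) >= min_measured:
        return median_low(measured)  # lower median for even counts (assumed)
    return median_low([bw for bw, _ in votes])
```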

5. Implementation

  The Measured values will be read from a file provided by the scanners
  described in proposal 161. Files with a timestamp older than 3 days
  will be ignored.

  The file will be read in from dirserv_generate_networkstatus_vote_obj()
  in a location specified by a new config option "V3MeasuredBandwidths".
  A helper function will be called to populate new 'measured' and
  'has_measured' fields of the routerstatus_t 'routerstatuses' list with 
  values read from this file.

  An additional for_vote flag will be passed to 
  routerstatus_format_entry() from format_networkstatus_vote(), which will 
  indicate that the "Measured=" string should be appended to the "w Bandwidth="
  line with the measured value in the struct.

  routerstatus_parse_entry_from_string() will be modified to parse the
  "Measured=" value on "w" lines into routerstatus_t struct fields.

  Finally, networkstatus_compute_consensus() will set rs_out.bandwidth
  to the median of the measured values if there are at least 3, otherwise
  it will use the bandwidth value median as normal.



Title: Computing Bandwidth Adjustments
Filename: 161-computing-bandwidth-adjustments.txt
Author: Mike Perry
Created: 12-May-2009
Target: 0.2.1.x
Status: Closed


1. Motivation

  There is high variance in the performance of the Tor network. Despite
  our efforts to balance load evenly across the Tor nodes, some nodes are
  significantly slower and more overloaded than others.

  Proposal 160 describes how we can augment the directory authorities to
  vote on measured bandwidths for routers. This proposal describes what
  goes into the measuring process.


2. Measurement Selection

  The general idea is to determine a load factor representing the ratio
  of the capacity of measured nodes to the rest of the network. This load
  factor could be computed from three potentially relevant statistics:
  circuit failure rates, circuit extend times, or stream capacity.

  Circuit failure rates and circuit extend times appear to vary
  non-linearly with node load. We've observed that the same
  nodes when scanned at US nighttime hours (when load is presumably
  lower) exhibit almost no circuit failure, and significantly faster
  extend times than when scanned during the day.

  Stream capacity, however, is much more uniform, even during US
  nighttime hours. Moreover, it is a more intuitive representation of
  node capacity, and also less dependent upon distance and latency
  if amortized over large stream fetches.


3. Average Stream Bandwidth Calculation

  The average stream bandwidths are obtained by dividing the network into
  slices of 50 nodes each, grouped according to advertised node bandwidth.

  Two hop circuits are built using nodes from the same slice, and a large
  file is downloaded via these circuits. The file sizes are set based
  on node percentile rank as follows:
    
     0-10: 2M
     10-20: 1M
     20-30: 512k
     30-50: 256k
     50-100: 128k

  These sizes are based on measurements performed during test scans.

  This process is repeated until each node has been chosen to participate
  in at least 7 circuits.
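
  A rough Python sketch of the slicing step and the file-size table,
  assuming binary units (2M = 2 MiB), percentile rank counted from the
  fastest node (rank 0), and half-open percentile ranges; the helper
  names are hypothetical.

```python
SLICE_SIZE = 50

# (upper percentile bound, file size in bytes). Ranges are treated as
# half-open [previous bound, bound), an assumption of this sketch.
FILE_SIZES = [(10, 2 * 1024 * 1024), (20, 1024 * 1024), (30, 512 * 1024),
              (50, 256 * 1024), (100, 128 * 1024)]

def make_slices(nodes):
    """nodes: list of (node_id, advertised_bw). Returns 50-node slices,
    fastest nodes first, so each slice holds nodes of similar capacity."""
    ranked = sorted(nodes, key=lambda n: n[1], reverse=True)
    return [ranked[i:i + SLICE_SIZE] for i in range(0, len(ranked), SLICE_SIZE)]

def file_size_for_rank(rank, total):
    """rank: 0-based position of a node in descending bandwidth order."""
    percentile = 100.0 * rank / total
    for upper, size in FILE_SIZES:
        if percentile < upper:
            return size
    return FILE_SIZES[-1][1]
```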


4. Ratio Calculation

  The ratios are calculated by dividing each measured value by the 
  network-wide average.


5. Ratio Filtering

  After the base ratios are calculated, a second pass is performed
  to remove, from each node's results, any streams that also passed
  through a node with a ratio less than X=0.5. In addition, all
  outlying streams with a capacity at least one standard deviation
  below the node's average are removed.

  The final ratio result will be the greater of the unfiltered ratio
  and the filtered ratio.


6. Pseudocode for Ratio Calculation Algorithm

  Here is the complete pseudocode for the ratio algorithm:

    Slices = {S | S is 50 nodes of similar consensus capacity}
    for S in Slices:
      while exists node N in S with circ_chosen(N) < 7:
        fetch_slice_file(build_2hop_circuit(N, (exit in S)))
      for N in S:
        BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
        Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N)
      Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)  
      for N in S:
        Normal_Streams(N) = {stream via N | bandwidth >= BW_measured(N)} 
        BW_Norm_measured(N) =  MEAN(b | b is a bandwidth of Normal_Streams(N))

    Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices)
    Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)

    for N in all Slices:
      Bw_net_ratio(N) = BW_measured(N)/Bw_net_avg(Slices)
      Bw_Norm_net_ratio(N) = BW_Norm_measured(N)/Bw_Norm_net_avg(Slices)

      ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
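
  The pseudocode can be transcribed into Python roughly as follows,
  starting from stream bandwidths that have already been collected for
  each node (the circuit-building loop is omitted). Bw_stddev(N) and
  Bw_avg(S) do not feed into ResultRatio(N), so this sketch leaves them
  out; the function name and input layout are assumptions.

```python
from statistics import mean

def result_ratios(slices):
    """slices: list of dicts mapping node_id -> list of stream bandwidths.

    Returns node_id -> ResultRatio(N), following the pseudocode above.
    """
    bw_measured, bw_norm = {}, {}
    for s in slices:
        for node, streams in s.items():
            bw_measured[node] = mean(streams)
            # Normal_Streams(N): streams at or above the node's own mean.
            normal = [b for b in streams if b >= bw_measured[node]]
            bw_norm[node] = mean(normal)
    net_avg = mean(bw_measured.values())        # Bw_net_avg(Slices)
    norm_net_avg = mean(bw_norm.values())       # Bw_Norm_net_avg(Slices)
    return {node: max(bw_measured[node] / net_avg, bw_norm[node] / norm_net_avg)
            for node in bw_measured}
```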


7. Security implications

  The ratio filtering will deal with cases of sabotage by dropping
  both very slow outliers in stream average calculations and streams
  that used very slow nodes from the calculations for other nodes.

  This scheme will not address nodes that try to game the system by
  providing better service to scanners. The scanners can be detected
  at the entry by IP address, and at the exit by the destination fetch
  IP.

  Measures can be taken to obfuscate and separate the scanners' source
  IP address from the directory authority IP address. For instance,
  scans can happen offsite and the results can be rsynced into the
  authorities. The destination server IP can also change.
 
  Neither of these methods is foolproof, but such nodes can already
  lie about their bandwidth to attract more traffic, so this solution
  does not set us back in that regard.


8. Parallelization

  Because each slice takes as long as 6 hours to complete, we will want
  to parallelize as much as possible. This will be done by concurrently
  running multiple scanners from each authority to deal with different
  segments of the network. Each scanner piece will continually loop 
  over a portion of the network, outputting files of the form:

   node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
         filt_bw=<BW_Norm_measured(N)> SP ns_bw=<CurrentConsensusBw(N)> NL

  The most recent file from each scanner will be periodically gathered 
  by another script that uses them to produce network-wide averages 
  and calculate ratios as per the algorithm in section 6. Because nodes 
  may shift in capacity, they may appear in more than one slice and/or 
  appear more than once in the file set. The most recently measured
  line will be chosen in this case.
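
  A minimal sketch of the gathering script's merge step. The per-line
  format carries no timestamp, so this sketch assumes each scanner file
  is paired with the time it was measured; the helper names are
  hypothetical.

```python
def parse_scan_line(line):
    """Parse 'node_id=<idhex> strm_bw=<n> filt_bw=<n> ns_bw=<n>' into a dict."""
    fields = dict(item.split("=", 1) for item in line.split())
    return {"node_id": fields["node_id"],
            "strm_bw": float(fields["strm_bw"]),
            "filt_bw": float(fields["filt_bw"]),
            "ns_bw": float(fields["ns_bw"])}

def merge_scanner_files(files):
    """files: iterable of (measured_at, text) pairs, one per scanner file.

    Keeps, for each node, the line from the most recently measured file.
    """
    latest = {}
    for measured_at, text in files:
        for line in text.splitlines():
            if not line.strip():
                continue
            entry = parse_scan_line(line)
            prev = latest.get(entry["node_id"])
            if prev is None or measured_at > prev[0]:
                latest[entry["node_id"]] = (measured_at, entry)
    return {node: entry for node, (_, entry) in latest.items()}
```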


9. Integration with Proposal 160

  The final results will be produced for the voting mechanism
  described in Proposal 160 by multiplying the derived ratio by
  the average published consensus bandwidth during the course of the
  scan, and taking the weighted average with the previous consensus
  bandwidth:

     Bw_new = Round((Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1))

  The Alpha parameter is a smoothing parameter intended to prevent
  rapid oscillation between loaded and unloaded conditions. It is
  currently fixed at 0.333.

  The Round() step consists of rounding to the 3 most significant figures
  in base 10, and then rounding that result to the nearest 1000, with
  a minimum value of 1000.
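
  The formula and the Round() step might look like this in Python.
  Python's round() breaks ties toward the even neighbour, and the
  proposal does not specify tie handling, so treat that detail as an
  assumption; the function names are hypothetical.

```python
import math

ALPHA = 0.333  # smoothing parameter, as stated above

def round_bw(value):
    """Round to 3 significant figures, then to the nearest 1000 (minimum 1000)."""
    if value <= 0:
        return 1000
    exponent = math.floor(math.log10(value))
    scale = 10 ** (exponent - 2)            # unit of the 3rd significant figure
    three_sig = round(value / scale) * scale
    return max(1000, int(round(three_sig / 1000) * 1000))

def new_bandwidth(bw_current, bw_scan_avg, bw_ratio, alpha=ALPHA):
    """Bw_new = Round((Bw_current*Alpha + Bw_scan_avg*Bw_ratio) / (Alpha + 1))."""
    return round_bw((bw_current * alpha + bw_scan_avg * bw_ratio) / (alpha + 1))
```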

  This will produce a new bandwidth value that will be output into a 
  file consisting of lines of the form:

     node_id=<idhex> SP bw=<Bw_new> NL
 
  The first line of the file will contain a timestamp in UNIX time()
  seconds. This will be used by the authority to decide if the 
  measured values are too old to use.
 
  This file can be either copied or rsynced into a directory readable
  by the directory authority.
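
  A sketch of writing this file and of the authority-side freshness
  check, using the 3-day limit from Proposal 160. Function names are
  hypothetical, and lines are sorted by node ID only to make the output
  deterministic.

```python
import time

MAX_AGE = 3 * 24 * 60 * 60  # Proposal 160: ignore files older than 3 days

def format_bw_file(bandwidths, now=None):
    """bandwidths: node_id -> Bw_new. The first line is a UNIX timestamp."""
    ts = int(time.time() if now is None else now)
    lines = [str(ts)]
    lines += [f"node_id={node} bw={bw}" for node, bw in sorted(bandwidths.items())]
    return "\n".join(lines) + "\n"

def read_bw_file(text, now=None):
    """Return node_id -> bw, or None if the file is too old to use."""
    first, *rest = text.splitlines()
    now = time.time() if now is None else now
    if now - int(first) > MAX_AGE:
        return None
    result = {}
    for line in rest:
        if line.strip():
            fields = dict(item.split("=", 1) for item in line.split())
            result[fields["node_id"]] = int(fields["bw"])
    return result
```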

Filename: 162-consensus-flavors.txt
Title: Publish the consensus in multiple flavors
Author: Nick Mathewson
Created: 14-May-2009
Implemented-In: 0.2.3.1-alpha
Status: Closed

[Implementation notes: the 'consensus index' feature never got implemented.]

Overview:

   This proposal describes a way to publish each consensus in
   multiple simultaneous formats, or "flavors".  This will reduce the
   amount of time needed to deploy new consensus-like documents, and
   reduce the size of consensus documents in the long term.

Motivation:

   In the future, we will almost surely want different fields and
   data in the network-status document.  Examples include:
      - Publishing hashes of microdescriptors instead of hashes of
        full descriptors (Proposal 158).
      - Including different digests of descriptors, instead of the
        perhaps-soon-to-be-totally-broken SHA1.

   Note that in both cases, from the client's point of view, this
   information _replaces_ older information.  If we're using a
   SHA256 hash, we don't need to see the SHA1.  If clients only want
   microdescriptors, they don't (necessarily) need to see hashes of
   other things.

   Our past approach to cases like this has been to shovel all of
   the data into the consensus document.  But this is rather poor
   for bandwidth.  Adding a single SHA256 hash to a consensus for
   each router increases the compressed consensus size by 47%.  In
   comparison, replacing a single SHA1 hash with a SHA256 hash for
   each listed router increases the consensus size by only 18%.

Design in brief:

   Let the voting process remain as it is, until a consensus is
   generated.  With future versions of the voting algorithm, instead
   of just a single consensus being generated, multiple consensus
   "flavors" are produced.

   Consensuses (all of them) include a list of which flavors are
   being generated.  Caches fetch all listed flavors of consensus,
   regardless of whether they can parse or validate them, and serve
   them to clients.  Thus, once this design is in
   place, we won't need to deploy more cache changes in order to get
   new flavors of consensus to be cached.

   Clients download only the consensus flavor they want.

A note on hashes:

   Everything in this document is specified to use SHA256, and to be
   upgradeable to use better hashes in the future.

Spec modifications:

   1. URLs and changes to the current consensus format.

   Every consensus flavor has a name consisting of a sequence of one
   or more alphanumeric characters and dashes.  For compatibility,
   the current consensus flavor is called "ns".

   The supported consensus flavors are defined as part of the
   authorities' consensus method.

   For each supported flavor, every authority calculates another
   consensus document of as-yet-unspecified format, and exchanges
   detached signatures for these documents as in the current consensus
   design.

   In addition to the consensus currently served at
   /tor/status-vote/(current|next)/consensus.z  and
   /tor/status-vote/(current|next)/consensus/<FP1>+<FP2>+<FP3>+....z ,
   authorities serve another consensus of each flavor "F" from the
   locations /tor/status-vote/(current|next)/consensus-F.z and
   /tor/status-vote/(current|next)/consensus-F/<FP1>+....z.

   When caches serve these documents, they do so from the same
   locations.
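
   As an illustration, a Python helper that validates a flavor name and
   builds the corresponding path might look like this. The function name
   is hypothetical; it follows only the URL patterns listed above.

```python
import re

# Flavor names: one or more alphanumeric characters and dashes.
FLAVOR_RE = re.compile(r"[A-Za-z0-9-]+")

def flavored_consensus_url(flavor, which="current", fingerprints=()):
    """Build the download path for consensus flavor `flavor`."""
    if not FLAVOR_RE.fullmatch(flavor):
        raise ValueError(f"invalid flavor name: {flavor!r}")
    if which not in ("current", "next"):
        raise ValueError("which must be 'current' or 'next'")
    base = f"/tor/status-vote/{which}/consensus-{flavor}"
    if fingerprints:
        return f"{base}/{'+'.join(fingerprints)}.z"
    return f"{base}.z"
```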

   2. Document format: generic consensus.

   The format of a flavored consensus is as-yet-unspecified, except
   that the first line is:
      "network-status-version" SP version SP flavor NL

   where version is 3 or higher, and the flavor is a string
   consisting of alphanumeric characters and dashes, matching the
   corresponding flavor listed in the unflavored consensus.

   3. Document format: detached signatures.

   We amend the detached signature format to include more than one
   consensus-digest line, and more than one set of signatures.

   After the consensus-digest line, we allow more lines of the form:
      "additional-digest" SP flavor SP algname SP digest NL

   Before the directory-signature lines, we allow more entries of the form:
      "additional-signature" SP flavor SP algname SP identity SP
           signing-key-digest NL signature.

   [We do not use "consensus-digest" or "directory-signature" for flavored
   consensuses, since this could confuse older Tors.]

   The consensus-signatures URL should contain the signatures
   for _all_ flavors of consensus.
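
   A small Python sketch that collects the additional-digest lines from
   a detached-signature document. It ignores additional-signature entries
   and their signature objects; the function name is hypothetical.

```python
def parse_additional_digests(detached_sig_text):
    """Collect 'additional-digest' lines as {(flavor, algname): digest}."""
    digests = {}
    for line in detached_sig_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "additional-digest":
            if len(parts) != 4:
                raise ValueError(f"malformed line: {line!r}")
            _, flavor, algname, digest = parts
            digests[(flavor, algname)] = digest
    return digests
```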

   4. The consensus index:

   Authorities additionally generate and serve a consensus-index
   document.  Its format is:

       Header ValidAfter ValidUntil Documents Signatures

       Header = "consensus-index" SP version NL
       ValidAfter = as in a consensus
       ValidUntil = as in a consensus
       Documents = Document*
       Document = "document" SP flavor SP SignedLength
                                    1*(SP AlgorithmName "=" Digest) NL
       Signatures = Signature*
       Signature = "directory-signature" SP algname SP identity
                           SP signing-key-digest NL signature

    There must be one Document line for each generated consensus flavor.
    Each Document line describes the length of the signed portion of
    a consensus (the signatures themselves are not included), along
    with one or more digests of that signed portion.  Digests are
    given in hex.  The algorithm "sha256" MUST be included; others
    are allowed.

    The algname part of a signature describes what algorithm was
    used to hash the identity and signing keys, and to compute the
    signature.  The algorithm "sha256" MUST be recognized;
    signatures with unrecognized algorithms MUST be ignored.
    (See below).

    The consensus index is made available at
       /tor/status-vote/(current|next)/consensus-index.z.

    Caches should fetch this document so they can check the
    correctness of the different consensus documents they fetch.
    They do not need to check anything about an unrecognized
    consensus document beyond its digest and length.
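
    The cache-side check can be sketched as follows: parse a Document
    line, hash the first SignedLength bytes of the fetched document, and
    compare against the hex sha256 digest. Helper names are hypothetical;
    other algorithms listed on the line are parsed but not checked.

```python
import hashlib

def parse_document_line(line):
    """Parse 'document <flavor> <SignedLength> alg=digest ...'."""
    keyword, flavor, length, *digests = line.split()
    if keyword != "document" or not digests:
        raise ValueError(f"malformed document line: {line!r}")
    algs = dict(d.split("=", 1) for d in digests)
    if "sha256" not in algs:
        raise ValueError("sha256 digest is required")
    return flavor, int(length), algs

def check_flavored_consensus(body, line):
    """Check fetched consensus bytes against their consensus-index entry."""
    _, length, algs = parse_document_line(line)
    if len(body) < length:
        return False
    signed = body[:length]  # signatures themselves are not covered
    return hashlib.sha256(signed).hexdigest() == algs["sha256"].lower()
```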

    4.1. The "sha256" signature format.

    The 'SHA256' signature format for directory objects is defined as
    the RSA signature of the OAEP+-padded SHA256 digest of the item to
    be signed.  When checking signatures, the signature MUST be treated
    as valid if the signature material begins with SHA256(document);
    this allows us to add other data later.
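
    A minimal sketch of the prefix rule. It assumes the RSA public-key
    operation and padding check have already produced the recovered
    signature material; that step is not shown, and the function name is
    hypothetical.

```python
import hashlib

def sha256_signature_matches(recovered, document):
    """True if recovered signature material begins with SHA256(document).

    Trailing bytes are permitted, so other data can be added later.
    """
    return recovered.startswith(hashlib.sha256(document).digest())
```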

Considerations:

    - We should not create a new flavor of consensus when adding a
      field instead wouldn't be too onerous.

    - We should not proliferate flavors lightly: clients will be
      distinguishable based on which flavor they download.

Migration:

    - Stage one: authorities begin generating and serving
      consensus-index files.

    - Stage two: Caches begin downloading consensus-index files,
      validating them, and using them to decide what flavors of
      consensus documents to cache.  They download all listed
      documents, and compare them to the digests given in the
      consensus index.

    - Stage three: Once we want to make a significant change to the
      consensus format, we deploy another flavor of consensus at the
      authorities.  This will immediately start getting cached by the
      caches, and clients can start fetching the new flavor without
      waiting a version or two for enough caches to begin supporting
      it.

Acknowledgements:

    Aspects of this design and its applications to hash migration were
    heavily influenced by IRC conversations with Marian.

Filename: 163-detecting-clients.txt
Title: Detecting whether a connection comes from a client
Author: Nick Mathewson
Created: 22-May-2009
Target: 0.2.2
Status: Superseded

[Note: Actually, this is partially done, partially superseded
       -nickm, 9 May 2011]


Overview:

   Some aspects of Tor's design require relays to distinguish
   connections that come from clients from connections that come from
   relays.  The existing means for doing this is easy to spoof.  We
   propose a better approach.

Motivation:

   There are at least two reasons for which Tor servers want to tell
   which connections come from clients and which come from other
   servers:

     1) Some exits, proposal 152 notwithstanding, want to disallow
        their use as single-hop proxies.
     2) Some performance-related proposals involve prioritizing
        traffic from relays, or limiting traffic per client (but not
        per relay).

   Right now, we detect client vs server status based on how the
   client opens circuits.  (Check out the code that implements the
   AllowSingleHopExits option if you want all the details.)  This
   method is depressingly easy to fake, though.  This document
   proposes better means.

Goals:

   To make grabbing relay privileges at least as difficult as just
   running a relay.

   In the analysis below, "using server privileges" means taking any
   action that only servers are supposed to do, like delivering a
   BEGIN cell to an exit node that doesn't allow single hop exits,
   or claiming server-like amounts of bandwidth.

Passive detection:

   A connection is definitely a client connection if it takes one of
   the TLS methods during setup that does not establish an identity
   key.

   A circuit is definitely a client circuit if it is initiated with
   a CREATE_FAST cell, though the node could be a client or a server.

   A node that's listed in a recent consensus is probably a server.

   A node to which we have successfully extended circuits from
   multiple origins is probably a server.

Active detection:

   If a node doesn't try to use server privileges at all, we never
   need to care whether it's a server.

   When a node or circuit tries to use server privileges, if it is
   "definitely a client" as per above, we can refuse it immediately.

   If it's "probably a server" as per above, we can accept it.

   Otherwise, we have either a client, or a server that is neither
   listed in any consensus or used by any other clients -- in other
   words, a new or private server.

   For these servers, we should attempt to build one or more test
   circuits through them.  If enough of the circuits succeed, the
   node is a real relay.  If not, it is probably a client.

   While we are waiting for the test circuits to succeed, we should
   allow a short grace period in which server privileges are
   permitted.  When a test is done, we should remember its outcome
   for a while, so we don't need to do it again.
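
   The decision procedure above can be sketched in Python as follows.
   The 10-minute grace period and 48-hour memory are the suggested
   answers from the open questions, not settled values; the class and
   method names are hypothetical, and launching the test circuits
   themselves is not shown.

```python
import time

GRACE_PERIOD = 10 * 60          # suggested: 10-minute grace period
RESULT_MEMORY = 48 * 60 * 60    # suggested: remember outcomes for 48 hours

class ServerPrivilegeChecker:
    """Decide whether a peer may use server privileges."""

    def __init__(self):
        self.first_seen = {}  # peer -> time of first undecided request
        self.outcomes = {}    # peer -> (time tested, passed?)

    def record_test(self, peer, passed, now=None):
        now = time.time() if now is None else now
        self.outcomes[peer] = (now, passed)

    def allow(self, peer, definitely_client, probably_server, now=None):
        now = time.time() if now is None else now
        if definitely_client:
            return False
        if probably_server:
            return True
        tested = self.outcomes.get(peer)
        if tested is not None:
            if now - tested[0] < RESULT_MEMORY:
                return tested[1]
            # Outcome has expired: forget it and start a fresh grace period.
            del self.outcomes[peer]
            self.first_seen.pop(peer, None)
        start = self.first_seen.setdefault(peer, now)
        return now - start < GRACE_PERIOD
```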

Why it's hard to do good testing:

   Doing a test circuit starting with an unlisted router requires
   only that we have an open connection for it.  Doing a test
   circuit starting elsewhere _through_ an unlisted router (though
   more reliable) would require that we have a known address, port,
   identity key, and onion key for the router.  Only the address and
   identity key are easily available via the current Tor protocol in
   all cases.

   We could fix this part by requiring that all servers support
   BEGIN_DIR and support downloading at least a current descriptor
   for themselves.

Open questions:

   What are the thresholds for the needed numbers of circuits
   for us to decide that a node is a relay?

      [Suggested answer: two circuits from two distinct hosts.]

   How do we pick grace periods?  How long do we remember the
   outcome of a test?

      [Suggested answer: 10 minute grace period; 48 hour memory of
      test outcomes.]

   If we can build circuits starting at a suspect node, but we don't
   have enough information to try extending circuits elsewhere
   through the node, should we conclude that the node is
   "server-like" or not?

      [Suggested answer: for now, just try making circuits through
      the node.  Extend this to extending circuits as needed.]

Filename: 164-reporting-server-status.txt
Title: Reporting the status of server votes
Author: Nick Mathewson
Created: 22-May-2009
Status: Obsolete

Notes: This doesn't work with the current things authorities do,
 though we could revise it to work if we ever want to do this.

Overview:

   When a given node isn't listed in the directory, it isn't always easy
   to tell why.  This proposal suggests a quick-and-dirty way for
   authorities to export not only how they voted, but why, and a way to
   collate the information.

Motivation:

   Right now, if you want to know the reason why your server was listed
   a certain way in the Tor directory, the following steps are
   recommended:

       - Look through your log for reports of what the authority said
         when you tried to upload.

       - Look at the consensus; see if you're listed.

       - Wait a while, see if things get better.

       - Download the votes from all the authorities, and see how they
         voted.  Try to figure out why.

       - If you think they'll listen to you, ask some authority
         operators to look you up in their mtbf files and logs to see
         why they voted as they did.

   This is far too hard.

Solution:

   We should add a new vote-like information-only document that
   authorities serve on request.  Call it a "vote info".  It is
   generated at the same time as a vote, but used only for
   determining why the authority voted as it did.  It is served from
   /tor/status-vote-info/current/authority[.z]

   It differs from a vote in that:

   * Its vote-status field is 'vote-info'.

   * It includes routers that the authority would not include
     in its vote.

     For these, it includes an "omitted" line with an English
     message explaining why they were omitted.

   * For each router, it includes a line describing its WFU and
     MTBF.  The format is:

       "stability <mtbf> up-since='date'"
       "uptime <wfu> down-since='date'"

   * It describes the WFU and MTBF thresholds it requires to
     vote for a given router in various roles in the header.
     The format is:

       "flag-requirement <flag-name> <field> <op> <value>"

     e.g.

       "flag-requirement Guard uptime > 80"

   * It includes info on routers that uploaded descriptors over the
     past few hours, all of which were rejected.  The
     "r" lines for these are the same as for regular routers.
     The other lines are omitted for these routers, and are
     replaced with a single "rejected" line, explaining (in
     English) why the router was rejected.
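
   A status site could evaluate flag-requirement lines roughly like
   this. The operator set beyond ">" and the field names are
   assumptions based on the examples above; the helpers are
   hypothetical.

```python
import operator

# Only ">" appears in the example; the other operators are assumed.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}

def parse_flag_requirement(line):
    """Parse 'flag-requirement <flag-name> <field> <op> <value>'."""
    keyword, flag, field, op, value = line.split()
    if keyword != "flag-requirement" or op not in OPS:
        raise ValueError(f"malformed line: {line!r}")
    return flag, field, op, float(value)

def flags_met(requirements, stats):
    """Return the flags whose every requirement is satisfied by `stats`,
    a mapping of field names (e.g. "uptime", "stability") to numbers."""
    by_flag = {}
    for flag, field, op, value in requirements:
        ok = field in stats and OPS[op](stats[field], value)
        by_flag[flag] = by_flag.get(flag, True) and ok
    return sorted(flag for flag, ok in by_flag.items() if ok)
```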


   A status site (like Torweather or Torstatus or another
   tool) can poll these files when they are generated, collate
   the data, and make it available to server operators.

Risks:

   This document makes no provisions for caching these "vote
   info" documents.  If many people wind up fetching them
   aggressively from the authorities, that would be bad.



Filename: 165-simple-robust-voting.txt
Title: Easy migration for voting authority sets
Author: Nick Mathewson
Created: 2009-05-28
Status: Rejected


Status: rejected as too complex.

Overview:

  This proposal describes an easy-to-implement, easy-to-verify way to
  change the set of authorities without creating a "flag day" situation.

Motivation:

  From proposal 134 ("More robust consensus voting with diverse
  authority sets") by Peter Palfrader:

      Right now there are about five authoritative directory servers
      in the Tor network, tho this number is expected to rise to about
      15 eventually.

      Adding a new authority requires synchronized action from all
      operators of directory authorities so that at any time during the
      update at least half of all authorities are running and agree on
      who is an authority.  The latter requirement is there so that the
      authorities can arrive at a common consensus: Each authority
      builds the consensus based on the votes from all authorities it
      recognizes, and so a different set of recognized authorities will
      lead to a different consensus document.

  In response to this problem, proposal 134 suggested that every
  candidate authority list in its vote whom it believes to be an
  authority.  These A-says-B-is-an-authority relationships form a
  directed graph.  Each authority then iteratively finds the largest
  clique in the graph and removes it, until it finds one containing
  itself.  It votes with this clique.

  Proposal 134 had some problems:

    - It had a security problem in that M hostile authorities in a
      clique could effectively kick out M-1 honest authorities.  This
      could enable a minority of the original authorities to take over.

    - It was too complex in its implications to analyze well: it took us
      over a year to realize that it was insecure.

    - It tried to solve a bigger problem: general fragmentation of
      authority trust.  Really, all we wanted to have was the ability to
      add and remove authorities without forcing a flag day.

Proposed protocol design:

   A "Voting Set" is a set of authorities.  Each authority has a list of
   the voting sets it considers acceptable.  These sets are chosen
   manually by the authority operators. They must always contain the
   authority itself.  Each authority lists all of these voting sets in
   its votes.

   Authorities exchange votes with every other authority in any of their
   voting sets.

   When it is time to calculate a consensus, an authority picks votes from
   whichever voting set it lists that is listed by the most members of
   that set.  In other words, given two sets S1 and S2 that an authority
   lists, that authority will prefer to vote with S1 over S2 whenever
   the number of other authorities in S1 that themselves list S1 is
   higher than the number of other authorities in S2 that themselves
   list S2.

   For example, suppose authority A recognizes two sets, "A B C D" and
   "A E F G H".  Suppose that the first set is recognized by all of A,
   B, C, and D, whereas the second set is recognized only by A, E, and
   F.  Because the first set is recognized by more of the authorities in
   it than the other one, A will vote with the first set.

   Ties are broken in favor of some arbitrary function of the identity
   keys of the authorities in the set.
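
   The selection rule can be sketched in Python as below, checked
   against the example above. The proposal leaves the tie-breaking
   function unspecified; this sketch arbitrarily prefers the set whose
   sorted fingerprints compare lowest, and all names are hypothetical.

```python
def choose_voting_set(me, my_sets, listed_by):
    """Pick the voting set authority `me` should vote with.

    my_sets:   frozensets of authority IDs that `me` lists.
    listed_by: authority ID -> collection of frozensets it lists.
    Support = number of *other* members of a set that also list it.
    """
    def support(s):
        return sum(1 for a in s if a != me and s in listed_by.get(a, ()))
    # Highest support wins; ties go to the lowest sorted member tuple.
    return min(my_sets, key=lambda s: (-support(s), tuple(sorted(s))))
```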

How to migrate authority sets:

   In steady state, each authority operator should list only the current
   actual voting set as accepted.

   When we want to add an authority, each authority operator configures
   his or her server to list two voting sets: one containing all the old
   authorities, and one containing the old authorities and the new
   authority too.  Once all authorities are listing the new set of
   authorities, they will start voting with that set because of its
   size.

   What if one or two authority operators are slow to list the new set?
   Then the other operators can stop listing the old set once there are
   enough authorities listing the new set to make its voting successful.
   (Note that these authorities not listing the new set will still have
   their votes counted, since they themselves will be members of the new
   set.  They will only fail to sign the consensus generated by the
   other authorities who are using the new set.)

   When we want to remove an authority, the operators list two voting
   sets: one containing all the authorities, and one omitting the
   authority we want to remove.  Once enough authorities list the new
   set as acceptable, we start having authority operators stop listing
   the old set.  Once there are more listing the new set than the old
   set, the new set will win.

Data format changes:

   Add a new 'voting-set' line to the vote document format.  Allow it to
   occur any number of times.  Its format is:

      voting-set SP 'fingerprint' SP 'fingerprint' ... NL

   where each fingerprint is the hex fingerprint of an identity key of
   an authority.  Sort fingerprints in ascending order.
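
   A sketch of emitting and parsing the voting-set line. It assumes
   40-character uppercase hex fingerprints and omits the trailing NL;
   the helper names are hypothetical.

```python
import re

HEX_FP = re.compile(r"[0-9A-F]{40}")  # assumed fingerprint format

def format_voting_set(fingerprints):
    """Render a 'voting-set' line with fingerprints in ascending order."""
    fps = sorted(fp.upper() for fp in fingerprints)
    for fp in fps:
        if not HEX_FP.fullmatch(fp):
            raise ValueError(f"not a hex fingerprint: {fp!r}")
    return "voting-set " + " ".join(fps)

def parse_voting_set(line):
    """Parse a 'voting-set' line, rejecting unsorted fingerprint lists."""
    keyword, *fps = line.split()
    if keyword != "voting-set" or fps != sorted(fps):
        raise ValueError(f"malformed voting-set line: {line!r}")
    return frozenset(fps)
```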

   When the consensus method is at least 'X' (decide this when we
   implement the proposal), add this line to the consensus format as
   well, before the first dir-source line.  [This information is not
   redundant with the dir-source sections in the consensus: If an
   authority is recognized but didn't vote, that authority will appear in
   the voting-set line but not in the dir-source sections.]

   We don't need to list other information about authorities in our
   vote.

Migration issues:

   We should keep track somewhere which Tor client versions
   recognized which authorities.

Acknowledgments:

   The design came out of an IRC conversation with Peter Palfrader.  He
   had the basic idea first.

Filename: 166-statistics-extra-info-docs.txt
Title: Including Network Statistics in Extra-Info Documents
Author: Karsten Loesing
Created: 21-Jul-2009
Target: 0.2.2
Status: Closed

Change history:

  21-Jul-2009  Initial proposal for or-dev


Overview:

  The Tor network has grown to almost two thousand relays and millions
  of casual users over the past few years. With growth has come
  increasing performance problems and attempts by some countries to
  block access to the Tor network. In order to address these problems,
  we need to learn more about the Tor network. This proposal suggests
  measuring additional statistics and including them in extra-info
  documents to help us understand the Tor network better.


Introduction:

  As of May 2009, relays, bridges, and directories gather the following
  data for statistical purposes:

  - Relays and bridges count the number of bytes that they have pushed
    in 15-minute intervals over the past 24 hours. Relays and bridges
    include these data in extra-info documents that they send to the
    directory authorities whenever they publish their server descriptor.

  - Bridges further include a rough number of clients per country that
    they have seen in the past 48 hours in their extra-info documents.

  - Directories can be configured to count the number of clients they
    see per country in the past 24 hours and to write them to a local
    file.

  Since then, we have extended the network statistics in Tor. The new
  statistics include:

  - Directories now gather more precise statistics about connecting
    clients. Fixes include measuring in intervals of exactly 24 hours,
    counting unsuccessful requests, measuring download times, etc. The
    directories append their statistics to a local file every 24 hours.

  - Entry guards count the number of clients per country per day like
    bridges do and write them to a local file every 24 hours.

  - Relays measure statistics of the number of cells in their circuit
    queues and how much time these cells spend waiting there. Relays
    write these statistics to a local file every 24 hours.

  - Exit nodes count the number of read and written bytes on exit
    connections per port as well as the number of opened exit streams
    per port in 24-hour intervals. Exit nodes write their statistics to
    a local file.

  The following four sections contain descriptions for adding these
  statistics to the relays' extra-info documents.


Directory request statistics:

  The first type of statistics aims at measuring directory requests sent
  by clients to a directory mirror or directory authority. More
  precisely, these statistics aim at requests for v2 and v3 network
  statuses only. These directory requests are sent non-anonymously,
  either via HTTP-like requests to a directory's Dir port or tunneled
  over a 1-hop circuit.

  Measuring directory request statistics is useful for several reasons:
  First, the number of locally seen directory requests can be used to
  estimate the total number of clients in the Tor network. Second, the
  country-wise classification of requests using a GeoIP database can
  help count the relative and absolute numbers of users per country.
  Third, the download times can give hints on the available bandwidth
  capacity at clients.

  Directory requests do not give any hints on the contents that clients
  send or receive over the Tor network. Every client requests network
  statuses from the directories, so there are no anonymity-related
  concerns about gathering these statistics. It might be, though, that
  clients wish to hide the fact that they are connecting to the Tor
  network.
  Therefore, IP addresses are resolved to country codes in memory,
  events are accumulated over 24 hours, and numbers are rounded up to
  multiples of 4 or 8.

   "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      A "dirreq-stats-end" line, as well as any other "dirreq-*" line,
      is only added when the relay has opened its Dir port and after 24
      hours of measuring directory requests.
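      Every "*-stats-end" line in this proposal shares this format, so a
      small helper can produce it and recover the measurement interval
      from it. The sketch below is illustrative only: the function names
      are hypothetical, and it assumes times are UTC and handled as naive
      datetime objects.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def format_stats_end(keyword, end, nsec=86400):
    """E.g. 'dirreq-stats-end 2009-08-01 12:00:00 (86400 s)'."""
    return "%s %s (%d s)\n" % (keyword, end.strftime(FMT), nsec)


def parse_stats_end(line):
    """Return (keyword, interval_start, interval_end)."""
    keyword, date, time, nsec, unit = line.split()
    if not (nsec.startswith("(") and unit == "s)"):
        raise ValueError("malformed stats-end line: %r" % line)
    end = datetime.strptime(date + " " + time, FMT)
    length = int(nsec[1:])
    return keyword, end - timedelta(seconds=length), end
```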

   "dirreq-v2-ips" CC=N,CC=N,... NL
      [At most once.]
   "dirreq-v3-ips" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      unique IP addresses that have connected from that country to
      request a v2/v3 network status, rounded up to the nearest multiple
      of 8. Only IP addresses whose requests the directory can answer
      with a 200 OK status code are counted.

   "dirreq-v2-reqs" CC=N,CC=N,... NL
      [At most once.]
   "dirreq-v3-reqs" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      requests for v2/v3 network statuses from that country, rounded up
      to the nearest multiple of 8. Only requests that the directory can
      answer with a 200 OK status code are counted.
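      The counting and rounding for the "-ips" and "-reqs" lines can be
      sketched as below. This is a hypothetical illustration, not Tor's
      code: it assumes requests arrive as (client_ip, country_code,
      http_status) tuples with the country already resolved in memory,
      and it orders country codes alphabetically, which the proposal
      does not specify.

```python
from collections import Counter


def round_up(n, multiple):
    """Round a non-negative count up to the nearest multiple."""
    return ((n + multiple - 1) // multiple) * multiple


def format_country_line(keyword, counts, multiple=8):
    """Format 'keyword CC=N,CC=N,...' with every N rounded up."""
    parts = ["%s=%d" % (cc, round_up(n, multiple))
             for cc, n in sorted(counts.items())]
    return keyword + " " + ",".join(parts) + "\n"


def dirreq_v3_lines(requests):
    """requests: iterable of (client_ip, country_code, http_status)."""
    answered = [(ip, cc) for ip, cc, status in requests if status == 200]
    reqs = Counter(cc for _, cc in answered)       # every answered request
    ips = Counter(cc for _, cc in set(answered))   # unique IPs per country
    return (format_country_line("dirreq-v3-ips", ips),
            format_country_line("dirreq-v3-reqs", reqs))
```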

   "dirreq-v2-share" num% NL
      [At most once.]
   "dirreq-v3-share" num% NL
      [At most once.]

      The share of v2/v3 network status requests that the directory
      expects to receive from clients based on its advertised bandwidth
      compared to the overall network bandwidth capacity. Shares are
      formatted in percent with two decimal places. Shares are
      calculated as means over the whole 24-hour interval.
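      One way to compute a share line is sketched below, assuming the
      directory periodically samples its own advertised bandwidth and
      the total network bandwidth (for example from the consensus) and
      averages the samples with equal weight. The sampling scheme and
      the function name are assumptions.

```python
def format_share(keyword, samples):
    """samples: (own_bandwidth, total_network_bandwidth) pairs observed
    during the 24-hour interval."""
    fractions = [own / total for own, total in samples if total > 0]
    mean = sum(fractions) / len(fractions) if fractions else 0.0
    # Formatted in percent with two decimal places, e.g. "2.00%".
    return "%s %.2f%%\n" % (keyword, 100.0 * mean)
```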

   "dirreq-v2-resp" status=num,... NL
      [At most once.]
   "dirreq-v3-resp" status=num,... NL
      [At most once.]

      List of mappings from response statuses to the number of requests
      for v2/v3 network statuses that were answered with that response
      status, rounded up to the nearest multiple of 4. Only response
      statuses with at least 1 response are reported. New response
      statuses can be added at any time. The current list of response
      statuses is as follows:

      "ok": a network status request is answered; this number
         corresponds to the sum of all requests as reported in
         "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
         rounding up.
      "not-enough-sigs": a version 3 network status is not signed by a
         sufficient number of requested authorities.
      "unavailable": a requested network status object is unavailable.
      "not-found": a requested network status is not found.
      "not-modified": a network status has not been modified since the
         If-Modified-Since time that is included in the request.
      "busy": the directory is busy.
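      A sketch of formatting a "-resp" line follows. It rounds each count
      up to a multiple of 4, omits statuses with no responses, and lists
      the statuses above in the order given, followed by any newer
      statuses alphabetically. That ordering is an assumption, as is the
      function name.

```python
KNOWN_STATUSES = ["ok", "not-enough-sigs", "unavailable",
                  "not-found", "not-modified", "busy"]


def round_up(n, multiple=4):
    """Round a non-negative count up to the nearest multiple."""
    return ((n + multiple - 1) // multiple) * multiple


def format_resp(keyword, status_counts):
    """status_counts: mapping of response status name -> raw count."""
    extra = sorted(s for s in status_counts if s not in KNOWN_STATUSES)
    parts = []
    for status in KNOWN_STATUSES + extra:
        n = status_counts.get(status, 0)
        if n >= 1:  # only statuses with at least one response
            parts.append("%s=%d" % (status, round_up(n)))
    return keyword + " " + ",".join(parts) + "\n"
```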

   "dirreq-v2-direct-dl" key=val,... NL
      [At most once.]
   "dirreq-v3-direct-dl" key=val,... NL
      [At most once.]
   "dirreq-v2-tunneled-dl" key=val,... NL
      [At most once.]
   "dirreq-v3-tunneled-dl" key=val,... NL
      [At most once.]

      List of statistics about possible failures in the download process
      of v2/v3 network statuses. Requests are either "direct"
      HTTP-encoded requests over the relay's directory port, or
      "tunneled" requests using a BEGIN_DIR cell over the relay's OR
      port. The list of possible statistics can change, and statistics
      can be left out from reporting. The current list of statistics is
      as follows:

      Successful downloads and failures:

      "complete": a client has finished the download successfully.
      "timeout": a download did not finish within 10 minutes after
         starting to send the response.
      "running": a download is still running at the end of the
         measurement period, less than 10 minutes after the directory
         started sending the response.

      Download times:

      "min", "max": smallest and largest measured bandwidth in B/s.
      "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
         bandwidth in B/s. For a given decile i, i/10 of all downloads
         had a smaller bandwidth than di, and (10-i)/10 of all downloads
         had a larger bandwidth than di.
      "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
         fourth of all downloads had a smaller bandwidth than q1, one
         fourth of all downloads had a larger bandwidth than q3, and the
         remaining half of all downloads had a bandwidth between q1 and
         q3.
      "md": median of measured bandwidth in B/s. Half of the downloads
         had a smaller bandwidth than md, the other half had a larger
         bandwidth than md.
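      The quantile fields can be computed from the sorted bandwidths of
      completed downloads. The sketch below is a hypothetical
      illustration. It uses a simple nearest-rank pick (the element at
      index floor(n*i/10)), reports the outcome counts unrounded, and
      chooses a key order; the proposal fixes none of these choices.

```python
def download_line(keyword, completed_bw, timeouts, running):
    """completed_bw: measured bandwidths in B/s of completed downloads."""
    b = sorted(completed_bw)
    n = len(b)
    fields = [("complete", n), ("timeout", timeouts), ("running", running)]

    def pick(num, den):
        # Element at index floor(n * num / den): roughly num/den of the
        # sorted values lie below it.
        return b[min(n - 1, n * num // den)]

    if n:
        fields.append(("min", b[0]))
        fields += [("d%d" % i, pick(i, 10)) for i in (1, 2, 3, 4)]
        fields.append(("md", pick(1, 2)))
        fields += [("d%d" % i, pick(i, 10)) for i in (6, 7, 8, 9)]
        fields += [("q1", pick(1, 4)), ("q3", pick(3, 4)), ("max", b[-1])]
    return keyword + " " + ",".join("%s=%d" % f for f in fields) + "\n"
```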


Entry guard statistics:

  Entry guard statistics include the number of clients per country and
  per day that are connecting directly to an entry guard.

  Entry guard statistics are important for learning more about the
  distribution of clients across countries. In the future, this
  knowledge can help detect whether clients connecting from specific
  countries are, or start to be, subject to any restrictions.

  Information about which clients connect to a given entry guard is
  very sensitive. It must not be combined with information about what
  content is leaving the network at the exit nodes. Therefore, entry
  guard statistics need to be aggregated to prevent them from
  becoming useful for de-anonymization. Aggregation includes resolving
  IP addresses to country codes, counting events over 24-hour intervals,
  and rounding up numbers to the next multiple of 8.

   "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      An "entry-stats-end" line, as well as any other "entry-*"
      line, is first added after the relay has been running for at least
      24 hours.

   "entry-ips" CC=N,CC=N,... NL
      [At most once.]

      List of mappings from two-letter country codes to the number of
      unique IP addresses that have connected from that country to the
      relay and that do not belong to known relays, rounded up to the
      nearest multiple of 8.
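      A minimal sketch of the "entry-ips" aggregation follows: de-duplicate
      client IPs, drop those of known relays, resolve the rest to
      countries, and round each count up to a multiple of 8. The geoip
      callable, the input shapes, and the alphabetical ordering are
      assumptions made for illustration.

```python
def entry_ips_line(client_ips, known_relay_ips, geoip):
    """client_ips: IPs that connected directly during the interval.
    known_relay_ips: IPs of known relays, which are not counted.
    geoip: hypothetical callable mapping an IP to a country code."""
    unique = set(client_ips) - set(known_relay_ips)
    per_country = {}
    for ip in unique:
        cc = geoip(ip)  # only per-country counts leave this function
        per_country[cc] = per_country.get(cc, 0) + 1
    parts = ["%s=%d" % (cc, (n + 7) // 8 * 8)
             for cc, n in sorted(per_country.items())]
    return "entry-ips " + ",".join(parts) + "\n"
```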


Cell statistics:

  The third type of statistics has to do with the time that cells spend
  in circuit queues. In order to gather these statistics, the relay
  memorizes when it puts a given cell in a circuit queue and when this
  cell is flushed. The relay further notes the lifetime of the circuit.
  These data are sufficient to determine the mean number of cells in a
  queue over time and the mean time that cells spend in a queue.

  Cell statistics are necessary to learn more about possible reasons for
  the poor performance of the Tor network, especially high
  latencies. The same statistics are also useful to determine the
  effects of design changes by comparing today's data with future data.

  There are basically no privacy concerns from measuring cell
  statistics, regardless of a node being an entry, middle, or exit node.

   "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      A "cell-stats-end" line, as well as any other "cell-*" line,
      is first added after the relay has been running for at least 24
      hours.

   "cell-processed-cells" num,...,num NL
      [At most once.]

      Mean number of processed cells per circuit, subdivided into
      deciles of circuits by the number of cells they have processed in
      descending order from loudest to quietest circuits.

   "cell-queued-cells" num,...,num NL
      [At most once.]

      Mean number of cells contained in queues by circuit decile. These
      means are calculated by 1) determining the mean number of cells in
      a single circuit between its creation and its termination and 2)
      calculating the mean for all circuits in a given decile as
      determined in "cell-processed-cells". Numbers have a precision of
      two decimal places.

   "cell-time-in-queue" num,...,num NL
      [At most once.]

      Mean time cells spend in circuit queues in milliseconds. Times are
      calculated by 1) determining the mean time cells spend in the
      queue of a single circuit and 2) calculating the mean for all
      circuits in a given decile as determined in
      "cell-processed-cells".

   "cell-circuits-per-decile" num NL
      [At most once.]

      Mean number of circuits that are included in any of the deciles,
      rounded up to the next integer.
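  The decile computation behind the "cell-*" lines can be sketched as
  follows. This is illustrative only: it assumes per-circuit tuples of
  (processed cells, mean queued cells, mean milliseconds in queue) are
  already available, assigns the circuit at rank j (loudest first) to
  decile floor(10*j/n), and picks output precisions for the two lines
  whose precision the proposal does not state.

```python
def cell_stats_lines(circuits):
    """circuits: list of (processed_cells, mean_queued_cells,
    mean_ms_in_queue) tuples, one per circuit over its lifetime."""
    n = len(circuits)
    if n == 0:
        return []
    # Loudest circuits first, then split into ten deciles by rank.
    ordered = sorted(circuits, key=lambda c: c[0], reverse=True)
    deciles = [[] for _ in range(10)]
    for rank, circ in enumerate(ordered):
        deciles[rank * 10 // n].append(circ)

    def decile_means(field, spec):
        means = [sum(c[field] for c in d) / len(d) if d else 0.0
                 for d in deciles]
        return ",".join(spec % m for m in means)

    return [
        "cell-processed-cells " + decile_means(0, "%.0f"),
        "cell-queued-cells " + decile_means(1, "%.2f"),
        "cell-time-in-queue " + decile_means(2, "%.0f"),
        "cell-circuits-per-decile %d" % ((n + 9) // 10),
    ]
```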


Exit statistics:

  The last type of statistics concerns exit nodes, which count the
  number of bytes written and read and the number of streams opened per
  port and per 24 hours. Exit port statistics can be measured by looking
  at headers of BEGIN and DATA cells. A BEGIN cell contains the exit port
  that is required for the exit node to open a new exit stream.
  Subsequent DATA cells coming from the client or being sent back to the
  client contain a length field stating how many bytes of application
  data are contained in the cell.

  Exit port statistics are important to measure in order to identify
  possible load-balancing problems with respect to exit policies. Exit
  nodes that permit more ports than others are very likely overloaded
  with traffic for those ports plus traffic for other ports. Improving
  load balancing in the Tor network improves the overall utilization of
  bandwidth capacity.

  Exit traffic is one of the most sensitive parts of network data in the
  Tor network. Even though these statistics do not require looking at
  traffic contents, statistics are aggregated so that they are not
  useful for de-anonymizing users. Only those ports are reported that
  have seen at least 0.1% of exiting or incoming bytes, numbers of bytes
  are rounded up to full kibibytes (KiB), and stream numbers are rounded
  up to the next multiple of 4.

   "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
      [At most once.]

      YYYY-MM-DD HH:MM:SS defines the end of the included measurement
      interval of length NSEC seconds (86400 seconds by default).

      An "exit-stats-end" line, as well as any other "exit-*" line, is
      first added after the relay has been running for at least 24 hours
      and only if the relay permits exiting (where exiting to a single
      port and IP address is sufficient).

   "exit-kibibytes-written" port=N,port=N,... NL
      [At most once.]
   "exit-kibibytes-read" port=N,port=N,... NL
      [At most once.]

      List of mappings from ports to the number of kibibytes that the
      relay has written to or read from exit connections to that port,
      rounded up to the next full kibibyte.

   "exit-streams-opened" port=N,port=N,... NL
      [At most once.]

      List of mappings from ports to the number of opened exit streams
      to that port, rounded up to the nearest multiple of 4.
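      The aggregation rules for the "exit-*" lines (the 0.1% reporting
      threshold, rounding bytes up to full KiB, and rounding stream counts
      up to a multiple of 4) can be sketched as below. It is a
      hypothetical illustration that assumes a port qualifies when its
      combined written and read bytes reach 0.1% of all exit bytes; the
      proposal's "exiting or incoming" wording could also be read
      per direction.

```python
def kib_round_up(nbytes):
    """Round a byte count up to full kibibytes."""
    return (nbytes + 1023) // 1024


def round_up4(n):
    """Round a stream count up to the next multiple of 4."""
    return (n + 3) // 4 * 4


def exit_stats_lines(per_port):
    """per_port: mapping port -> (bytes_written, bytes_read, streams_opened)."""
    total = sum(w + r for w, r, _ in per_port.values())
    # Assumption: a port qualifies if its combined written+read bytes reach
    # at least 0.1% (1/1000) of all exit bytes.
    reported = sorted(port for port, (w, r, _) in per_port.items()
                      if total > 0 and (w + r) * 1000 >= total)

    def line(keyword, field, rounder):
        return keyword + " " + ",".join(
            "%d=%d" % (port, rounder(per_port[port][field]))
            for port in reported)

    return [
        line("exit-kibibytes-written", 0, kib_round_up),
        line("exit-kibibytes-read", 1, kib_round_up),
        line("exit-streams-opened", 2, round_up4),
    ]
```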


Implementation notes:

  Right now, relays that are configured accordingly write similar
  statistics to those described in this proposal to disk every 24 hours.
  With this proposal being implemented, relays include the contents of
  these files in extra-info documents.

  The following steps are necessary to implement this proposal:

  1. The current format of [dirreq|entry|buffer|exit]-stats files needs
     to be adapted to the description in this proposal. This step
     basically means renaming keywords.

  2. The timing of writing the four *-stats files should be unified, so
     that they are written exactly 24 hours after starting the
     relay. Right now, the measurement intervals for dirreq, entry, and
     exit stats start with the first observed request, and files are
     written when observing the first request that occurs more than 24
     hours after the beginning of the measurement interval. With this
     proposal, the measurement intervals should all start at the same
     time, and files should be written exactly 24 hours later.

  3. It is advantageous to cache statistics in local files in the data
     directory until they are included in extra-info documents. The
     reason is that the 24-hour measurement interval can be very
     different from the 18-hour publication interval of extra-info
     documents. When a relay crashes after finishing a measurement
     interval, but before publishing the next extra-info document,
     statistics would get lost. Therefore, statistics are written to
     disk when finishing a measurement interval and read from disk when
     generating an extra-info document. Only the statistics that were
     appended to the *-stats files within the past 24 hours are included
     in extra-info documents. Further, the contents of the *-stats files
     need to be checked in the process of generating extra-info documents.

  4. Once the statistics patches have been tested, the ./configure
     options should be removed and the statistics code should be
     compiled by default. Relay operators will still need to add
     configuration options (DirReqStatistics, ExitPortStatistics, etc.)
     to enable gathering statistics. However, in the near future,
     statistics shall be gathered by all relays by default, at which
     point requiring a ./configure option would be a barrier for many
     relay operators.
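  Step 3 above (caching statistics on disk and including only those
  appended within the past 24 hours) might look like the following
  sketch. It assumes each appended block in a *-stats file begins with
  its "*-stats-end" line. It skips blocks whose header cannot be parsed,
  as a minimal form of the required content check. The function names
  are hypothetical.

```python
from datetime import datetime, timedelta


def _is_stats_end(line):
    # Header lines look like "<kind>-stats-end YYYY-MM-DD HH:MM:SS (NSEC s)".
    return line.split(" ", 1)[0].endswith("-stats-end")


def recent_stats_blocks(file_text, now, max_age=timedelta(hours=24)):
    """Return the blocks of a cached *-stats file whose measurement
    interval ended within max_age before now."""
    blocks = []
    for line in file_text.splitlines():
        if _is_stats_end(line):
            blocks.append([line])        # a new block starts here
        elif blocks:
            blocks[-1].append(line)      # body line of the current block
        # Lines before the first header are ignored as malformed.

    recent = []
    for block in blocks:
        fields = block[0].split()
        try:
            end = datetime.strptime(" ".join(fields[1:3]),
                                    "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue                     # skip blocks with a bad header
        if timedelta(0) <= now - end <= max_age:
            recent.append("\n".join(block))
    return recent
```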
Filename: 167-params-in-consensus.txt
Title: Vote on network parameters in consensus
Author: Roger Dingledine
Created: 18-Aug-2009
Status: Closed
Implemented-In: 0.2.2

0. History


1. Overview

  Several of our new performance plans involve guessing how to tune
  clients and relays, yet we won't be able to learn whether we guessed
  the right tuning parameters until many people have upgraded. Instead,
  we should have directory authorities vote on the parameters, and teach
  Tors to read the currently recommended values out of the consensus.

2. Design

  V3 votes should include a new "params" line after the known-flags
  line. It contains key=value pairs, where value is an integer.

  Consensus documents that are generated with a sufficiently new consensus
  method (7?) then include a params line that includes every key listed
  in any vote, and the median value for that key (in case of ties,
  we use the median closer to zero).
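  A sketch of computing the consensus "params" line from votes follows.
  Several details are assumptions rather than part of this proposal:
  pairs are written space-separated and sorted by key; the median for a
  key is taken only over the votes that list it; and an exact tie in
  distance from zero (such as -3 and 3) resolves to the lower value.

```python
def median_closer_to_zero(values):
    """Median of integers; with an even count, the middle value closer
    to zero wins (on an exact tie such as -3 and 3, the lower one)."""
    v = sorted(values)
    n = len(v)
    if n % 2 == 1:
        return v[n // 2]
    lo, hi = v[n // 2 - 1], v[n // 2]
    return lo if abs(lo) <= abs(hi) else hi


def consensus_params_line(votes):
    """votes: one dict per authority vote, mapping key -> integer value."""
    keys = sorted(set().union(*votes))
    values = {k: median_closer_to_zero([v[k] for v in votes if k in v])
              for k in keys}
    return "params " + " ".join("%s=%d" % (k, values[k]) for k in keys)
```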

2.1. Planned keys.

  The first planned parameter is "circwindow=101", which is the initial
  circuit packaging window that clients and relays should use. Putting
  it in the consensus will let us perform experiments with different
  values once enough Tors have upgraded -- see proposal 168.

  Later parameters might include a weighting for how much to favor quiet
  circuits over loud circuits in our round-robin algorithm; a weighting
  for how much to prioritize relays over clients if we use an incentive
  scheme like the gold-star design; and what fraction of circuits we
  should throw out from proposal 151.

2.2. What about non-integers?

  I'm not sure how we would do median on non-integer values. Further,
  I don't have any non-integer values in mind yet. So I say we cross
  that bridge when we get to it.

Filename: 168-reduce-circwindow.txt
Title: Reduce default circuit window
Author: Roger Dingledine
Created: 12-Aug-2009
Status: Rejected


0. History


1. Overview

  We should reduce the starting circuit "package window" from 1000 to
  101. The lower package window will mean that clients will only be able
  to receive 101 cells (~50KB) on a circuit before they need to send a
  'sendme' acknowledgement cell to request 100 more.

  Starting with a lower package window on exit relays should save on
  buffer sizes (and thus memory requirements for the exit relay), and
  should save on queue sizes (and thus latency for users).

  Lowering the package window will induce an extra round-trip for every
  additional 50298 bytes of the circuit. This extra step is clearly a
  slow-down for large streams, but ultimately we hope that a) clients
  fetching smaller streams will see better response, and b) slowing
  down the large streams in this way will produce lower end-to-end latencies,
  so the round-trips won't be so bad.
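  The 50298-byte figure is 101 cells times 498 bytes of relay data per
  cell (a 509-byte cell payload minus the 11-byte relay header). The
  rough estimate below counts how many extra sendme round-trips a
  download incurs for a given window. It treats every window as a full
  refill and ignores the 100-versus-101 detail discussed in section 3.1.

```python
import math

RELAY_DATA_BYTES = 498  # 509-byte cell payload minus 11-byte relay header


def window_bytes(window_cells):
    """Application bytes that fit in one package window."""
    return window_cells * RELAY_DATA_BYTES


def extra_round_trips(download_bytes, window_cells):
    """Rough count of stalls waiting for a sendme: one per window
    after the first."""
    windows = math.ceil(download_bytes / window_bytes(window_cells))
    return max(0, windows - 1)
```

For a 50KB fetch, this gives one extra round-trip with a 101-cell window
and none with a 1000-cell window. For 1MB, it gives 20 versus 2.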

2. Motivation

  Karsten's torperf graphs show that the median download time for a 50KB
  file over Tor in mid 2009 is 7.7 seconds, whereas the median download
  time for 1MB and 5MB are around 50s and 150s respectively. The 7.7
  second figure is way too high, whereas the 50s and 150s figures are
  surprisingly low.

  The median round-trip latency appears to be around 2s, with 25% of
  the data points taking more than 5s. That's a lot of variance.

  We designed Tor originally with the goal of maximizing
  throughput. We figured that would also optimize other network properties
  like round-trip latency. Looks like we were wrong.

3. Design

  Wherever we initialize the circuit package window, initialize it to
  101 rather than 1000. Reducing it should be safe even when interacting
  with old Tors: the old Tors will receive the 101 cells and send back
  a sendme ack cell. They'll still have much higher deliver windows,
  but the rest of their deliver window will go unused.

  You can find the patch at arma/circwindow. It seems to work.

3.1. Why not 100?

  Tor 0.0.0 through 0.2.1.19 have a bug where they only send the sendme
  ack cell after 101 cells rather than the intended 100 cells.

  Once 0.2.1.19 is obsolete we can change it back to 100 if we like. But
  hopefully we'll have moved to some datagram protocol long before
  0.2.1.19 becomes obsolete.

3.2. What about stream packaging windows?

  Right now the stream packaging windows start at 500. The goal was to
  set the stream window to half the circuit window, to provide a crude
  load balancing between streams on the same circuit. Once we lower
  the circuit packaging window, the stream packaging window basically
  becomes redundant.

  We could leave it in -- it isn't hurting much in either case. Or we
  could take it out -- people building other Tor clients would thank us
  for that step. Alas, people building other Tor clients are going to
  have to be compatible with current Tor clients, so in practice there's
  no point taking out the stream packaging windows.

3.3. What about variable circuit windows?

  Once upon a time we imagined adapting the circuit package window to
  the network conditions. That is, we would start the window small,
  and raise it based on the latency and throughput we see.

  In theory that crude imitation of TCP's windowing system would allow
  us to adapt to fill the network better. In practice, I think we want
  to stick with the small window and never raise it. The low cap reduces
  the total throughput you can get from Tor for a given circuit. But
  that's a feature, not a bug.

4. Evaluation

  How do we know this change is actually smart? It seems intuitive that
  it's helpful, and some smart systems people have agreed that it's
  a good idea (or said another way, they were shocked at how big the
  default package window was before).

  To get a more concrete sense of the benefit, though, Karsten has been
  running torperf side-by-side on exit relays with the old package window
  vs the new one. The results are mixed currently -- it is slightly faster
  for fetching 40KB files, and slightly slower for fetching 50KB files.

  I think it's going to be tough to get a clear conclusion that this is
  a good design just by comparing one exit relay running the patch. The
  trouble is that the other hops in the circuits are still getting bogged
  down by other clients introducing too much traffic into the network.

  Ultimately, we'll want to put the circwindow parameter into the
  consensus so we can test a broader range of values once enough relays
  have upgraded.

5. Transition and deployment

  We should put the circwindow in the consensus (see proposal 167),
  with an initial value of 101. Then as more exit relays upgrade,
  clients should seamlessly get the better behavior.

  Note that upgrading the exit relay will only affect the "download"
  package window. An old client that's uploading lots of bytes will
  continue to use the old package window at the client side, and we
  can't throttle that window at the exit side without breaking protocol.

  The real question then is what we should backport to 0.2.1. Assuming
  this could be a big performance win, we can't afford to wait until
  0.2.2.x comes out before starting to see the changes here. So we have
  two options as I see them:
  a) once clients in 0.2.2.x know how to read the value out of the
  consensus, and it's been tested for a bit, backport that part to
  0.2.1.x.
  b) if it's too complex to backport, just pick a number, like 101, and
  backport that number.

  Clearly choice (a) is the better one if the consensus parsing part
  isn't very complex. Let's shoot for that, and fall back to (b) if the
  patch turns out to be so big that we reconsider.

Filename: 169-eliminating-renegotiation.txt
Title: Eliminate TLS renegotiation for the Tor connection handshake
Author: Nick Mathewson
Created: 27-Jan-2010
Status: Superseded
Target: 0.2.2
Superseded-By: 176

1. Overview

   I propose a backward-compatible change to the Tor connection
   establishment protocol to avoid the use of TLS renegotiation.

   Rather than doing a TLS renegotiation to exchange certificates
   and authenticate the original handshake, this proposal takes an
   approach similar to Steven Murdoch's proposal 124, and uses Tor
   cells to finish authenticating the parties' identities once the
   initial TLS handshake is finished.

   Terminological note: I use "client" below to mean the Tor
   instance (a client or a relay) that initiates a TLS connection,
   and "server" to mean the Tor instance (a relay) that accepts it.

2. Motivation and history

   In the original Tor TLS connection handshake protocol ("V1", or
   "two-cert"), parties that wanted to authenticate provided a
   two-cert chain of X.509 certificates during the handshake setup
   phase.  Every party that wanted to authenticate sent these
   certificates.

   In the current Tor TLS connection handshake protocol ("V2", or
   "renegotiating"), the parties begin with a single certificate
   sent from the server (responder) to the client (initiator), and
   then renegotiate to a two-certs-from-each-authenticating-party
   handshake.
   We made this change to make Tor's handshake look like a browser
   speaking SSL to a webserver.  (See proposal 130, and
   tor-spec.txt.)  To tell whether to use the V1 or V2 handshake,
   servers look at the list of ciphers sent by the client.  (This is
   ugly, but there's not much else in the ClientHello that they can
   look at.) If the list contains any cipher not used by the V1
   protocol, the server sends back a single cert and expects a
   renegotiation.  If the client gets back a single cert, then it
   withholds its own certificates until the TLS renegotiation phase.

   In other words, initiator behavior now looks like this:

      - Begin TLS negotiation with V2 cipher list; wait for
        certificate(s).
      - If we get a certificate chain:
         - Then we are using the V1 handshake.  Send our own
           certificate chain as part of this initial TLS handshake
           if we want to authenticate; otherwise, send no
           certificates.  When the handshake completes, check
           certificates.  We are now mutually authenticated.

        Otherwise, if we get just a single certificate:
         - Then we are using the V2 handshake.  Do not send any
           certificates during this handshake.
         - When the handshake is done, immediately start a TLS
           renegotiation.  During the renegotiation, expect
           a certificate chain from the server; send a certificate
           chain of our own if we want to authenticate ourselves.
         - After the renegotiation, check the certificates. Then
           send (and expect) a VERSIONS cell from the other side to
           establish the link protocol version.

   And V2 responder behavior now looks like this:

      - When we get a TLS ClientHello request, look at the cipher
        list.
      - If the cipher list contains only the V1 ciphersuites:
         - Then we're doing a V1 handshake.  Send a certificate
           chain.  Expect a possible client certificate chain in
           response.
        Otherwise, if we get other ciphersuites:
         - We're using the V2 handshake.  Send back a single
           certificate and let the handshake complete.
         - Do not accept any data until the client has renegotiated.
         - When the client is renegotiating, send a certificate
           chain, and expect (possibly multiple) certificates in
           reply.
         - Check the certificates when the renegotiation is done.
           Then exchange VERSIONS cells.

   Late in 2009, researchers found a flaw in most applications' use
   of TLS renegotiation: Although TLS renegotiation does not
   reauthenticate any information exchanged before the renegotiation
   takes place, many applications were treating it as though it did,
   and assuming that data sent _before_ the renegotiation was
   authenticated with the credentials negotiated _during_ the
   renegotiation.  This problem was exacerbated by the fact that
   most TLS libraries don't actually give you an obvious good way to
   tell where the renegotiation occurred relative to the datastream.
   Tor wasn't directly affected by this vulnerability, but its
   aftermath hurts us in a few ways:

      1) OpenSSL has disabled renegotiation by default, and created
         a "yes we know what we're doing" option we need to set to
         turn it back on.  (Two options, actually: one for OpenSSL
         0.9.8l and one for 0.9.8m and later.)

      2) Some vendors have removed all renegotiation support from
         their versions of OpenSSL entirely, forcing us to tell
         users to either replace their versions of OpenSSL or to
         link Tor against a hand-built one.

      3) Because of 1 and 2, I'd expect TLS renegotiation to become
         rarer and rarer in the wild, making our own use stand out
         more.

3. Design

3.1. The view in the large

   Taking a cue from Steven Murdoch's proposal 124, I propose that
   we move the work currently done by the TLS renegotiation step
   (that is, authenticating the parties to one another) and do it
   with Tor cells instead of with TLS.

   Using _yet another_ variant response from the responder (server),
   we allow the client to learn that it doesn't need to rehandshake
   and can instead use a cell-based authentication system.  Once the
   TLS handshake is done, the client and server exchange VERSIONS
   cells to determine link protocol version (including
   handshake version).  If they're using the handshake version
   specified here, the client and server arrive at link protocol
   version 3 (or higher), and use cells to exchange further
   authentication information.

3.2. New TLS handshake variant

   We already used the list of ciphers from the ClientHello to
   indicate whether the client can speak the V2 ("renegotiating")
   handshake or later, so we can't encode more information there.

   We can, however, change the DN in the certificate passed by the
   server back to the client.  Currently, all V2 certificates are
   generated with CN values ending with ".net".  I propose that we
   have the ".net" commonName ending reserved to indicate the V2
   protocol, and use commonName values ending with ".com" to
   indicate the V3 ("minimal") handshake described herein.

   Now, once the initial TLS handshake is done, the client can look
   at the server's certificate(s).  If there is a certificate chain,
   the handshake is V1.  If there is a single certificate whose
   subject commonName ends in ".net", the handshake is V2 and the
   client should try to renegotiate as it would currently.
   Otherwise, the client should assume that the handshake is V3+.
   [Servers should _only_ send ".com" addresses, to allow room for
   more signaling in the future.]
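   The client's decision rule can be sketched as below. Extracting the
   subject commonName from X.509 certificates is out of scope here. The
   sketch assumes the TLS layer hands over the commonNames of the
   server's certificate(s), leaf first. The function name and the
   example hostnames are hypothetical.

```python
def classify_handshake(server_cert_cns):
    """server_cert_cns: subject commonNames of the certificate(s) the server
    presented in the initial TLS handshake, leaf certificate first."""
    if not server_cert_cns:
        raise ValueError("server sent no certificate")
    if len(server_cert_cns) > 1:
        return "V1"   # certificate chain: original two-cert handshake
    if server_cert_cns[0].endswith(".net"):
        return "V2"   # single cert with ".net" CN: renegotiate as today
    return "V3+"      # any other single cert: authenticate with cells
```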

3.3. Authenticating inside Tor

   Once the TLS handshake is finished, if the client renegotiates,
   then the server should go on as it does currently.

   If the client implements this proposal, however, and the server
   has shown it can understand the V3+ handshake protocol, the
   client immediately sends a VERSIONS cell to the server
   and waits to receive a VERSIONS cell in return.  We negotiate
   the Tor link protocol version _before_ we proceed with the
   negotiation, in case we need to change the authentication
   protocol in the future.

   Once either party has seen the VERSIONS cell from the other, it
   knows which version they will pick (that is, the highest version
   shared by both parties' VERSIONS cells).  All Tor instances using
   the handshake protocol described in 3.2 MUST support at least
   link protocol version 3 as described here.
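   Version selection is simply "the highest version in both lists".
   A sketch in Python, assuming each VERSIONS cell has already been
   decoded into a list of integers (the function name and the
   minimum parameter are illustrative, not part of any Tor API):

```python
def negotiate_link_version(our_versions, their_versions, minimum=3):
    """Pick the highest link protocol version listed in both VERSIONS cells.

    `minimum` reflects the requirement that the handshake from 3.2 use
    link protocol version 3 or later.
    """
    shared = set(our_versions) & set(their_versions)
    if not shared:
        raise ValueError("no link protocol version in common")
    chosen = max(shared)
    if chosen < minimum:
        raise ValueError(f"negotiated version {chosen} is below {minimum}")
    return chosen


print(negotiate_link_version([1, 2, 3], [2, 3]))   # 3
print(negotiate_link_version([3, 4], [3, 4, 5]))   # 4
```

   Because both sides apply the same rule to the same two lists,
   each can compute the result as soon as it has seen the other's
   VERSIONS cell, without a further round trip.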

   On learning the link protocol, the server then sends the client a
   CERT cell and a NETINFO cell.  If the client wants to
   authenticate to the server, it sends a CERT cell, an AUTHENTICATE
   cell, and a NETINFO cell; or it may simply send a NETINFO cell if
   it does not want to authenticate.

   The CERT cell describes the keys that a Tor instance is claiming
   to have.  It is a variable-length cell.  Its payload format is:

        N: Number of certs in cell            [1 octet]
        N times:
           CLEN                               [2 octets]
           Certificate                        [CLEN octets]

   Any extra octets at the end of a CERT cell MUST be ignored.
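   A sketch of splitting a CERT cell payload into its certificates.
   It assumes CLEN is big-endian (network byte order), which the
   layout above does not state explicitly; parse_cert_cell is a
   hypothetical name.

```python
import struct


def parse_cert_cell(payload: bytes) -> list:
    """Split a CERT cell payload into its raw certificates.

    Assumes multi-octet fields are big-endian (network byte order).
    """
    if not payload:
        raise ValueError("empty CERT cell")
    count = payload[0]                      # N: number of certs [1 octet]
    offset = 1
    certs = []
    for _ in range(count):
        if offset + 2 > len(payload):
            raise ValueError("truncated CLEN field")
        (clen,) = struct.unpack_from(">H", payload, offset)  # CLEN [2 octets]
        offset += 2
        if offset + clen > len(payload):
            raise ValueError("truncated certificate")
        certs.append(payload[offset:offset + clen])
        offset += clen
    # Any octets left over after the last certificate are ignored.
    return certs


payload = (bytes([2]) + struct.pack(">H", 3) + b"abc"
           + struct.pack(">H", 1) + b"z" + b"pad")
print(parse_cert_cell(payload))  # [b'abc', b'z']
```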

   Each certificate has the form:

        CertType                              [1 octet]
        CertPurpose                           [1 octet]
        PublicKeyLen                          [2 octets]
        PublicKey                             [PublicKeyLen octets]
        NotBefore                             [4 octets]
        NotAfter                              [4 octets]
        SignerID                              [HASH256_LEN octets]
        SignatureLen                          [2 octets]
        Signature                             [SignatureLen octets]

   where CertType is 1 (meaning "RSA/SHA256")
         CertPurpose is 1 (meaning "link certificate")
         PublicKey is the DER encoding of the ASN.1 representation
            of the RSA key of the subject of this certificate
         NotBefore is a time in HOURS since January 1, 1970, 00:00
            UTC before which this certificate should not be
            considered valid.
         NotAfter is a time in HOURS since January 1, 1970, 00:00
            UTC after which this certificate should not be
            considered valid.
         SignerID is the SHA-256 digest of the public key signing
            this certificate
         and Signature is the signature of all the other fields in
            this certificate, using SHA256 as described in proposal
            158.
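   The certificate layout can be decoded as sketched below.  This
   assumes big-endian integer fields and HASH256_LEN = 32 (the size
   of a SHA-256 digest); the class and function names are
   hypothetical.  Signature verification is deliberately left out,
   since it depends on the RSA library in use.  The is_recognized
   helper applies the rule that certificates with an unrecognized
   CertType or CertPurpose are ignored.

```python
import struct
from dataclasses import dataclass

HASH256_LEN = 32  # length of a SHA-256 digest


@dataclass
class Cert:
    cert_type: int
    cert_purpose: int
    public_key: bytes        # DER-encoded RSA key
    not_before_hours: int    # hours since 1970-01-01 00:00 UTC
    not_after_hours: int
    signer_id: bytes         # SHA-256 digest of the signing key
    signature: bytes

    def is_recognized(self) -> bool:
        # CertType 1 = RSA/SHA256, CertPurpose 1 = link certificate.
        return self.cert_type == 1 and self.cert_purpose == 1

    def valid_at(self, unix_seconds: float) -> bool:
        # NotBefore/NotAfter are in hours, so convert seconds to hours.
        hours = unix_seconds / 3600.0
        return self.not_before_hours <= hours <= self.not_after_hours


def parse_cert(body: bytes) -> Cert:
    """Decode one certificate; integer fields are assumed big-endian."""
    offset = 0

    def take(n: int) -> bytes:
        nonlocal offset
        if offset + n > len(body):
            raise ValueError("truncated certificate")
        chunk = body[offset:offset + n]
        offset += n
        return chunk

    cert_type = take(1)[0]
    cert_purpose = take(1)[0]
    (key_len,) = struct.unpack(">H", take(2))
    public_key = take(key_len)
    not_before, not_after = struct.unpack(">II", take(8))
    signer_id = take(HASH256_LEN)
    (sig_len,) = struct.unpack(">H", take(2))
    signature = take(sig_len)
    # Extra bytes after the signature are ignored.
    return Cert(cert_type, cert_purpose, public_key, not_before,
                not_after, signer_id, signature)
```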

   While authenticating, a server need send only a self-signed
   certificate for its identity key.  (Its TLS certificate already
   contains its link key signed by its identity key.)  A client that
   wants to authenticate MUST send two certificates: one containing
   a public link key signed by its identity key, and one self-signed
   cert for its identity.

   Tor instances MUST ignore any certificate with an unrecognized
   CertType or CertPurpose, and MUST ignore extra bytes in the cert.

   The AUTHENTICATE cell proves to the server that the client with
   whom it completed the initial TLS handshake is the one holding
   the private key for the link public key in its certificate.  It
   is a variable-length cell.  Its contents are:

        SignatureType                         [2 octets]
        SignatureLen                          [2 octets]
        Signature                             [SignatureLen octets]

   where SignatureType is 1 (meaning "RSA-SHA256") and Signature is
   an RSA-SHA256 signature of the HMAC-SHA256, using the TLS master
   secret key as its key, of the following elements:

     - The SignatureType field (0x00 0x01)
     - The NUL terminated ASCII string: "Tor certificate verification"
     - client_random, as sent in the Client Hello
     - server_random, as sent in the Server Hello
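   The value that gets signed can be computed with Python's standard
   hmac module.  This sketch assumes the four elements are simply
   concatenated in the order listed, which the text implies but does
   not spell out; authenticate_hmac is a hypothetical name.  The
   client would then RSA-sign the resulting 32-byte digest with its
   private link key (not shown).

```python
import hashlib
import hmac

SIGNATURE_TYPE = b"\x00\x01"                    # RSA-SHA256
LABEL = b"Tor certificate verification\x00"     # NUL-terminated ASCII


def authenticate_hmac(master_secret: bytes, client_random: bytes,
                      server_random: bytes) -> bytes:
    """HMAC-SHA256, keyed with the TLS master secret, over the listed fields.

    Assumes the four elements are concatenated in the order given.
    """
    message = SIGNATURE_TYPE + LABEL + client_random + server_random
    return hmac.new(master_secret, message, hashlib.sha256).digest()


digest = authenticate_hmac(b"\x11" * 48, b"\x22" * 32, b"\x33" * 32)
print(len(digest))  # 32
```

   Including both TLS randoms ties the signature to this particular
   TLS session, so it cannot be replayed on another connection.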

   Once the above handshake is complete, the client knows (from the
   initial TLS handshake) that it has a secure connection to an
   entity that controls a given link public key, and knows (from the
   CERT cell) that the link public key is a valid public key for a
   given Tor identity.

   If the client authenticates, the server learns from the CERT cell
   that a given Tor identity has a given current public link key.
   From the AUTHENTICATE cell, it knows that an entity with that
   link key knows the master secret for the TLS connection, and
   hence must be the party with whom it's talking, if TLS works.

3.4. Security checks

   If the TLS handshake indicates a V2 or V3+ connection, the server
   MUST reject any connection from the client that does not begin
   with either a renegotiation attempt or a VERSIONS cell containing
   at least link protocol version "3".  If the TLS handshake
   indicates a V3+ connection, the client MUST reject any connection
   where the server sends anything before the client has sent a
   VERSIONS cell, and any connection where the VERSIONS cell does
   not contain at least link protocol version "3".

   If link protocol version 3 is chosen:

     Clients and servers MUST check that all digests and signatures
     on the certificates in CERT cells they are given are as
     described above.

     After the VERSIONS cell, clients and servers MUST close the
     connection if anything besides a CERT or AUTHENTICATE cell is
     sent before the first NETINFO cell.

     CERT or AUTHENTICATE cells anywhere after the first NETINFO
     cell must be rejected.

   ... [write more here.  What else?] ...
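   One reading of the ordering rules, combining this section with the
   responder summary in section 3.5 (only CERT and AUTHENTICATE may
   precede the first NETINFO, and neither may follow it), can be
   sketched as a single pass over the cells received after VERSIONS.
   The command-name strings and function name are illustrative.

```python
PRE_NETINFO_CELLS = {"CERT", "AUTHENTICATE"}


def cell_order_ok(commands):
    """Check the cells received after VERSIONS against the ordering rules.

    commands: hypothetical list of cell command names, in arrival order.
    """
    seen_netinfo = False
    for command in commands:
        if command == "NETINFO":
            seen_netinfo = True
        elif command in PRE_NETINFO_CELLS:
            if seen_netinfo:
                return False    # CERT/AUTHENTICATE after the first NETINFO
        elif not seen_netinfo:
            return False        # some other cell before the first NETINFO
    return True
```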

3.5. Summary

   We now revisit the protocol outlines from section 2 to incorporate
   our changes.  New or modified steps are marked with a *.

   The new initiator behavior now looks like this:

      - Begin TLS negotiation with V2 cipher list; wait for
        certificate(s).
      - If we get a certificate chain:
         - Then we are using the V1 handshake.  Send our own
           certificate chain as part of this initial TLS handshake
           if we want to authenticate; otherwise, send no
           certificates.  When the handshake completes, check
           certificates.  We are now mutually authenticated.
        Otherwise, if we get just a single certificate:
         - Then we are using the V2 or the V3+ handshake.  Do not
           send any certificates during this handshake.
         * When the handshake is done, look at the server's
           certificate's subject commonName.
           * If it ends with ".net", we're doing a V2 handshake:
             - Immediately start a TLS renegotiation.  During the
               renegotiation, expect a certificate chain from the
               server; send a certificate chain of our own if we
               want to authenticate ourselves.
             - After the renegotiation, check the certificates. Then
               send (and expect) a VERSIONS cell from the other side
               to establish the link protocol version.
           * If it ends with anything else, assume a V3 or later
             handshake:
             * Send a VERSIONS cell, and wait for a VERSIONS cell
               from the server.
             * If we are authenticating, send CERT and AUTHENTICATE
               cells.
             * Send a NETINFO cell.  Wait for a CERT and a NETINFO
               cell from the server.
             * If the CERT cell contains a valid self-identity cert,
               and the identity key in the cert can be used to check
               the signature on the X.509 certificate we got during
               the TLS handshake, then we know we connected to the
               server with that identity.  If any of these checks
               fail, or the identity key was not what we expected,
               then we close the connection.
             * Once the NETINFO cell arrives, continue as before.

   And V3+ responder behavior now looks like this:

      - When we get a TLS ClientHello request, look at the cipher
        list.

      - If the cipher list contains only the V1 ciphersuites:
         - Then we're doing a V1 handshake.  Send a certificate
           chain.  Expect a possible client certificate chain in
           response.
        Otherwise, if we get other ciphersuites:
         - We're using the V2 or V3+ handshake.  Send back a single
           certificate whose subject commonName ends with ".com",
           and let the handshake complete.
         * If the client does anything besides renegotiate or send a
           VERSIONS cell, drop the connection.
         - If the client renegotiates immediately, it's a V2
           connection:
           - When the client is renegotiating, send a certificate
             chain, and expect (possibly multiple) certificates in
             reply.
           - Check the certificates when the renegotiation is done.
             Then exchange VERSIONS cells.
         * Otherwise we got a VERSIONS cell and it's a V3 handshake.
           * Send a VERSIONS cell, a CERT cell, and a NETINFO cell.
           * Wait for the client to send cells in reply.  If the
             client sends a CERT and an AUTHENTICATE and a NETINFO,
             use them to authenticate the client.  If the client
             sends a NETINFO, it is unauthenticated.  If it sends
             anything else before its NETINFO, it's rejected.
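   The responder's decisions after the TLS handshake can be sketched
   as two small functions.  The event labels and function names are
   hypothetical; the reply to a VERSIONS cell follows section 3.3
   (VERSIONS, CERT, NETINFO).  An "authenticating" result only means
   the cell sequence is acceptable: the contents of the CERT and
   AUTHENTICATE cells must still be checked.

```python
def responder_first_step(first_event):
    """What a V3+ responder does with the client's first action after TLS.

    first_event: hypothetical label, "renegotiate" or a cell command name.
    """
    if first_event == "renegotiate":
        return "V2"         # send a certificate chain during renegotiation
    if first_event == "VERSIONS":
        return "V3"         # reply with VERSIONS, CERT, NETINFO
    return "drop"


def classify_client_reply(commands):
    """Classify the client's cells up to and including its first NETINFO.

    Returns None while no NETINFO has arrived yet.
    """
    if "NETINFO" not in commands:
        return None
    before = commands[:commands.index("NETINFO")]
    if not before:
        return "unauthenticated"
    if before == ["CERT", "AUTHENTICATE"]:
        return "authenticating"     # now verify the cell contents
    return "reject"
```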

4. Numbers to assign

   We need a version number for this link protocol.  I've been
   calling it "3".

   We need to reserve command numbers for CERT and AUTH cells.  I
   suggest that in link protocol 3 and higher, we reserve command
   numbers 128..240 for variable-length cells.  (241-255 we can hold
   for future extensions.)
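   A minimal sketch of the suggested rule.  It covers only the
   128..240 range proposed above; is_variable_length is a
   hypothetical name.  The cell command field is a single octet, so
   values above 255 are rejected outright.

```python
def is_variable_length(command: int, link_version: int) -> bool:
    """Whether a cell command uses the variable-length format proposed here."""
    if not 0 <= command <= 255:
        raise ValueError("the command field is a single octet")
    return link_version >= 3 and 128 <= command <= 240
```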

5. Efficiency

   This protocol adds a round-trip step when the client sends a
   VERSIONS cell to the server and waits for the {VERSIONS, CERT,
   NETINFO} response in turn.  (The server then waits for the
   client's {NETINFO} or {CERT, AUTHENTICATE, NETINFO} reply,
   but it would have already been waiting for the client's NETINFO,
   so that's not an additional wait.)

   This is actually fewer round-trip steps than required before for
   TLS renegotiation, so that's a win.

6. Open questions:

  - Should we use X.509 certificates instead of the certificate-ish
    things we describe here?  They are more standard, but more ugly.

  - May we cache which certificates we've already verified?  It
    might leak in timing whether we've connected with a given server
    before, and how recently.

  - Is there a better secret than the master secret to use in the
    AUTHENTICATE cell?  Say, a portable one?  Can we get at it for
    other libraries besides OpenSSL?

  - Does using the client_random and server_random data in the
    AUTHENTICATE message actually help us?  How hard is it to pull
    them out of the OpenSSL data structure?

  - Can we give some way for clients to signal "I want to use the
    V3 protocol if possible, but I can't renegotiate, so don't give
    me the V2"?  Clients currently have a fair idea of server
    versions, so they could potentially do the V3+ handshake with
    servers that support it, and fall back to V1 otherwise.

  - What should servers that don't have TLS renegotiation do?  For
    now, I think they should just get it.  Eventually we can
    deprecate the V2 handshake as we did with the V1 handshake.

Title: Configuration options regarding circuit building
Filename: 170-user-path-config.txt
Author: Sebastian Hahn
Created: 01-March-2010
Status: Superseded

Overview:

    This document outlines how Tor handles the user configuration
    options to influence the circuit building process.

Motivation:

    Tor's treatment of the configuration *Nodes options was surprising
    to many users, and quite a few conspiracy theories have crept up. We
    should update our specification and code to better describe and
    communicate what is going on during circuit building, and how we're
    honoring configuration. So far, we've been tracking a bug report
    about this behaviour (
    https://bugs.torproject.org/flyspray/index.php?do=details&id=1090 )
    and Nick replied in a thread on or-talk (
    http://archives.seul.org/or/talk/Feb-2010/msg00117.html ).

    This proposal tries to document our intention for those configuration
    options.

Design:

    Five configuration options are available to users to influence Tor's
    circuit building. EntryNodes and ExitNodes each define a list of nodes
    to use for the entry or exit position, respectively, in all circuits.
    ExcludeNodes is a list of nodes that are not used in any circuit, and
    ExcludeExitNodes is a list of nodes that aren't used as the last
    hop. StrictNodes defines Tor's behaviour in case of a conflict, for
    example when a node that is excluded is the only available
    introduction point. Setting StrictNodes to 1 breaks Tor's
    functionality in that case, and it will refuse to build such a
    circuit.

    Neither Nick's email nor bug 1090 has clear suggestions for how we
    should behave in each case, so I tried to come up with something
    that made sense to me.

Security implications:

    Deviating from normal circuit building can break one's anonymity, so
    the documentation of the above options should contain a warning to
    make users aware of the pitfalls.

Specification:

    It is proposed that the "User configuration" part of path-spec
    (section 2.2.2) be replaced with this:

    Users can alter the default behavior for path selection with
    configuration options. In case of conflicts (excluding and requiring
    the same node) the "StrictNodes" option is used to determine
    behaviour. If a node is both excluded and required via a
    configuration option, the exclusion takes preference.

    - If "ExitNodes" is provided, then every request requires an exit
      node on the ExitNodes list. If a request is supported by no nodes
      on that list, and "StrictNodes" is false, then Tor treats that
      request as if ExitNodes were not provided.

    - "EntryNodes" behaves analogously.

    - If "ExcludeNodes" is provided, then no circuit uses any of the
      nodes listed. If a circuit requires an excluded node to be used,
      and "StrictNodes" is false, then Tor uses the node in that
      position while not using any other of the excluded nodes.

    - If "ExcludeExitNodes" is provided, then Tor will not use the nodes
      listed for the exit position in a circuit. If a circuit requires
      an excluded node to be used in the exit position and "StrictNodes"
      is false, then Tor builds that circuit as if ExcludeExitNodes were
      not provided.

    - If a user tries to connect to or resolve a hostname of the form
      <target>.<servername>.exit and the "AllowDotExit" configuration
      option is set to 1, the request is rewritten to a request for
      <target>, and the request is only supported by the exit whose
      nickname or fingerprint is <servername>. If "AllowDotExit" is set
      to 0 (default), any request for <anything>.exit is denied.

    - When any of the *Nodes settings are changed, all circuits are
      expired immediately, to prevent a situation where a previously
      built circuit is used even though some of its nodes are now
      excluded.
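    The exit-position rules above can be sketched in Python.  This is a
    hedged illustration under several assumptions: nodes are plain
    nickname strings, the function names are hypothetical, EntryNodes
    (which "behaves analogously") is not shown, and the text does not
    say in which order a non-strict client relaxes ExcludeExitNodes and
    ExcludeNodes, so this sketch relaxes ExcludeExitNodes first.

```python
from typing import Iterable, List, Optional, Set


def exit_candidates(supporting: Iterable[str], exit_nodes: Optional[Set[str]],
                    exclude_nodes: Set[str], exclude_exit_nodes: Set[str],
                    strict_nodes: bool) -> List[str]:
    """Return the nodes usable as the exit for one request.

    supporting: nicknames of nodes that support the request.
    """
    supporting = set(supporting)
    # Exclusion takes preference over requirement.
    allowed = supporting - exclude_nodes - exclude_exit_nodes
    if exit_nodes:
        preferred = allowed & exit_nodes
        if preferred:
            return sorted(preferred)
        if strict_nodes:
            return []
        # Not strict: treat the request as if ExitNodes were not provided.
    if allowed or strict_nodes:
        return sorted(allowed)
    # Not strict, and every supporting node is excluded.  Relaxing
    # ExcludeExitNodes before ExcludeNodes is an assumption of this sketch.
    without_exit_exclusion = supporting - exclude_nodes
    if without_exit_exclusion:
        return sorted(without_exit_exclusion)
    # Last resort: an excluded node in the exit position only.
    return sorted(supporting)


def rewrite_dot_exit(hostname: str, allow_dot_exit: bool):
    """Split "<target>.<servername>.exit" into (target, servername).

    Returns (hostname, None) for ordinary names and None when the
    request must be denied.
    """
    if not hostname.lower().endswith(".exit"):
        return hostname, None
    if not allow_dot_exit:
        return None
    target, _, servername = hostname[:-len(".exit")].rpartition(".")
    if not target or not servername:
        return None
    return target, servername
```

    With StrictNodes set, every fallback is skipped and an empty list
    means Tor refuses to build the circuit.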


Compatibility:

    The old Strict*Nodes options are deprecated, and the StrictNodes
    option is new. Tor users may need to update their configuration file.
Filename: 171-separate-streams.txt
Title: Separate streams across circuits by connection metadata
Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
Created: 21-Oct-2008
Modified: 7-Dec-2010
Status: Closed
Implemented-In: 0.2.3.3-alpha

Summary:

  We propose a new set of options to isolate unrelated streams from one
  another, putting them on separate circuits so that semantically
  unrelated traffic is not inadvertently made linkable.

Motivation:

  Currently, Tor attaches regular streams (that is, ones not carrying
  rendezvous or directory traffic) to circuits based only on whether the
  circuit's current exit node supports the destination, and whether the
  circuit has been dirty (that is, in use) for too long.

  This means that traffic that would otherwise be unrelated sometimes
  gets sent over the same circuit, allowing the exit node to link such
  streams with certainty, and allowing other parties to link such
  streams probabilistically.

  Older versions of onion routing tried to address this problem by
  sending every stream over a separate circuit; performance issues made
  this unfeasible. Moreover, in the presence of a localized adversary,
  separating streams by circuits increases the odds that, for any given
  linked set of streams, at least one will go over a compromised
  circuit.

  Therefore we ought to look for ways to allow streams that ought to be
  linked to travel over a single circuit, while keeping streams that
  ought not be linked isolated to separate circuits.

Discussion:

  Let's call a series of inherently-linked streams (like a set of
  streams downloading objects from the same webpage, or a browsing
  session where the user requests several related webpages) a "Session".

  "Sessions" are necessarily a fuzzy concept.  While users typically
  consider some activities as wholly unrelated to each other ("My IM
  session has nothing to do with my web browsing!"), the boundaries
  between activities are sometimes hard to determine.  If I'm reading
  lolcats in one browser tab and reading about treatments for an
  embarrassing disease in another, those are probably separate sessions.
  If I search for a forum, log in, read it for a while, and post a few
  messages on unrelated topics, that's probably all the same session.

  So with the proviso that no automated process can identify sessions
  100% accurately, let's see which options we have available.

  Generally, all the streams on a session come from a single
  application.  Unfortunately, isolating streams by application
  automatically isn't feasible, given the lack of any nice
  cross-platform way to tell which local process originated a given
  connection.  (Yes, lsof works.  But a quick review of the lsof code
  should be sufficient to scare you away from thinking there is a
  portable option, much less a portable O(1) option.)  So instead, we'll
  have to use some other aspect of a Tor request as a proxy for the
  application.

  Generally, traffic from separate applications is not in the same
  session.

  With some applications (IRC, for example), each stream is a session.

  Some applications (most notably web browsing) can't be meaningfully
  split into sessions without inspecting the traffic itself and
  maintaining a lot of state.

  How well do ports correspond to sessions?  Early versions of this
  proposal focused on using destination ports as a proxy for
  application, since a connection to port 22 for SSH is probably not in
  the same session as one to port 80. This works better with some
  applications than with others, though: while SSH users typically
  know when they're on port 22 and when they aren't, a web browser can
  be coaxed (through img URLs or any number of related tricks) into
  connecting to any port at all.  Moreover, when Tor gets a DNS lookup
  request, it doesn't know in advance which port the resulting address
  will be used to connect to.

  So in summary, each kind of traffic wants to follow different rules,
  and assuming the existence of a web browser and a hostile web page or
  exit node, we can't tell one kind of traffic from another by simply
  looking at the destination:port of the traffic.

  Fortunately, we're not doomed.

Design:

  When a stream arrives at Tor, we have the following data to examine:
    1) The destination address
    2) The destination port (unless this is a DNS lookup)
    3) The protocol used by the application to send the stream to Tor:
       SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
       mechanism the kernel gives us.
    4) The port used by the application to send the stream to Tor --
       that is, the SOCKSListenAddress or TransListenAddress that the
       application used, if we have more than one.
    5) The SOCKS username and password, if any.
    6) The source address and port for the application.

  We propose to use 3, 4, and 5 as a backchannel for applications to
  tell Tor about different sessions.  Rather than running only one
  SOCKSPort, a Tor user who would prefer better session isolation should
  run multiple SOCKSPorts/TransPorts, and configure different
  applications to use separate ports. Applications that support SOCKS
  authentication can further be separated on a single port by their
  choice of username/password.  Streams sent to separate ports or using
  different authentication information should never be sent over the
  same circuit.  We allow each port to have its own settings for
  isolation based on destination port, destination address, or both.

  Handling DNS can be a challenge.  We can get hostnames by one of three
  means:

    A) A SOCKS4a request, or a SOCKS5 request with a hostname.  This
       case is handled trivially using the rules above.
    B) A RESOLVE request on a SOCKSPort.  This case is handled using the
       rules above, except that port isolation can't work to isolate
       RESOLVE requests into a proper session, since we don't know which
       port will eventually be used when we connect to the returned
       address.
    C) A request on a DNSPort.  We have no way of knowing which
       address/port will be used to connect to the requested address.

  When B or C is required but problematic, we could favor the use of
  AutomapHostsOnResolve.

Interface:

  We propose that {SOCKS,Natd,Trans,DNS}ListenAddr be deprecated in
  favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:

  ClientPortLine = OptionName SP (Addr ":")? Port (SP Options?)
  OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
  Addr = An IPv4 address / an IPv6 address surrounded by brackets.
         If omitted, it defaults to 127.0.0.1
  Port = An integer from 1 through 65535 inclusive
  Options = Option
  Options = Options SP Option
  Option = IsolateOption / GroupOption
  GroupOption = "SessionGroup=" UINT
  IsolateOption =  OptNo ("IsolateDestPort" / "IsolateDestAddr" /
         "IsolateSOCKSUser"/ "IsolateClientProtocol" /
         "IsolateClientAddr") OptPlural
  OptNo = "No" ?
  OptPlural = "s" ?
  SP = " "
  UINT = An unsigned integer

  All options are case-insensitive.

  The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
  default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
  turn them off.  The IsolateDestPort and IsolateDestAddr and
  IsolateClientProtocol options are off by default.  NoIsolateDestPort and
  NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
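  A sketch of a parser for the ClientPortLine grammar above, including
  the case-insensitivity, the "No" prefix, the optional plural "s", and
  the default isolation flags.  The class and function names are
  hypothetical, and IPv4/IPv6 literals are only checked loosely.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

OPTION_NAMES = {"socksport": "SOCKSPort", "natdport": "NatdPort",
                "transport": "TransPort", "dnsport": "DNSPort"}
ISOLATE_FLAGS = {name.lower(): name for name in (
    "IsolateDestPort", "IsolateDestAddr", "IsolateSOCKSUser",
    "IsolateClientProtocol", "IsolateClientAddr")}
DEFAULT_ON = {"IsolateSOCKSUser", "IsolateClientAddr"}
ADDR_PORT = re.compile(r"(\[[0-9A-Fa-f:.]+\]|[0-9.]+):(\d+)")


@dataclass
class ClientPort:
    option: str
    addr: str
    port: int
    session_group: Optional[int] = None
    isolation: Set[str] = field(default_factory=lambda: set(DEFAULT_ON))


def parse_client_port_line(line: str) -> ClientPort:
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("expected an option name and a port")
    option = OPTION_NAMES.get(tokens[0].lower())
    if option is None:
        raise ValueError(f"unknown option {tokens[0]!r}")
    addr, port_text = "127.0.0.1", tokens[1]   # Addr defaults to 127.0.0.1
    match = ADDR_PORT.fullmatch(tokens[1])
    if match:
        addr, port_text = match.group(1), match.group(2)
    if not port_text.isdigit() or not 1 <= int(port_text) <= 65535:
        raise ValueError(f"bad port {port_text!r}")
    result = ClientPort(option, addr, int(port_text))
    for token in tokens[2:]:
        word = token.lower()                   # options are case-insensitive
        if word.startswith("sessiongroup="):
            value = word.split("=", 1)[1]
            if not value.isdigit():
                raise ValueError(f"bad SessionGroup {value!r}")
            result.session_group = int(value)
            continue
        negate = word.startswith("no")         # OptNo
        if negate:
            word = word[2:]
        if word not in ISOLATE_FLAGS and word.endswith("s"):
            word = word[:-1]                   # OptPlural
        if word not in ISOLATE_FLAGS:
            raise ValueError(f"unknown option {token!r}")
        if negate:
            result.isolation.discard(ISOLATE_FLAGS[word])
        else:
            result.isolation.add(ISOLATE_FLAGS[word])
    return result
```

  Discarding a flag that is already off is a no-op, which matches the
  statement that NoIsolateDestPort and friends have no effect.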

  Given a set of ClientPortLines, streams must NOT be placed on the same
  circuit if ANY of the following hold:

    * They were sent to two different client ports, unless the two
      client ports both specify a "SessionGroup" option with the same
      integer value.
    * At least one was sent to a client port with the IsolateDestPort
      active, and they have different destination ports.
    * At least one was sent to a client port with IsolateDestAddr
      active, and they have different destination addresses.
    * At least one was sent to a client port with IsolateClientProtocol
      active, and they use different protocols (where SOCKS4, SOCKS4a,
      SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
    * At least one was sent to a client port with IsolateSOCKSUser
      active, and they have different SOCKS username/password values.
      (For the purposes of this option, the
      username/password pair of ""/"" is distinct from SOCKS without
      authentication, and both are distinct from any non-SOCKS client's
      non-authentication.)
    * At least one was sent to a client port with IsolateClientAddr
      active, and they came from different client addresses.  (For the
      purpose of this option, any local interface counts as the same
      address.  So if the host is configured with addresses 10.0.0.1,
      192.0.32.10, and 127.0.0.1, then traffic from those addresses can
      leave on the same circuit, but traffic from 10.0.0.2 (for
      example) could not share a circuit with any of them.)
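  The pairwise test above can be sketched as a predicate over two
  streams.  All names here are hypothetical, and the three
  authentication cases the text distinguishes (a SOCKS username/password
  pair, SOCKS without authentication, and non-SOCKS clients) are modeled
  as distinct tuples.  Because these rules hold even after a stream
  closes, a real implementation would have to compare a new stream
  against every stream a circuit has ever carried; that bookkeeping is
  not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Distinct markers for the authentication cases the rules distinguish.
SOCKS_NO_AUTH = ("socks-no-auth",)
NON_SOCKS = ("non-socks",)


def socks_userpass(user: str, password: str) -> Tuple[str, str, str]:
    return ("userpass", user, password)


@dataclass(frozen=True)
class Stream:
    client_port: str              # listener that received it, e.g. "SOCKSPort 9050"
    session_group: Optional[int]  # SessionGroup of that listener, if any
    isolation: FrozenSet[str]     # Isolate* flags active on that listener
    dest_addr: str
    dest_port: Optional[int]      # None for DNS lookups
    protocol: str                 # "SOCKS4", "SOCKS4a", "SOCKS5", "TransPort", ...
    auth: tuple                   # SOCKS_NO_AUTH, NON_SOCKS, or socks_userpass(...)
    client_addr: str              # pass through normalize_client_addr first


def normalize_client_addr(addr: str, local_addrs) -> str:
    # Every local interface address counts as the same client address.
    return "local" if addr in local_addrs else addr


def may_share_circuit(a: Stream, b: Stream) -> bool:
    same_listener = a.client_port == b.client_port
    same_group = a.session_group is not None and a.session_group == b.session_group
    if not (same_listener or same_group):
        return False
    active = a.isolation | b.isolation   # "at least one was sent to a port with ..."
    checks = (
        ("IsolateDestPort", a.dest_port != b.dest_port),
        ("IsolateDestAddr", a.dest_addr != b.dest_addr),
        ("IsolateClientProtocol", a.protocol != b.protocol),
        ("IsolateSOCKSUser", a.auth != b.auth),
        ("IsolateClientAddr", a.client_addr != b.client_addr),
    )
    return not any(flag in active and differs for flag, differs in checks)
```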

  These rules apply regardless of whether the streams are active at the
  same time.  In other words, if the rules say that streams A and B must
  not be on the same circuit, and stream A is attached to circuit X,
  then stream B must never be attached to circuit X, even if stream A is
  closed first.

Alternative Interface:

  We're cramming a lot onto one line in the design above.  Perhaps
  instead it would be a better idea to have grouped lines of the form:

    StreamGroup 1
    SOCKSPort 9050
    TransPort 9051
    IsolateDestPort 1
    IsolateClientProtocol 0
    EndStreamGroup

    StreamGroup 2
    SOCKSPort 9052
    DNSPort 9053
    IsolateDestAddr 1
    EndStreamGroup

  This would be equivalent to:
   SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
   TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
   SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
   DNSPort   9053 SessionGroup=2 IsolateDestAddr

  But it would let us extend the range of allowed options later without
  having client port lines grow without bound.  For example, we might
  give different circuit building parameters to different session
  groups.

Example of use:

  Suppose that we want to use a web browser, an IRC client, and an SSH
  client all at the same time.  Let's assume that we want web traffic to
  be isolated from all other traffic, even if the browser makes
  connections to ports usually used for IRC or SSH.  Let's also assume
  that IRC and SSH are both used for relatively long-lived connections,
  and we want to keep all IRC/SSH sessions separate from one another.

  In this case, we could say:

    SOCKSPort 9050
    SOCKSPort 9051 IsolateDestAddr IsolateDestPort

  We would then configure our browser to use 9050 and our IRC/SSH
  clients to use 9051.

Advanced example of use, #2:

  Suppose that we have a bunch of applications, and we launch them all
  using torsocks, and we want to keep each application isolated from
  the others.  We just create a shell script, "torlaunch", invoked as
  "torlaunch <session-name> <command> [args...]":
    #!/bin/bash
    export TORSOCKS_USERNAME="$1"
    shift
    exec torsocks "$@"
  And we configure our SOCKSPort with IsolateSOCKSUser.

  Or if we're on Linux and we want to isolate by application invocation,
  we would replace the TORSOCKS_USERNAME and shift lines with:

    export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"

  and invoke the script as "torlaunch <command> [args...]".

Advanced example of use, #3:

  Now suppose that we want to achieve the benefits of the first example
  of use, but we are stuck using transparent proxies.  Let's suppose
  this is Linux.

    TransPort 9090
    TransPort 9091 IsolateDestAddr IsolateDestPort
    DNSPort 5353
    AutomapHostsOnResolve 1

  Here we use the iptables --cmd-owner filter to distinguish which
  command is originating the packets, directing traffic from our IRC
  client and our SSH client to port 9091, and directing other traffic to
  9090.  Using AutomapHostsOnResolve will confuse ssh in its default
  configuration; we'll need to find a way around that.

Security Risks:

  Disabling IsolateClientAddr is a pretty bad idea.

  Setting up a set of applications to use this system effectively is a
  big problem.  It's likely that lots of people who try to do this will
  mess it up.  We should try to see which setups are sensible, and see
  if we can provide good feedback to explain which streams are isolated
  how.

Performance Risks:

  This proposal will result in clients building many more circuits than
  they do today.  To avoid accidentally hammering the network, we should
  have in-process limits on the maximum circuit creation rate and the
  total maximum client circuits.

Specification:

  The Tor client circuit selection process is not entirely specified.
  Any client circuit specification must take these changes into account.

Implementation notes:

  The more obvious ways to implement the "find a good circuit to attach
  to" part of this proposal involve doing an O(n_circuits) operation
  every time we have a stream to attach.  We already do such an
  operation, so it's not as if we need to hunt for fancy ways to make it
  O(1).  What will be harder is implementing the "launch circuits as
  needed" part of the proposal.  Still, it should come down to "a simple
  matter of programming."

  The SOCKS4 spec has the client provide authentication info when it
  connects; accepting such info is no problem.  But the SOCKS5 spec has
  the client send a list of known auth methods, then has the server send
  back the authentication method it chooses.  We'll need to update the
  SOCKS5 implementation so it can accept user/password authentication if
  it's offered.

  If we use the second syntax for describing these options, we'll want
  to add a new "section-based" entry type for the configuration parser.
  Not a huge deal; we already have kludged up something similar for
  hidden service configurations.

  Opening circuits for predicted ports has the potential to get a little
  more complicated; we can probably get away with the existing
  algorithm for now, though, and see where its weak points are before
  looking for better ones.

  Perhaps we can get our next-gen HTTP proxy to communicate browser tab
  or session info to tor via authentication, or have torbutton do it
  directly.  More design is needed here, though.

Alternative designs:

  The implementation of this option may want to consider cases where the
  same exit node is shared by two or more circuits and
  IsolateStreamsByPort is in force.  Since one possible use of the option
  is to reduce the opportunity of Exit Nodes to attack traffic from the
  same source on multiple ports, the implementation may need to ensure
  that circuits reserved for the exclusive use of given ports do not
  share the same exit node.  On the other hand, if our goal is only that
  streams should be unlinkable, deliberately shunting them to different
  exit nodes is unnecessary and slightly counterproductive.

  Earlier versions of this design included a mechanism to isolate
  _particular_ destination ports and addresses, so that traffic sent to,
  say, port 22 would never share a circuit with any traffic *not* sent to
  port 22.  You can achieve this here by having all applications that
  send traffic to one of these ports use a separate SOCKSPort, and
  then setting IsolateDestPorts on that SOCKSPort.

Future work:

  Nikita Borisov suggests that different session profiles -- so long as
  there aren't too many of them -- could well get different guard node
  allocations in order to prevent guard profiling.  This can be done
  orthogonally to the rest of this proposal.

Lingering questions:

  I suspect there are issues remaining with DNS and TransPort users, and
  that my "just use AutomapHostsOnResolve" suggestion may be
  insufficient.
Filename: 172-circ-getinfo-option.txt
Title: GETINFO controller option for circuit information
Author: Damian Johnson
Created: 03-June-2010
Status: Reserve

Overview:

    This details an additional GETINFO option that would provide information
    concerning a relay's current circuits.

Motivation:

    The original proposal was for connection related information, but Jake made
    the excellent point that any information retrieved from the control port
    is...
    
      1. completely ineffectual for auditing purposes since either (a) these
      results can be fetched from netstat already or (b) the information would
      only be provided via tor and can't be validated.
      
      2. The more useful uses for connection information can be achieved with
      much less (and safer) information.
    
    Hence the proposal is now for circuit based rather than connection based
    information. This would strip the most controversial and sensitive data
    entirely (ip addresses, ports, and connection based bandwidth breakdowns)
    while still being useful for the following purposes:

    - Basic Relay Usage Questions
    How is the bandwidth I'm contributing broken down? Is it being evenly
    distributed or is someone hogging most of it? Do these circuits belong to
    the hidden service I'm running or something else? Now that I'm using exit
    policy X am I desirable as an exit, or are most people just using me as a
    relay?

    - Debugging
    Say a relay operator has a restrictive firewall policy for outbound
    connections, with the ORPort whitelisted, but doesn't realize that tor
    needs random high ports. Tor would report success ("your orport is
    reachable - excellent") yet the relay would be nonfunctional. This
    proposed information would reveal numerous RELAY -> YOU -> UNESTABLISHED
    circuits, giving a good indicator of what's wrong.

    - Visualization
    A nice benefit of visualizing tor's behavior is that it becomes a helpful
    tool in puzzling out how tor works. For instance, tor spawns numerous
    client connections at startup (even if unused as a client). As a newcomer
    to tor, these asymmetric (outbound-only) connections mystified me for quite
    a while until Roger explained their use to me. The proposed
    TYPE_FLAGS would let controllers clearly label them as being client
    related, making their purpose a bit clearer.

    At the moment, connection data can only be retrieved via commands like
    netstat, ss, and lsof. However, offering an alternative via the control
    port has several advantages:

      - scrubbing for private data
          Raw connection data has no notion of what's sensitive and what is
          not. The relay's flags and cached consensus can be used to take
          educated guesses concerning which connections could possibly belong
          to client or exit traffic, but this is both difficult and inaccurate.
          Anything provided via the control port can be scrubbed to make sure we
          aren't providing anything we think relay operators should not see.
     
      - additional information
          All connection querying commands provide only the IP address and
          port of connections. However, for the uses listed
          above the far more interesting attributes are the circuit's type,
          bandwidth usage and uptime.
     
      - improved performance
          Querying connection data is an expensive activity, especially for
          busy relays or low end processors (such as mobile devices). Tor
          already internally knows its circuits, allowing for vastly quicker
          lookups.
     
      - cross platform capability
          The connection querying utilities mentioned above not only aren't
          available under Windows, but differ widely among different *nix
          platforms. FreeBSD in particular takes a unique approach,
          dropping important options from netstat and assigning ss to a
          spreadsheet application instead. A controller interface, however,
          would provide a uniform means of retrieving this information.

Security Implications:

    This is an open question. This proposal omits the most controversial
    pieces of information (IP addresses and ports), and insight into the
    potential threats it would pose would be very welcome!

Specification:

   The following addition would be made to the control-spec's GETINFO section:

  "rcirc/id/<Circuit identity>" -- Provides an entry for the associated relay
    circuit, formatted as:
      CIRC_ID=<circuit ID> CREATED=<timestamp> UPDATED=<timestamp> TYPE=<flag>
        READ=<bytes> WRITE=<bytes>

    None of the parameters contain whitespace, and additional results must be
    ignored to allow for future expansion. Parameters are defined as follows:
      CIRC_ID - Unique numeric identifier for the circuit this belongs to.
      CREATED - Unix timestamp (as seconds since the Epoch) for when the
          circuit was created.
      UPDATED - Unix timestamp for when this information was last updated.
      TYPE - Single-character flags indicating attributes of the circuit:
          (E)ntry : has a connection that doesn't belong to a known Tor server,
            indicating that this is either the first hop or bridged
          E(X)it : has been used for at least one exit stream
          (R)elay : has been extended
          Rende(Z)vous : is being used for a rendezvous point
          (I)ntroduction : is being used for a hidden service introduction
          (N)one of the above: none of the above have happened yet.
      READ - Total bytes transmitted toward the exit over the circuit.
      WRITE - Total bytes transmitted toward the client over the circuit.

  "rcirc/all" -- The 'rcirc/id/*' output for all current circuits, joined by
    newlines.
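    A controller consuming these entries needs to split each one on spaces,
    read the KEY=VALUE pairs it understands, and skip any others. The C sketch
    below shows one way to do that; rcirc_entry_t, rcirc_parse and
    rcirc_has_flag are hypothetical names, not part of Tor or of any
    controller library, and the code assumes each entry has already been
    separated from the GETINFO reply framing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one "rcirc/id/<id>" entry. */
typedef struct {
    unsigned long long circ_id;
    unsigned long long created;      /* Unix timestamp */
    unsigned long long updated;      /* Unix timestamp */
    char type[8];                    /* flag characters such as "RX" */
    unsigned long long read_bytes;   /* toward the exit */
    unsigned long long write_bytes;  /* toward the client */
} rcirc_entry_t;

/* True if the key in [k, k + klen) equals name. */
static int key_is(const char *k, size_t klen, const char *name) {
    return strlen(name) == klen && memcmp(k, name, klen) == 0;
}

/* Parse a decimal value of length vlen; reject empty or non-numeric text. */
static int parse_num(const char *v, size_t vlen, unsigned long long *out) {
    char buf[24];
    char *end;
    if (vlen == 0 || vlen >= sizeof(buf)) return -1;
    memcpy(buf, v, vlen);
    buf[vlen] = '\0';
    if (buf[0] < '0' || buf[0] > '9') return -1;
    *out = strtoull(buf, &end, 10);
    return *end == '\0' ? 0 : -1;
}

/* Parse "KEY=VALUE KEY=VALUE ...". Unknown keys are ignored so that
 * future fields do not break older controllers. Returns 0 on success,
 * -1 on a malformed token or a missing CIRC_ID. */
int rcirc_parse(const char *line, rcirc_entry_t *out) {
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    int have_id = 0;
    while (*line) {
        if (*line == ' ') { line++; continue; }
        size_t len = strcspn(line, " ");            /* length of this token */
        const char *eq = memchr(line, '=', len);
        if (eq == NULL) return -1;
        size_t klen = (size_t)(eq - line);
        const char *v = eq + 1;
        size_t vlen = len - klen - 1;
        int rc = 0;
        if (key_is(line, klen, "CIRC_ID")) {
            rc = parse_num(v, vlen, &out->circ_id);
            have_id = 1;
        } else if (key_is(line, klen, "CREATED")) {
            rc = parse_num(v, vlen, &out->created);
        } else if (key_is(line, klen, "UPDATED")) {
            rc = parse_num(v, vlen, &out->updated);
        } else if (key_is(line, klen, "READ")) {
            rc = parse_num(v, vlen, &out->read_bytes);
        } else if (key_is(line, klen, "WRITE")) {
            rc = parse_num(v, vlen, &out->write_bytes);
        } else if (key_is(line, klen, "TYPE")) {
            if (vlen >= sizeof(out->type)) return -1;
            memcpy(out->type, v, vlen);
            out->type[vlen] = '\0';
        }                                           /* other keys: skipped */
        if (rc != 0) return -1;
        line += len;
    }
    return have_id ? 0 : -1;
}

/* True if the TYPE field contains the given flag character, e.g. 'X'. */
int rcirc_has_flag(const rcirc_entry_t *e, char flag) {
    return flag != '\0' && strchr(e->type, flag) != NULL;
}
```

    For example, parsing "CIRC_ID=7 CREATED=1275523200 TYPE=RX READ=1024" sets
    circ_id to 7, after which rcirc_has_flag(&e, 'X') is true.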

   The following event would be added for relay circuit status changes:

4.1.X. Relay circuit status changed

  The syntax is:
     "650" SP "RCIRC" SP CircID SP Notice [SP Created SP Updated SP Type SP
          Read SP Write] CRLF
     
     Notice =
            "NEW"    / ; first information being provided for this circuit
            "UPDATE" / ; update for a previously reported circuit
            "CLOSED"   ; notice that the circuit no longer exists
    
  This event indicates that queryable information on a relay-related circuit
  has changed. If the Notice parameter is either "NEW" or "UPDATE", then this
  provides the same fields that would be given by calling "GETINFO rcirc/id/"
  with the CircID.

Filename: 173-getinfo-option-expansion.txt
Title: GETINFO Option Expansion
Author: Damian Johnson
Created: 02-June-2010
Status: Obsolete

Overview:

    Over the course of developing arm, there have been numerous hacks and
    workarounds to glean pieces of basic, desirable information about the tor
    process. As per Roger's request, I've compiled a list of these pain points
    to try to improve the control protocol interface.

Motivation:

    The purpose of this proposal is to expose additional process and relay
    related information that is currently unavailable in a convenient,
    dependable, and/or platform independent way. Examples are:

      - The relay's total contributed bandwidth. This is a highly requested
        piece of information and, based on the following patch from pipe, looks
        trivial to include.
        http://www.mail-archive.com/or-talk@freehaven.net/msg13085.html

      - The process ID of the tor process. There is a high degree of guesswork
        in obtaining this. Arm for instance uses pidof, netstat, and ps yet
        still fails on some platforms, and Orbot recently got a ticket about
        its own attempt to fetch it with ps:
        https://trac.torproject.org/projects/tor/ticket/1388

    This list only includes the missing pieces of information I've noticed
    (suggestions, or questions about their usefulness, are welcome!).

Security Implications:

    None that I'm aware of. From a security standpoint this seems decently
    innocuous.

Specification:

    The following addition would be made to the control-spec's GETINFO section:

    "relay/bw-limit" -- Effective relayed bandwidth limit.

    "relay/burst-limit" -- Effective relayed burst limit.

    "relay/read-total" -- Total bytes relayed (download).

    "relay/write-total" -- Total bytes relayed (upload).

    "relay/flags" -- Space separated listing of flags currently held by the
    relay as reported by the currently cached consensus.

    "process/user" -- Username under which the tor process is running,
    or an empty string if none exists.
    [what do we mean 'if none exists'?]
      [Implemented in 0.2.3.1-alpha.]

    "process/pid" -- Process ID of the main tor process, or -1 if none
    exists for the platform.
      [Implemented in 0.2.3.1-alpha.]

    "process/uptime" -- Total uptime of the tor process (in seconds).

    "process/uptime-reset" -- Time since last reset (startup, sighup, or RELOAD
    signal, in seconds). [should clarify exactly which events cause an
    uptime reset]

    "process/descriptors-used" -- Count of file descriptors used.

    "process/descriptor-limit" -- File descriptor limit (getrlimit results).

    "ns/authority" -- Router status info (v2 directory style) for all
    recognized directory authorities, joined by newlines.

    "state/names" -- A space-separated list of all the keys supported by this
    version of Tor's state.

    "state/val/<key>" -- Provides the current state value belonging to the
    given key. If undefined, this provides the key's default value.

    "status/ports-seen" -- A summary of which ports we've recently seen exiting
    circuits connect to, formatted the same as the EXITS_SEEN status event
    described in Section 4.1.XX. This GETINFO option is currently available
    only for exit relays.
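    Several of the process/* values map onto standard POSIX calls. The sketch
    below assumes a POSIX platform (it does not cover Windows) and uses
    hypothetical function names: it reads "process/pid" from getpid(),
    "process/descriptor-limit" from the soft RLIMIT_NOFILE value returned by
    getrlimit(), and computes "process/uptime" from a start time recorded at
    startup. Reporting -1 for an unlimited descriptor limit is a choice made
    here, not something the proposal specifies.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: recorded once when the process starts. */
static time_t process_start_time;

void process_info_init(void) {
    process_start_time = time(NULL);
}

/* "process/pid" */
long process_pid(void) {
    return (long)getpid();
}

/* "process/uptime": seconds since process_info_init() was called. */
long process_uptime(void) {
    assert(process_start_time != 0);
    return (long)(time(NULL) - process_start_time);
}

/* "process/descriptor-limit": the soft RLIMIT_NOFILE limit, or -1 if
 * getrlimit() fails or the limit is unlimited. */
long long process_descriptor_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)rl.rlim_cur;
}
```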

4.1.XX. Per-port exit stats

  The syntax is:
     "650" SP "EXITS_SEEN" SP TimeStarted SP PortSummary CRLF

  We just generated a new summary of which ports we've recently seen exiting
  circuits connect to. The controller could display this for the user, e.g.
  in their "relay" configuration window, to give them a sense of how they're
  being used (popularity of the various ports they exit to). Currently only
  exit relays will receive this event.

  TimeStarted is a quoted string indicating when the reported summary
  counts from (in GMT).

  The PortSummary keyword has as its argument a comma-separated, possibly
  empty set of "port=count" pairs. For example (without linebreak),
  650 EXITS_SEEN TimeStarted="2008-12-25 23:50:43"
  PortSummary=80=16,443=8
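  A controller has to turn the PortSummary argument back into pairs. This C
  sketch parses it, assuming the argument (for example "80=16,443=8") has
  already been extracted from the event line; port_count_t and
  parse_port_summary are hypothetical names. It rejects malformed pairs,
  ports above 65535, and a trailing comma, and accepts the empty set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for one "port=count" pair. */
typedef struct {
    unsigned port;
    unsigned long count;
} port_count_t;

/* Parse a PortSummary argument such as "80=16,443=8" into out[0..max).
 * Returns the number of pairs (0 for an empty string), or -1 if the text
 * is malformed or holds more than max pairs. */
int parse_port_summary(const char *s, port_count_t *out, int max) {
    assert(s != NULL && out != NULL && max >= 0);
    int n = 0;
    while (*s != '\0') {
        char *end;
        if (*s < '0' || *s > '9') return -1;
        unsigned long port = strtoul(s, &end, 10);
        if (*end != '=' || port > 65535) return -1;
        s = end + 1;
        if (*s < '0' || *s > '9') return -1;
        unsigned long count = strtoul(s, &end, 10);
        if (*end != ',' && *end != '\0') return -1;
        if (n == max) return -1;          /* caller's array is full */
        out[n].port = (unsigned)port;
        out[n].count = count;
        n++;
        s = end;
        if (*s == ',') {
            s++;
            if (*s == '\0') return -1;    /* trailing comma */
        }
    }
    return n;
}
```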

Filename: 174-optimistic-data-server.txt
Title: Optimistic Data for Tor: Server Side
Author: Ian Goldberg
Created: 2-Aug-2010
Status: Closed
Implemented-In: 0.2.3.1-alpha

Overview:

When a SOCKS client opens a TCP connection through Tor (for an HTTP
request, for example), the query latency is about 1.5x higher than it
needs to be.  Put simply, the problem is that the data flows in this
sequence:

1. The SOCKS client opens a TCP connection to the OP
2. The SOCKS client sends a SOCKS CONNECT command
3. The OP sends a BEGIN cell to the Exit
4. The Exit opens a TCP connection to the Server
5. The Exit returns a CONNECTED cell to the OP
6. The OP returns a SOCKS CONNECTED notification to the SOCKS client
7. The SOCKS client sends some data (the GET request, for example)
8. The OP sends a DATA cell to the Exit
9. The Exit sends the GET to the server
10. The Server returns the HTTP result to the Exit
11. The Exit sends the DATA cells to the OP
12. The OP returns the HTTP result to the SOCKS client

Note that the Exit node knows that the connection to the Server was
successful at the end of step 4, but is unable to send the HTTP query to
the server until step 9.

This proposal (as well as its upcoming sibling concerning the client
side) aims to reduce the latency by allowing:
1. SOCKS clients to optimistically send data before they are notified
    that the SOCKS connection has completed successfully
2. OPs to optimistically send DATA cells on streams in the CONNECT_WAIT
    state
3. Exit nodes to accept and queue DATA cells while in the
    EXIT_CONN_STATE_CONNECTING state

This particular proposal deals with #3.

In this way, the flow would be as follows:

1. The SOCKS client opens a TCP connection to the OP
2. The SOCKS client sends a SOCKS CONNECT command, followed immediately
    by data (such as the GET request)
3. The OP sends a BEGIN cell to the Exit, followed immediately by DATA
    cells
4. The Exit opens a TCP connection to the Server
5. The Exit returns a CONNECTED cell to the OP, and sends the queued GET
    request to the Server
6. The OP returns a SOCKS CONNECTED notification to the SOCKS client,
    and the Server returns the HTTP result to the Exit
7. The Exit sends the DATA cells to the OP
8. The OP returns the HTTP result to the SOCKS client

Motivation:

This change will save one OP<->Exit round trip (down to one from two).
There are still two SOCKS Client<->OP round trips (negligible time) and
two Exit<->Server round trips.  Depending on the ratio of the
Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will
decrease the latency by 25 to 50 percent.  Experiments validate these
predictions. [Goldberg, PETS 2010 rump session; see
https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
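A worked version of this estimate, writing T for the OP<->Exit RTT and S
for the Exit<->Server RTT and, as above, ignoring the SOCKS round trips:
the original flow pays T + S in steps 3-5 and again in steps 8-11, while
the optimistic flow pays T once and S twice.

```latex
\begin{aligned}
L_{\text{before}} &= 2T + 2S, \qquad L_{\text{after}} = T + 2S,\\
\frac{L_{\text{before}} - L_{\text{after}}}{L_{\text{before}}}
  &= \frac{T}{2(T + S)} = \frac{1}{2(1 + r)}, \qquad r = \frac{S}{T}.
\end{aligned}
```

As r approaches 0 the saving approaches 50%, and at r = 1 it is 25%, so
the quoted 25 to 50 percent range corresponds to an Internet RTT no larger
than the Tor RTT; for r > 1 the saving falls below 25%. The Overview's
"about 1.5x" corresponds to r = 1/2, where the saving is one third.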

Design:

The current code actually handles queued data at the Exit correctly: if
there is queued data in an EXIT_CONN_STATE_CONNECTING stream, that data
will be immediately sent when the connection succeeds.  If the
connection fails, the data will be correctly ignored and freed.  The
problem is that the current server code drops DATA cells on streams in
the EXIT_CONN_STATE_CONNECTING state.
Also, if you try to queue data in the EXIT_CONN_STATE_RESOLVING state,
bad things happen because streams in that state don't yet have
conn->write_event set, and so some existing sanity checks (any stream
with queued data is at least potentially writable) are no longer sound.

The solution is to simply not drop received DATA cells while in the
EXIT_CONN_STATE_CONNECTING state.  Also do not send SENDME cells in this
state, so that the OP cannot send more than one window's worth of data
to be queued at the Exit.  Finally, patch the sanity checks so that
streams in the EXIT_CONN_STATE_RESOLVING state that have buffered data
can pass.

If no clients ever send such optimistic data, the new code will never be
executed, and the behaviour of Tor will not change.  When clients begin
to send optimistic data, the performance of those clients' streams will
improve.

After discussion with nickm, it seems best to just have the server
version number be the indicator of whether a particular Exit supports
optimistic data.  (If a client sends optimistic data to an Exit which
does not support it, the data will be dropped, and the client's request
will fail to complete.)  What do version numbers for hypothetical future
protocol-compatible implementations look like, though?

Security implications:

Servers (certainly the Exit, and possibly others, by watching the
pattern of packets) will be able to tell that a particular client
is using optimistic data.  This will be discussed more in the sibling
proposal.

On the Exit side, servers will be queueing a little extra data, but
no more than one window.  Clients today can cause Exits to queue that
much data anyway, simply by establishing a Tor connection to a slow
machine, and sending one window of data.

Specification:

tor-spec section 6.2 currently says:

    The OP waits for a RELAY_CONNECTED cell before sending any data.
    Once a connection has been established, the OP and exit node
    package stream data in RELAY_DATA cells, and upon receiving such
    cells, echo their contents to the corresponding TCP stream.
    RELAY_DATA cells sent to unrecognized streams are dropped.

It is not clear exactly what an "unrecognized" stream is, but this last
sentence would be changed to say that RELAY_DATA cells received on a
stream that has processed a RELAY_BEGIN cell and has not yet issued a
RELAY_END or a RELAY_CONNECTED cell are queued; that queue is processed
immediately after a RELAY_CONNECTED cell is issued for the stream, or
freed after a RELAY_END cell is issued for the stream.
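The behaviour described above (queue while a BEGIN is pending, flush on
RELAY_CONNECTED, free on RELAY_END, and send no SENDMEs while queueing)
can be modelled as a small state machine. The C sketch below tracks only
byte and cell counts, not real buffers; its names are hypothetical, the
498-byte payload and 500-cell stream window are assumed constants, and
rejecting cells beyond one window is a defensive choice of the sketch
rather than something this proposal specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants for this sketch. */
#define RELAY_PAYLOAD_MAX 498   /* max payload bytes in one relay cell */
#define STREAM_WINDOW     500   /* cells per stream window */

/* ST_PENDING: BEGIN processed, no CONNECTED or END issued yet
 * (EXIT_CONN_STATE_RESOLVING or _CONNECTING in the patch under
 * Implementation). */
typedef enum { ST_PENDING, ST_OPEN, ST_ENDED } stream_state_t;

typedef struct {
    stream_state_t state;
    size_t queued_cells;     /* optimistic DATA cells held at the exit */
    size_t queued_bytes;
    size_t delivered_bytes;  /* bytes handed to the server's TCP stream */
    size_t sendme_checks;    /* times SENDME accounting would run */
} exit_stream_t;

void exit_stream_init(exit_stream_t *s) {
    s->state = ST_PENDING;
    s->queued_cells = s->queued_bytes = 0;
    s->delivered_bytes = s->sendme_checks = 0;
}

/* A RELAY_DATA cell arrives. Returns 0 if accepted, -1 if dropped. */
int exit_on_data(exit_stream_t *s, size_t len) {
    assert(len <= RELAY_PAYLOAD_MAX);
    switch (s->state) {
    case ST_PENDING:
        /* Queue the data and skip SENDME accounting, so a well-behaved
         * OP never has more than one window of cells waiting here. */
        if (s->queued_cells >= STREAM_WINDOW)
            return -1;                /* the OP ignored its window */
        s->queued_cells++;
        s->queued_bytes += len;
        return 0;
    case ST_OPEN:
        s->delivered_bytes += len;    /* written straight to TCP */
        s->sendme_checks++;
        return 0;
    default:
        return -1;                    /* stream has already ended */
    }
}

/* The exit issues RELAY_CONNECTED: flush the queue immediately. */
void exit_on_connected(exit_stream_t *s) {
    if (s->state != ST_PENDING) return;
    s->delivered_bytes += s->queued_bytes;
    s->queued_cells = s->queued_bytes = 0;
    s->state = ST_OPEN;
}

/* The exit issues RELAY_END: free the queue without delivering it. */
void exit_on_end(exit_stream_t *s) {
    s->queued_cells = s->queued_bytes = 0;
    s->state = ST_ENDED;
}
```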

The earlier part of this section will be addressed in the sibling
proposal.

Compatibility:

There are compatibility issues, as mentioned above.  OPs MUST NOT send
optimistic data to Exit nodes whose version numbers predate (something).
OPs MAY send optimistic data to Exit nodes whose version numbers match
or follow that value.  (But see the question about independent server
reimplementations, above.)

Implementation:

Here is a simple patch.  It seems to work with both regular streams and
hidden services, but there may be other corner cases I'm not aware of.
(Do streams used for directory fetches, hidden services, etc. take a
different code path?)

diff --git a/src/or/connection.c b/src/or/connection.c
index 7b1493b..f80cd6e 100644
--- a/src/or/connection.c
+++ b/src/or/connection.c
@@ -2845,7 +2845,13 @@ _connection_write_to_buf_impl(const char *string, size_t len,
     return;
   }
 
-  connection_start_writing(conn);
+  /* If we receive optimistic data in the EXIT_CONN_STATE_RESOLVING
+   * state, we don't want to try to write it right away, since
+   * conn->write_event won't be set yet.  Otherwise, write data from
+   * this conn as the socket is available. */
+  if (conn->state != EXIT_CONN_STATE_RESOLVING) {
+      connection_start_writing(conn);
+  }
   if (zlib) {
     conn->outbuf_flushlen += buf_datalen(conn->outbuf) - old_datalen;
   } else {
@@ -3382,7 +3388,11 @@ assert_connection_ok(connection_t *conn, time_t now)
     tor_assert(conn->s < 0);
 
   if (conn->outbuf_flushlen > 0) {
-    tor_assert(connection_is_writing(conn) || conn->write_blocked_on_bw ||
+    /* With optimistic data, we may have queued data in
+     * EXIT_CONN_STATE_RESOLVING while the conn is not yet marked to writing.
+     * */
+    tor_assert(conn->state == EXIT_CONN_STATE_RESOLVING ||
+	    connection_is_writing(conn) || conn->write_blocked_on_bw ||
             (CONN_IS_EDGE(conn) && TO_EDGE_CONN(conn)->edge_blocked_on_circ));
   }
 
diff --git a/src/or/relay.c b/src/or/relay.c
index fab2d88..e45ff70 100644
--- a/src/or/relay.c
+++ b/src/or/relay.c
@@ -1019,6 +1019,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
   relay_header_t rh;
   unsigned domain = layer_hint?LD_APP:LD_EXIT;
   int reason;
+  int optimistic_data = 0;  /* Set to 1 if we receive data on a stream
+			       that's in the EXIT_CONN_STATE_RESOLVING
+			       or EXIT_CONN_STATE_CONNECTING states.*/
 
   tor_assert(cell);
   tor_assert(circ);
@@ -1038,9 +1041,20 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
   /* either conn is NULL, in which case we've got a control cell, or else
    * conn points to the recognized stream. */
 
-  if (conn && !connection_state_is_open(TO_CONN(conn)))
-    return connection_edge_process_relay_cell_not_open(
-             &rh, cell, circ, conn, layer_hint);
+  if (conn && !connection_state_is_open(TO_CONN(conn))) {
+    if ((conn->_base.state == EXIT_CONN_STATE_CONNECTING ||
+	    conn->_base.state == EXIT_CONN_STATE_RESOLVING) &&
+	rh.command == RELAY_COMMAND_DATA) {
+	/* We're going to allow DATA cells to be delivered to an exit
+	 * node in state EXIT_CONN_STATE_CONNECTING or
+	 * EXIT_CONN_STATE_RESOLVING.  This speeds up HTTP, for example. */
+	log_warn(domain, "Optimistic data received.");
+	optimistic_data = 1;
+    } else {
+	return connection_edge_process_relay_cell_not_open(
+		 &rh, cell, circ, conn, layer_hint);
+    }
+  }
 
   switch (rh.command) {
     case RELAY_COMMAND_DROP:
@@ -1090,7 +1104,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
       log_debug(domain,"circ deliver_window now %d.", layer_hint ?
                 layer_hint->deliver_window : circ->deliver_window);
 
-      circuit_consider_sending_sendme(circ, layer_hint);
+      if (!optimistic_data) {
+	  circuit_consider_sending_sendme(circ, layer_hint);
+      }
 
       if (!conn) {
         log_info(domain,"data cell dropped, unknown stream (streamid %d).",
@@ -1107,7 +1123,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
       stats_n_data_bytes_received += rh.length;
       connection_write_to_buf(cell->payload + RELAY_HEADER_SIZE,
                               rh.length, TO_CONN(conn));
-      connection_edge_consider_sending_sendme(conn);
+      if (!optimistic_data) {
+	  connection_edge_consider_sending_sendme(conn);
+      }
       return 0;
     case RELAY_COMMAND_END:
       reason = rh.length > 0 ?

Performance and scalability notes:

There may be more RAM used at Exit nodes, as mentioned above, but it is
transient.

Filename: 175-automatic-node-promotion.txt
Title: Automatically promoting Tor clients to nodes
Author: Steven Murdoch
Created: 12-Mar-2010
Status: Rejected

1. Overview

   This proposal describes how Tor clients could determine when they
   have sufficient bandwidth capacity and are sufficiently reliable to
   become either bridges or Tor relays. When they meet these
   criteria, they will automatically promote themselves, based on user
   preferences. The proposal also defines the new controller messages
   and options which will control this process.

   Note that for the moment, only transitions between client and
   bridge are being considered. Transitions to public relay will
   be considered at a future date, but will use the same
   infrastructure for measuring capacity and reliability.

2. Motivation and history

   Tor has a growing user base, and one of the major impediments to the
   quality of service offered is the lack of network capacity. This is
   particularly the case for bridges, because these are gradually
   being blocked, and thus no longer of use to people within some
   countries. By automatically promoting Tor clients to bridges, and
   perhaps also to full public relays, this proposal aims to solve
   these problems.

   Only Tor clients which are sufficiently useful should be promoted,
   and the process of determining usefulness should be performed
   without reporting the existence of the client to the central
   authorities. The criteria used for determining usefulness will be
   in terms of bandwidth capacity and uptime, but parameters should be
   specified in the directory consensus. State stored at the client
   should be in no more detail than necessary, to prevent sensitive
   information from being recorded.

3. Design

3.x Opt-in state model

   Tor can be in one of five node-promotion states:

   - off (O): Currently a client, and will stay as such
   - auto (A): Currently a client, but will consider promotion
   - bridge (B): Currently a bridge, and will stay as such
   - auto-bridge (AB): Currently a bridge, but will consider promotion
   - relay (R): Currently a public relay, and will stay as such

   The state can be fully controlled from the configuration file or
   controller, but the normal state transitions are as follows:

   Any state -> off: User has opted out of node promotion
   Off -> any state: Only permitted with user consent

   Auto -> auto-bridge: Tor has detected that it is sufficiently
    reliable to be a *bridge*
   Auto -> bridge: Tor has detected that it is sufficiently reliable
    to be a *relay*, but the user has chosen to remain a *bridge*
   Auto -> relay: Tor has detected that it is sufficiently reliable
    to be a *relay*, and will skip being a *bridge*
   Auto-bridge -> relay: Tor has detected that it is sufficiently
    reliable to be a *relay*

   Note that this model does not support automatic demotion. If this
   is desirable, there should be some memory as to whether the
   previous state was relay, bridge, or auto-bridge. Otherwise the
   user may be prompted to become a relay, although he has opted to
   only be a bridge.
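   The normal transitions above can be written as a predicate over the
   five states. This C sketch is illustrative only (the enum and function
   names are hypothetical); it answers whether a transition is part of the
   model, not whether the reliability conditions for it have been met,
   and, like the model, it has no demotion path.

```c
#include <assert.h>
#include <stdbool.h>

/* The five node-promotion states. */
typedef enum {
    PROMO_OFF,          /* client, and will stay a client */
    PROMO_AUTO,         /* client, but will consider promotion */
    PROMO_BRIDGE,       /* bridge, and will stay a bridge */
    PROMO_AUTO_BRIDGE,  /* bridge, but will consider promotion */
    PROMO_RELAY         /* public relay, and will stay a relay */
} promo_state_t;

/* True if from -> to is one of the normal transitions. user_consent
 * models the "only permitted with user consent" rule for leaving off. */
bool promo_transition_ok(promo_state_t from, promo_state_t to,
                         bool user_consent) {
    if (to == PROMO_OFF)
        return true;                 /* the user can always opt out */
    if (from == PROMO_OFF)
        return user_consent;         /* leaving off needs consent */
    switch (from) {
    case PROMO_AUTO:
        return to == PROMO_AUTO_BRIDGE || to == PROMO_BRIDGE ||
               to == PROMO_RELAY;
    case PROMO_AUTO_BRIDGE:
        return to == PROMO_RELAY;
    default:
        return false;                /* bridge and relay never move on their own */
    }
}
```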

3.x User interaction policy

   There are a variety of options for how to involve the user in the
   decision as to whether and when to perform node promotion. The
   choice also may be different when Tor is running from Vidalia (and
   thus can readily prompt the user for information), and standalone
   (where Tor can only log messages, which may or may not be read).

   The option requiring minimal user interaction is to automatically
   promote nodes according to reliability, and allow the user to opt
   out, by changing settings in the configuration file or Vidalia user
   interface.

   Alternatively, if a user interface is available, Tor could prompt
   the user when it detects that a transition is available, and allow
   the user to choose which of the available options to select. If
   Vidalia is not available, it still may be possible to solicit an
   email address on install, and contact the operator to ask whether
   a transition to bridge or relay is permitted.

   Finally, Tor could by default not make any transition, and the user
   would need to opt in by stating the maximum level (bridge or
   relay) to which the node may automatically promote itself.

3.x Performance monitoring model

   To prevent a large number of clients activating as relays, but
   being too unreliable to be useful, clients should measure their
   performance. If this performance meets parameterized acceptance
   criteria, a client should consider promotion. To measure
   reliability, this proposal adopts a simple user model:

    - A user decides to use Tor at times which follow a Poisson
      distribution
    - At each time, the user will be happy if the bridge chosen has
      adequate bandwidth and is reachable
    - If the chosen bridge is down or slow too many times, the user
      will consider Tor to be bad

   If we additionally assume that the recent history of relay
   performance matches the current performance, we can measure
   reliability by simulating this simple user.

   The following parameters are distributed to clients in the
   directory consensus:

     - min_bandwidth: Minimum self-measured bandwidth for a node to be
       considered useful, in bytes per second
     - check_period: How long, in seconds, to wait between checking
       reachability and bandwidth (on average)
     - num_samples: Number of recent samples to keep
     - num_useful: Minimum number of recent samples where the node was
       reachable and had at least min_bandwidth capacity, for a client
       to consider promoting to a bridge

   A different set of parameters may be used for considering when to
   promote a bridge to a full relay, but this will be the subject of a
   future revision of the proposal.

3.x Performance monitoring algorithm

   The simulation described above can be implemented as follows:

   Every 60 seconds:
     1. Tor generates a random floating point number x in
        the interval [0, 1).
     2. If x > (1 / (check_period / 60)) GOTO end; otherwise:
     3. Tor sets the value last_check to the current_time (in seconds)
     4. Tor measures reachability
     5. If the client is reachable, Tor measures its bandwidth
     6. If the client is reachable and the bandwidth is >=
        min_bandwidth, the test has succeeded, otherwise it has failed.
     7. Tor adds the test result to the end of a ring-buffer containing
        the last num_samples results: measurements_results
     8. Tor saves last_check and measurements_results to disk
     9. If the length of measurements_results == num_samples and
        the number of successes >= num_useful, Tor should consider
        promotion to a bridge
   end.
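   A sketch of this loop in C follows. The types and names are
   hypothetical; the caller supplies the uniform draw x and the
   reachability and bandwidth results of steps 4 and 5, MAX_SAMPLES is an
   assumed upper bound on num_samples, and saving state to disk (step 8)
   is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define MAX_SAMPLES 64   /* assumed upper bound on num_samples */

/* Consensus parameters. */
typedef struct {
    double min_bandwidth;   /* bytes per second */
    double check_period;    /* average seconds between checks */
    int num_samples;        /* ring-buffer length, 1..MAX_SAMPLES */
    int num_useful;         /* successes needed to consider promotion */
} promo_params_t;

/* State that step 8 would save to disk. */
typedef struct {
    time_t last_check;
    bool results[MAX_SAMPLES];   /* ring buffer; true = success */
    int next;                    /* slot the next result overwrites */
    int count;                   /* valid entries, <= num_samples */
} promo_history_t;

/* Step 7: append a result, overwriting the oldest once full. */
void history_push(promo_history_t *h, const promo_params_t *p, bool ok) {
    assert(p->num_samples > 0 && p->num_samples <= MAX_SAMPLES);
    h->results[h->next] = ok;
    h->next = (h->next + 1) % p->num_samples;
    if (h->count < p->num_samples) h->count++;
}

/* Step 9: a full buffer with at least num_useful successes. */
bool history_says_promote(const promo_history_t *h, const promo_params_t *p) {
    if (h->count != p->num_samples) return false;
    int successes = 0;
    for (int i = 0; i < h->count; i++) successes += h->results[i];
    return successes >= p->num_useful;
}

/* One 60-second tick. x is a uniform draw in [0, 1). Returns true if Tor
 * should now consider promotion to a bridge. */
bool promo_tick(promo_history_t *h, const promo_params_t *p, double x,
                bool reachable, double bandwidth, time_t now) {
    if (x > 60.0 / p->check_period)      /* step 2: no check this minute */
        return false;
    h->last_check = now;                 /* step 3 */
    bool ok = reachable && bandwidth >= p->min_bandwidth;   /* steps 4-6 */
    history_push(h, p, ok);              /* step 7 */
    return history_says_promote(h, p);   /* step 9 */
}
```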

   When Tor starts, it must fill in the samples for which it was not
   running. This can only happen once the consensus has been downloaded,
   because the value of check_period is needed.

      1. Tor generates a random number y from the Poisson distribution [1]
         with lambda = (current_time - last_check) * (1 / check_period)
      2. Tor sets the value last_check to the current_time (in seconds)
      3. Add y test failures to the ring buffer measurements_results
      4. Tor saves last_check and measurements_results to disk

   In this way, a Tor client will measure its bandwidth and
   reachability every check_period seconds, on average. Provided
   check_period is sufficiently greater than a minute (say, at least an
   hour), the check times will approximately follow a Poisson distribution. [2]
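   The proposal draws y from a Poisson distribution. The sketch below swaps
   in a different but closely related method: it replays the step-2 coin
   flip once for each whole minute Tor was not running, which yields a
   binomial count with n = elapsed minutes and p = 60 / check_period. The
   Poisson draw with lambda = (current_time - last_check) / check_period is
   the approximation of that binomial discussed in [2]. The sketch stops at
   num_samples, since further failures would only overwrite failures
   already in the ring buffer, and it uses rand() as a placeholder, not a
   cryptographic RNG. Function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Uniform double in [0, 1) from rand(); a placeholder RNG. */
static double uniform01(void) {
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Number of failed samples to add for the time Tor was not running.
 * Replays step 2 once per missed minute, so the result is
 * Binomial(minutes, 60 / check_period), capped at cap (num_samples). */
int missed_checks(time_t last_check, time_t now, double check_period,
                  int cap) {
    assert(check_period >= 60.0 && cap >= 0);
    if (now <= last_check) return 0;
    long long minutes = (long long)(now - last_check) / 60;
    double p = 60.0 / check_period;
    int y = 0;
    for (long long m = 0; m < minutes && y < cap; m++)
        if (uniform01() <= p)   /* step 2 skips only when x > p */
            y++;
    return y;
}
```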

   While this does require Tor to record the state of a client over
   time, it does not leak much information. Only a binary
   reachable/non-reachable is stored, and the timing of samples becomes
   increasingly fuzzy as the data becomes less recent.

   On IP address changes, Tor should clear the ring-buffer, because
   from the perspective of users with the old IP address, this node
   might as well be a new one with no history. This policy may change
   once we start allowing the bridge authority to hand out new IP
   addresses given the fingerprint.
   [Perhaps another consensus param? Also, this means we save previous
    IP address in our state file, yes? -RD]

3.x Bandwidth measurement

   Tor needs to measure its bandwidth to test its usefulness as a
   bridge. A non-intrusive way to do this would be to passively measure
   the peak data transfer rate since the last reachability test. Once
   this exceeds min_bandwidth, Tor can set a flag that this node
   currently has sufficient bandwidth to pass the bandwidth component
   of the upcoming performance measurement.

   For the first version we may simply skip the bandwidth test,
   because the existing reachability test sends 500 kB over several
   circuits, and checks whether the node can transfer at least 50
   kB/s.  This is probably good enough for a bridge, so this test
   might be sufficient to record a success in the ring buffer.
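   The passive approach in the first paragraph can be sketched as a meter
   that counts bytes in one-second buckets, tracks the peak rate, and sets
   a flag once that peak reaches min_bandwidth. The one-second bucket and
   all names here are assumptions for illustration; the meter would be
   reset after each reachability test.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical passive bandwidth meter. */
typedef struct {
    time_t bucket_start;             /* second currently being counted */
    unsigned long long bucket_bytes; /* bytes seen in that second */
    unsigned long long peak_rate;    /* highest bytes/s since reset */
    bool has_enough_bw;              /* peak_rate reached min_bandwidth */
} bw_meter_t;

/* Called after each reachability test to start a fresh interval. */
void bw_meter_reset(bw_meter_t *m, time_t now) {
    m->bucket_start = now;
    m->bucket_bytes = 0;
    m->peak_rate = 0;
    m->has_enough_bw = false;
}

/* Record n bytes transferred during second now. */
void bw_meter_add(bw_meter_t *m, time_t now, unsigned long long n,
                  unsigned long long min_bandwidth) {
    assert(m != NULL);
    if (now != m->bucket_start) {    /* a new second has begun */
        m->bucket_start = now;
        m->bucket_bytes = 0;
    }
    m->bucket_bytes += n;
    if (m->bucket_bytes > m->peak_rate)
        m->peak_rate = m->bucket_bytes;
    if (m->peak_rate >= min_bandwidth)
        m->has_enough_bw = true;     /* stays set until the next reset */
}
```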

3.x New options

3.x New controller message

4. Migration plan

   We should start by setting a high bandwidth and uptime requirement
   in the consensus, so as to avoid overloading the bridge authority
   with too many bridges. Once we are confident our systems can scale,
   the criteria can be gradually shifted down to gain more bridges.

5. Related proposals

6. Open questions:

   - What user interaction policy should we take?

   - When (if ever) should we turn a relay into an exit relay?

   - What should the rate limits be for auto-promoted bridges/relays?
     Should we prompt the user for this?

   - Perhaps the bridge authority should tell potential bridges
     whether to enable themselves, by taking into account whether
     their IP address is blocked

   - How do we explain the possible risks of running a bridge/relay
     * Use of bandwidth/congestion
     * Publication of IP address
     * Blocking from IRC (even for non-exit relays)

   - What feedback should we give to bridge relays, to encourage them
     e.g. number of recent users (what about reserve bridges)?

   - Can clients back-off from doing these tests (yes, we should do
     this)

[1] For algorithms to generate random numbers from the Poisson
    distribution, see: http://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
[2] "The sample size n should be equal to or larger than 20 and the
     probability of a single success, p, should be smaller than or equal to
     .05. If n >= 100, the approximation is excellent if np is also <= 10."
    http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm (e-Handbook of Statistical Methods)

Filename: 176-revising-handshake.txt
Title: Proposed version-3 link handshake for Tor
Author: Nick Mathewson
Created: 31-Jan-2011
Status: Closed
Target: 0.2.3
Supersedes: 169

1. Overview

   I propose a (mostly) backward-compatible change to the Tor
   connection establishment protocol to avoid the use of TLS
   renegotiation, to avoid certain protocol fingerprinting attacks,
   and to make it easier to write Tor clients and servers.

   Rather than doing a TLS renegotiation to exchange certificates
   and authenticate the original handshake, this proposal takes an
   approach similar to Steven Murdoch's proposal 124 and my old
   proposal 169, and uses Tor cells to finish authenticating the
   parties' identities once the initial TLS handshake is finished.

   I discuss some alternative design choices and why I didn't make
   them in section 7; please have a quick look there before
   telling me that something is pointless or makes no sense.

   Terminological note: I use "client" or "initiator" below to mean
   the Tor instance (a client or a bridge or a relay) that initiates a
   TLS connection, and "server" or "responder" to mean the Tor
   instance (a bridge or a relay) that accepts it.

2. History and Motivation

   The _goals_ of the Tor link handshake have remained basically uniform
   since our earliest versions.  They are:

      * Provide data confidentiality and data integrity.
      * Provide forward secrecy.
      * Allow responder authentication or bidirectional authentication.
      * Try to look like some popular too-important-to-block-at-whim
        encryption protocol, to avoid fingerprinting and censorship.
      * Try to be implementable -- on the client side at least! --
        by as many TLS implementations as possible.

   When we added the v2 handshake, we added another goal:

      * Remain compatible with older versions of the handshake
        protocol.

   In the original Tor TLS connection handshake protocol ("V1", or
   "two-cert"), parties that wanted to authenticate provided a
   two-cert chain of X.509 certificates during the handshake setup
   phase.  Every party that wanted to authenticate sent these
   certificates.  The security properties of this protocol are just
   fine; the problem was that our behavior of sending
   two-certificate chains made Tor easy to identify.

   In the current Tor TLS connection handshake protocol ("V2", or
   "renegotiating"), the parties begin with a single certificate
   sent from the server (responder) to the client (initiator), and
   then renegotiate to exchange two-cert chains from each
   authenticating party.
   We made this change to make Tor's handshake look like a browser
   speaking SSL to a webserver.  (See proposal 130, and
   tor-spec.txt.)  So from an observer's point of view, two parties
   performing the V2 handshake begin by making a regular TLS
   handshake with a single certificate, then renegotiate
   immediately.

   To tell whether to use the V1 or V2 handshake, the servers look
   at the list of ciphers sent by the client.  (This is ugly, but
   there's not much else in the ClientHello that they can look at.)
   If the list contains any cipher not used by the V1 protocol, the
   server sends back a single cert and expects a renegotiation.  If
   the client gets back a single cert, then it withholds its own
   certificates until the TLS renegotiation phase.

   In other words, V2-supporting initiator behavior currently looks
   like this:

      - Begin TLS negotiation with V2 cipher list; wait for
        certificate(s).
      - If we get a certificate chain:
         - Then we are using the V1 handshake.  Send our own
           certificate chain as part of this initial TLS handshake
           if we want to authenticate; otherwise, send no
           certificates.  When the handshake completes, check
           certificates.  We are now mutually authenticated.

        Otherwise, if we get just a single certificate:
         - Then we are using the V2 handshake.  Do not send any
           certificates during this handshake.
         - When the handshake is done, immediately start a TLS
           renegotiation.  During the renegotiation, expect
           a certificate chain from the server; send a certificate
           chain of our own if we want to authenticate ourselves.
         - After the renegotiation, check the certificates. Then
           send (and expect) a VERSIONS cell from the other side to
           establish the link protocol version.

   And V2-supporting responder behavior now looks like this:

      - When we get a TLS ClientHello request, look at the cipher
        list.
      - If the cipher list contains only the V1 ciphersuites:
         - Then we're doing a V1 handshake.  Send a certificate
           chain.  Expect a possible client certificate chain in
           response.
        Otherwise, if we get other ciphersuites:
         - We're using the V2 handshake.  Send back a single
           certificate and let the handshake complete.
         - Do not accept any data until the client has renegotiated.
         - When the client is renegotiating, send a certificate
           chain, and expect (possibly multiple) certificates in
           reply.
         - Check the certificates when the renegotiation is done.
           Then exchange VERSIONS cells.

   Late in 2009, researchers found a flaw in most applications' use
   of TLS renegotiation: Although TLS renegotiation does not
   reauthenticate any information exchanged before the renegotiation
   takes place, many applications were treating it as though it did,
   and assuming that data sent _before_ the renegotiation was
   authenticated with the credentials negotiated _during_ the
   renegotiation.  This problem was exacerbated by the fact that
   most TLS libraries don't actually give you an obvious good way to
   tell where the renegotiation occurred relative to the datastream.
   Tor wasn't directly affected by this vulnerability, but the
   aftermath hurts us in a few ways:

      1) OpenSSL has disabled renegotiation by default, and created
         a "yes we know what we're doing" option we need to set to
         turn it back on.  (Two options, actually: one for openssl
         0.9.8l and one for 0.9.8m and later.)

      2) Some vendors have removed all renegotiation support from
         their versions of OpenSSL entirely, forcing us to tell
         users to either replace their versions of OpenSSL or to
         link Tor against a hand-built one.

      3) Because of 1 and 2, I'd expect TLS renegotiation to become
         rarer and rarer in the wild, making our own use stand out
         more.

   Furthermore, there are other issues related to TLS and
   fingerprinting that we want to fix in any revised handshake:

      1) We should make it easier to use self-signed certs, or maybe
         even existing HTTPS certificates, for the server side
         handshake, since most non-Tor SSL handshakes use either
         self-signed certificates or CA-signed certificates.

      2) We should allow other changes in our use of TLS and in our
         certificates so as to resist fingerprinting based on how
         our certificates look.  (See proposal 179.)

3. Design

3.1. The view in the large

   Taking a cue from Steven Murdoch's proposal 124 and my old
   proposal 169, I propose that we move the work currently done by
   the TLS renegotiation step (that is, authenticating the parties
   to one another) and do it with Tor cells instead of with TLS
   alone.

   This section outlines the protocol; we go into more detail below.

   To tell the client that it can use the new cell-based
   authentication system, the server sends a "V3 certificate" during
   the initial TLS handshake.  (More on what makes a certificate
   "v3" below.)  If the client recognizes the format of the
   certificate and decides to pursue the V3 handshake, then instead
   of renegotiating immediately on completion of the initial TLS
   handshake, the client sends a VERSIONS cell (and the
   negotiation begins).

   So the flowchart on the server side is:

      Wait for a ClientHello.
      If the client sends a ClientHello that indicates V1:
          - Send a certificate chain.
          - When the TLS handshake is done, if the client sent us a
            certificate chain, then check it.
      If the client sends a ClientHello that indicates V2 or V3:
          - Send a self-signed certificate or a CA-signed certificate
          - When the TLS handshake is done, wait for renegotiation or data.
            - If renegotiation occurs, the client is V2: send a
              certificate chain and maybe receive one.  Check the
              certificate chain as in V1.
            - If the client sends data without renegotiating, it is
              starting the V3 handshake.  Proceed with the V3
              handshake as below.

   And the client-side flowchart is:

      - Send a ClientHello with a set of ciphers that indicates V2/V3.
      - After the handshake is done:
        - If the server sent us a certificate chain, check it: we
          are using the V1 handshake.
        - If the server sent us a single "V2 certificate", we are
          using the v2 handshake: the client begins to renegotiate
          and proceeds as before.
        - Finally, if the server sent us a "v3 certificate", we are
          doing the V3 handshake below.

   And the cell-based part of the V3 handshake, in summary, is:

    C<->S: TLS handshake where S sends a "v3 certificate"

    In TLS:

       C->S: VERSIONS cell
       S->C: VERSIONS cell, CERT cell, AUTH_CHALLENGE cell, NETINFO cell

       C->S: Optionally: CERT cell, AUTHENTICATE cell
       C->S: NETINFO cell

   A "CERT" cell contains a set of certificates; an "AUTHENTICATE"
   cell authenticates the client to the server.  More on these
   later.

3.2. Distinguishing V2 and V3 certificates

   In the protocol outline above, we require that the client can
   distinguish between v2 certificates (that is, those sent by
   current servers) and v3 certificates.  We further require that
   existing clients will accept v3 certificates as they currently
   accept v2 certificates.

   Fortunately, current certificates have a few characteristics that
   make them fairly well-mannered as it is.  We say that a certificate
   indicates a V2-only server if ALL of the following hold:
      * The certificate is not self-signed.
      * There is no DN field set in the certificate's issuer or
        subject other than "commonName".
      * The commonNames of the issuer and subject both end with
        ".net"
      * The public modulus is at most 1024 bits long.

   Otherwise, the client should assume that the server supports the
   V3 handshake.
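   The V2-only test above is a conjunction of four conditions.  The
   Python sketch below shows one way a client might apply it.  It
   assumes a hypothetical `CertSummary` record that some X.509
   library has already filled in from the server's certificate.
   Parsing the certificate itself is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class CertSummary:
    """Hypothetical, already-parsed view of a server's TLS certificate."""
    self_signed: bool
    issuer_attrs: dict   # DN attribute name -> value, e.g. {"commonName": "www.example.net"}
    subject_attrs: dict
    modulus_bits: int    # size of the RSA public modulus


def indicates_v2_only(cert: CertSummary) -> bool:
    """True only if ALL four conditions for a V2-only server hold."""
    only_common_name = (set(cert.issuer_attrs) <= {"commonName"}
                        and set(cert.subject_attrs) <= {"commonName"})
    both_cns_end_in_net = (
        cert.issuer_attrs.get("commonName", "").endswith(".net")
        and cert.subject_attrs.get("commonName", "").endswith(".net"))
    return (not cert.self_signed
            and only_common_name
            and both_cns_end_in_net
            and cert.modulus_bits <= 1024)


def handshake_for_single_cert(cert: CertSummary) -> str:
    # Anything that is not clearly V2-only is taken to mean V3 support.
    return "v2" if indicates_v2_only(cert) else "v3"
```

   For example, a 2048-bit key alone is enough to make an otherwise
   V2-looking certificate signal V3.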

   To the best of my knowledge, current clients will behave properly
   on receiving non-v2 certs during the initial TLS handshake so
   long as they eventually get the correct V2 cert chain during the
   renegotiation.

   The v3 requirements are easy to meet: any certificate designed to
   resist fingerprinting will likely be self-signed, or if it's
   signed by a CA, then the issuer will surely have more DN fields
   set.  Certificates that aren't trying to resist fingerprinting
   can trivially become v3 by using a CN that doesn't end with .net,
   or using a key longer than 1024 bits.

3.3. Authenticating via Tor cells: server authentication

   Once the TLS handshake is finished, if the client renegotiates,
   then the server should go on as it does currently.

   If the client implements this proposal, however, and the server
   has shown it can understand the V3+ handshake protocol, the
   client immediately sends a VERSIONS cell to the server
   and waits to receive a VERSIONS cell in return.  We negotiate
   the Tor link protocol version _before_ we proceed with the
   negotiation, in case we need to change the authentication
   protocol in the future.

   Once either party has seen the VERSIONS cell from the other, it
   knows which version they will pick (that is, the highest version
   shared by both parties' VERSIONS cells).  All Tor instances using
   the handshake protocol described in 3.2 MUST support at least
   link protocol version 3 as described here.  If a version lower
   than 3 is negotiated with the V3 handshake in place, a Tor
   instance MUST close the connection.
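   The sketch below illustrates the version choice and the "close if
   lower than 3" rule.  It is not Tor's actual code.  The function
   name is hypothetical.  Returning None when the two VERSIONS cells
   share no version at all is an assumption of the sketch, not
   something this proposal specifies.

```python
def negotiate_link_version(ours, theirs, v3_handshake):
    """Return the highest link version listed in both VERSIONS cells,
    or None if the connection should be closed."""
    shared = set(ours) & set(theirs)
    if not shared:
        return None  # assumption: no common version means closing
    version = max(shared)
    if v3_handshake and version < 3:
        return None  # MUST close: V3 handshake but version below 3
    return version
```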

   On learning the link protocol, the server then sends the client a
   CERT cell, an AUTH_CHALLENGE cell, and a NETINFO cell.  If the
   client wants to authenticate to the server, it sends a CERT cell,
   an AUTHENTICATE cell, and a NETINFO cell; or it may simply send a
   NETINFO cell if it does not want to authenticate.

   The CERT cell describes the keys that a Tor instance is claiming
   to have.  It is a variable-length cell.  Its payload format is:

        N: Number of certs in cell            [1 octet]
        N times:
           CertType                           [1 octet]
           CLEN                               [2 octets]
           Certificate                        [CLEN octets]

   Any extra octets at the end of a CERT cell MUST be ignored.

     CertType values are:
        1: Link key certificate from RSA1024 identity
        2: RSA1024 Identity certificate
        3: RSA1024 AUTHENTICATE cell link certificate

   The certificate format is X509.
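   The Python sketch below encodes and parses the CERT cell payload
   described above.  It assumes that multi-octet integers use network
   (big-endian) byte order, as elsewhere in Tor.  The helper names are
   hypothetical, and certificates are treated as opaque X509 byte
   strings.

```python
import struct


def build_cert_cell(certs):
    """certs: list of (cert_type, certificate_bytes) pairs."""
    out = bytes([len(certs)])                       # N
    for cert_type, der in certs:
        out += struct.pack("!BH", cert_type, len(der)) + der  # CertType, CLEN
    return out


def parse_cert_cell(payload):
    """Return a list of (cert_type, certificate_bytes) pairs."""
    if len(payload) < 1:
        raise ValueError("truncated CERT cell")
    n, pos, certs = payload[0], 1, []
    for _ in range(n):
        if pos + 3 > len(payload):
            raise ValueError("truncated certificate header")
        cert_type, clen = struct.unpack_from("!BH", payload, pos)
        pos += 3
        if pos + clen > len(payload):
            raise ValueError("truncated certificate body")
        certs.append((cert_type, payload[pos:pos + clen]))
        pos += clen
    # Any extra octets after the N certificates are ignored.
    return certs
```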

   To authenticate the server, the client MUST check the following:
     * The CERT cell contains exactly one CertType 1 "Link" certificate.
     * The CERT cell contains exactly one CertType 2 "ID" certificate.
     * Both certificates are currently valid: the current time falls
       between their validAfter and validUntil dates.
     * The certified key in the Link certificate matches the
       link key that was used to negotiate the TLS connection.
     * The certified key in the ID certificate is a 1024-bit RSA key.
     * The certified key in the ID certificate was used to sign both
       certificates.
     * The link certificate is correctly signed with the key in the
       ID certificate.
     * The ID certificate is correctly self-signed.

   If all of these conditions hold, then the client knows that it is
   connected to the server whose identity key is certified in the ID
   certificate.  If any condition does not hold, the client closes
   the connection.  If the client wanted to connect to a server with
   a different identity key, the client closes the connection.
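   The checks listed above can be sketched as follows.  `Cert` is a
   hypothetical, already-parsed view of each certificate.
   `verify(cert, key)` stands in for whatever X509 signature check the
   implementation's TLS library provides.  Neither is a real API.  The
   function returns the certified identity key, so the caller can
   compare it with the identity it meant to reach.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cert:
    """Hypothetical parsed view of one X509 certificate from a CERT cell."""
    cert_type: int        # CertType from the CERT cell
    valid_after: int      # POSIX seconds
    valid_until: int
    certified_key: bytes  # encoded public key that the certificate certifies
    key_bits: int         # size of certified_key (assumed to be RSA)


def check_server_certs(certs: List[Cert], tls_link_key: bytes, now: int,
                       verify: Callable[[Cert, bytes], bool]) -> bytes:
    """Apply the server-authentication checks; raise ValueError on failure."""
    links = [c for c in certs if c.cert_type == 1]
    ids = [c for c in certs if c.cert_type == 2]
    if len(links) != 1 or len(ids) != 1:
        raise ValueError("need exactly one Link and one ID certificate")
    link, ident = links[0], ids[0]
    for c in (link, ident):
        if not (c.valid_after <= now <= c.valid_until):
            raise ValueError("certificate not currently valid")
    if link.certified_key != tls_link_key:
        raise ValueError("Link certificate does not match the TLS link key")
    if ident.key_bits != 1024:
        raise ValueError("identity key is not a 1024-bit key")
    id_key = ident.certified_key
    # Both certificates must be signed by the certified identity key.
    if not (verify(link, id_key) and verify(ident, id_key)):
        raise ValueError("bad certificate signature")
    return id_key
```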

   An AUTH_CHALLENGE cell is a variable-length cell with the following
   fields:
       Challenge [32 octets]
       N_Methods [2 octets]
       Methods   [2 * N_Methods octets]

   It is sent from the server to the client.  Clients MUST ignore
   unexpected bytes at the end of the cell.  Servers MUST generate
   every challenge using a strong RNG or PRNG.

   The Challenge field is a randomly generated string; as part of
   authenticating, the client signs a hash of data that includes it.
   The Methods field lists the authentication methods that the server
   will accept.  Only one authentication method is defined right now;
   see 3.4 below.
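   The sketch below shows how a server could build an AUTH_CHALLENGE
   payload and how a client could parse it.  It assumes big-endian
   integers and uses `os.urandom` as the strong RNG.  The function
   names are hypothetical.

```python
import os
import struct


def build_auth_challenge(methods):
    """Server side: return (challenge, payload)."""
    challenge = os.urandom(32)                      # Challenge [32 octets]
    payload = challenge + struct.pack("!H", len(methods))
    payload += b"".join(struct.pack("!H", m) for m in methods)
    return challenge, payload


def parse_auth_challenge(payload):
    """Client side: return (challenge, list_of_methods)."""
    if len(payload) < 34:
        raise ValueError("truncated AUTH_CHALLENGE")
    challenge = payload[:32]
    (n_methods,) = struct.unpack_from("!H", payload, 32)
    if len(payload) < 34 + 2 * n_methods:
        raise ValueError("truncated method list")
    methods = list(struct.unpack_from("!%dH" % n_methods, payload, 34))
    # Unexpected bytes after the method list are ignored.
    return challenge, methods
```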

3.4. Authenticating via Tor cells: Client authentication

   A client does not need to authenticate to the server.  If it
   does not wish to, it responds to the server's valid CERT cell by
   sending a NETINFO cell: once it has gotten a valid NETINFO cell,
   the client should consider the connection open, and the
   server should consider the connection as opened by an
   unauthenticated client.

   If a client wants to authenticate, it responds to the
   AUTH_CHALLENGE cell with a CERT cell and an AUTHENTICATE cell.
   The CERT cell is as a server would send, except that instead of
   sending a CertType 1 cert for an arbitrary link certificate, the
   client sends a CertType 3 cert for an RSA AUTHENTICATE key.
   (This difference is because we allow any link key type on a TLS
   link, but the protocol described here will only work for 1024-bit
   RSA keys.  A later protocol version should extend the protocol
   here to work with non-1024-bit, non-RSA keys.)

   The payload of the AUTHENTICATE cell is:

        AuthType                              [2 octets]
        AuthLen                               [2 octets]
        Authentication                        [AuthLen octets]

   Servers MUST ignore extra bytes at the end of an AUTHENTICATE
   cell.  If AuthType is 1 (meaning "RSA-SHA256-TLSSecret"), then the
   Authentication contains the following:

       TYPE: The characters "AUTH0001" [8 octets]
       CID: A SHA256 hash of the client's RSA1024 identity key [32 octets]
       SID: A SHA256 hash of the server's RSA1024 identity key [32 octets]
       SLOG: A SHA256 hash of all bytes sent from the server to the client
         as part of the negotiation up to and including the
         AUTH_CHALLENGE cell; that is, the VERSIONS cell,
         the CERT cell, the AUTH_CHALLENGE cell, and any padding cells.
         [32 octets]
       CLOG: A SHA256 hash of all bytes sent from the client to the
         server as part of the negotiation so far; that is, the
         VERSIONS cell and the CERT cell and any padding cells. [32 octets]
       SCERT: A SHA256 hash of the server's TLS link
         certificate. [32 octets]
       TLSSECRETS: A SHA256 HMAC, using the TLS master secret as the
         secret key, of the following:
           - client_random, as sent in the TLS Client Hello
           - server_random, as sent in the TLS Server Hello
           - the NUL terminated ASCII string:
             "Tor V3 handshake TLS cross-certification"
          [32 octets]
       TIME: The time of day in seconds since the POSIX epoch. [8 octets]
       RAND: A 16-byte value, randomly chosen by the client [16 octets]
       SIG: A signature of a SHA256 hash of all the previous fields
         using the client's "Authenticate" key as presented.  (As
         always in Tor, we use OAEP-MGF1 padding; see tor-spec.txt
         section 0.3.)
          [variable length]

   To check the AUTHENTICATE cell, a server checks that all fields
   from TYPE through TLSSECRETS contain their unique correct values
   as described above, and then verifies the signature.  The server
   MUST ignore any extra bytes in the signed data after the SHA256
   hash.
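   The sketch below shows how the AUTH0001 fields fit together, both
   when building the cell and when checking it.  It makes several
   assumptions that this proposal does not spell out:
     * the identity keys and the link certificate are passed in as
       already-encoded byte strings;
     * the three TLSSECRETS inputs are concatenated before the HMAC;
     * TIME is an unsigned big-endian integer.
   `sign` and `verify` are hypothetical callables standing in for RSA
   signing with the client's Authenticate key.  The test uses an HMAC
   only as a stand-in for a real signature.

```python
import hashlib
import hmac
import os
import struct
import time

# NUL-terminated label from the TLSSECRETS definition.
TLS_LABEL = b"Tor V3 handshake TLS cross-certification\x00"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def fixed_fields(client_id_key, server_id_key, server_log, client_log,
                 server_link_cert, client_random, server_random,
                 master_secret):
    """TYPE, CID, SID, SLOG, CLOG, SCERT and TLSSECRETS (200 octets)."""
    tlssecrets = hmac.new(master_secret,
                          client_random + server_random + TLS_LABEL,
                          hashlib.sha256).digest()
    return (b"AUTH0001" + sha256(client_id_key) + sha256(server_id_key)
            + sha256(server_log) + sha256(client_log)
            + sha256(server_link_cert) + tlssecrets)


def build_authenticate(fixed, sign):
    """Client side: append TIME, RAND and SIG, then prefix AuthType/AuthLen."""
    body = fixed + struct.pack("!Q", int(time.time())) + os.urandom(16)
    body += sign(sha256(body))  # SIG covers a hash of all previous fields
    return struct.pack("!HH", 1, len(body)) + body  # AuthType 1


def check_authenticate(cell, expected_fixed, verify):
    """Server side: compare TYPE..TLSSECRETS, then check the signature."""
    if len(cell) < 4:
        return False
    auth_type, auth_len = struct.unpack_from("!HH", cell, 0)
    body = cell[4:4 + auth_len]  # bytes past AuthLen are ignored
    n = len(expected_fixed)
    if auth_type != 1 or len(body) != auth_len or len(body) < n + 24:
        return False
    if not hmac.compare_digest(body[:n], expected_fixed):
        return False
    # TIME (8) and RAND (16) are signed but not compared.
    return verify(sha256(body[:n + 24]), body[n + 24:])
```

   The server cannot predict TIME or RAND.  It therefore compares only
   the TYPE..TLSSECRETS prefix that it can compute itself, and relies
   on the signature to cover the remaining fields.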

3.5. Responding to extra cells, and other security checks.

   If the handshake is a V3 TLS handshake, both parties MUST reject
   any negotiated link version less than 3.  Both parties MUST check
   this and close the connection if it is violated.

   If the handshake is not a V3 TLS handshake, both parties MUST
   still advertise all link protocols they support in their versions
   cell.  Both parties MUST close the link if it turns out they both
   would have supported version 3 or higher, but they somehow wound
   up using a v2 or v1 handshake.  (More on this in section 6.4.)
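   Once both VERSIONS cells are known, the two rules above could be
   applied by the hypothetical helper below.  It reads "they both
   would have supported version 3 or higher" as "the two VERSIONS
   cells share some version of 3 or more".  Treat that reading as an
   assumption.

```python
def must_close(handshake, our_versions, their_versions):
    """handshake is "v3" for the V3 TLS handshake; any other value means
    a v1 or v2 handshake.  The version arguments are the link versions
    listed in each side's VERSIONS cell."""
    shared = set(our_versions) & set(their_versions)
    if not shared:
        return True  # assumption: nothing in common, nothing to use
    best = max(shared)
    if handshake == "v3":
        return best < 3      # V3 handshake must not end below version 3
    return best >= 3         # old handshake although both support >= 3
```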

   Either party may send a VPADDING cell at any time during the
   handshake, except as the first cell. (See proposal 184.)

   A server SHOULD NOT send any sequence of cells when starting a v3
   negotiation other than "VERSIONS, CERT, AUTH_CHALLENGE,
   NETINFO".  A client SHOULD drop a CERT, AUTH_CHALLENGE, or
   NETINFO cell that appears at any other time or out of sequence.

   A client SHOULD NOT begin a v3 negotiation with any sequence
   other than "VERSIONS, NETINFO" or "VERSIONS, CERT, AUTHENTICATE,
   NETINFO".  A server SHOULD drop a CERT, AUTH_CHALLENGE, or
   NETINFO cell that appears at any other time or out of sequence.
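   The sketch below checks a complete opening sequence against the
   orders listed above, after removing VPADDING cells.  A real
   implementation would apply the check incrementally as each cell
   arrives and drop out-of-sequence cells, rather than judge the whole
   list afterwards.  The names here are hypothetical.

```python
SERVER_SEQUENCES = [["VERSIONS", "CERT", "AUTH_CHALLENGE", "NETINFO"]]
CLIENT_SEQUENCES = [["VERSIONS", "NETINFO"],
                    ["VERSIONS", "CERT", "AUTHENTICATE", "NETINFO"]]


def follows_v3_sequence(commands, sender):
    """True if `commands` (cell command names, in order) is a permitted
    v3 opening sequence for `sender`, which is "server" or "client"."""
    if not commands or commands[0] == "VPADDING":
        return False  # VPADDING is allowed anywhere except as the first cell
    core = [c for c in commands if c != "VPADDING"]
    allowed = SERVER_SEQUENCES if sender == "server" else CLIENT_SEQUENCES
    return core in allowed
```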

4. Numbers to assign

   We need a version number for this link protocol.  I've been
   calling it "3".

   We need to reserve command numbers for CERT, AUTH_CHALLENGE, and
   AUTHENTICATE.  I suggest that in link protocol 3 and higher, we
   reserve a separate range of commands for variable-length cells.
   See proposal 184 for more there.

5. Efficiency

   This protocol adds a round-trip step when the client sends a
   VERSIONS cell to the server and waits for the {VERSIONS, CERT,
   AUTH_CHALLENGE, NETINFO} response in turn.  (The server then
   waits for the client's {NETINFO} or {CERT, AUTHENTICATE, NETINFO}
   reply, but it would have already been waiting for the client's
   NETINFO, so that's not an additional wait.)

   This is actually fewer round-trip steps than required before for
   TLS renegotiation, so that's a win over v2.

6. Security argument

   These aren't crypto proofs, since I don't write those.  They are
   meant to be reasonably convincing.

6.1. The server is authenticated

   TLS guarantees that if the TLS handshake completes successfully,
   the client knows that it is speaking to somebody who knows the
   private key corresponding to the public link key that was used in
   the TLS handshake.

   Because this public link key is signed by the server's identity
   key in the CERT cell, the client knows that somebody who holds
   the server's private identity key says that the server's public
   link key corresponds to the server's public identity key.

   Therefore, if the crypto works, and if TLS works, and if the keys
   aren't compromised, then the client is talking to somebody who
   holds the server's private identity key.

6.2. The client is authenticated

   Once the server has checked the client's certificates, the server
   knows that somebody who knows the client's private identity key
   says that he is the one holding the private key corresponding to
   the client's presented link-authentication public key.

   Once the server has checked the signature in the AUTHENTICATE
   cell, the server knows that somebody holding the client's
   link-authentication private key signed the data in question.  By
   the standard certification argument above, the server knows that
   somebody holding the client's private identity key signed the
   data in question.

   So the server's remaining question is: am I really talking to
   somebody holding the client's identity key, or am I getting a
   replayed or MITM'd AUTHENTICATE cell that was previously sent by
   the client?

   Because the client includes a TLSSECRETS component, and the
   server is able to verify it, the answer is easy: the server
   knows for certain that it is talking to the party with whom it
   did the TLS handshake.  If somebody else generated a correct
   TLSSECRETS value, they would have to know the master secret of
   the TLS connection, which would require them to have broken TLS.

   Even if the protocol didn't contain the TLSSECRETS component,
   the server could still verify the client's authentication, but
   it's a little trickier.  The server knows that it is not getting a replayed
   AUTHENTICATE cell, since the cell authenticates (among other
   stuff) the server's AUTH_CHALLENGE cell, which it has never used
   before.  The server knows that it is not getting a MITM'd
   AUTHENTICATE cell, since the cell includes a hash of the server's
   link certificate, which nobody else should have been able to use
   in a successful TLS negotiation.

6.3. MITM attacks won't work any better than they do against TLS

   TLS guarantees that a man-in-the-middle attacker can't read the
   content of a successfully negotiated encrypted connection, nor
   alter the content in any way other than truncating it, unless he
   compromises the session keys or one of the key-exchange secret
   keys used to establish that connection.  Let's make sure we do at
   least that well.

   Suppose that a client Alice connects to an MITM attacker Mallory,
   thinking that she is connecting to some server Bob.  Let's assume
   that the TLS handshake between Alice and Mallory finishes
   successfully and the v3 protocol is chosen.  [If the v1 or v2
   protocol is chosen, those already resist MITM.  If the TLS
   handshake doesn't complete, then Alice isn't connected to anybody.]

   During the v3 handshake, Mallory can't convince Alice that she is
   talking to Bob, since she should not be able to produce a CERT
   cell containing a certificate chain signed by Bob's identity key
   and used to authenticate the link key that Mallory used during
   TLS.  (If Mallory used her own link key for the TLS handshake, it
   won't match anything Bob signed unless Bob is compromised.
   Mallory can't use any key that Bob _did_ produce a certificate
   for, since she doesn't know the private key.)

   Even if Alice fails to check the certificates from Bob, Mallory
   still can't convince Bob that she is really Alice.  Assuming that
   Alice's keys aren't compromised, Mallory can't send a CERT cell
   with a cert chain from Alice's identity key to a key that Mallory
   controls, so if Mallory wants to impersonate Alice's identity
   key, she can only do so by sending an AUTHENTICATE cell really
   generated by Alice.  Because the SLOG hash that Bob checks covers
   the random bytes in his AUTH_CHALLENGE cell, Mallory needs to
   send Bob's challenge to Alice, and can't use any other
   AUTHENTICATE cell that Alice generated before.  But because the
   AUTHENTICATE cell Alice will generate will include in the SCERT
   field a hash of the link certificate used by Mallory, Bob will
   reject it as not being valid to connect to him.

6.4. Protocol downgrade attacks won't work.

   Assuming that Alice checks the certificates from Bob, she knows
   that Bob really sent her the VERSIONS cell that she received.

   Because the AUTHENTICATE cell from Alice includes signed hashes
   of the VERSIONS cells from Alice and Bob, Bob knows that Alice
   got the VERSIONS cell he sent and sent the VERSIONS cell that he
   received.

   But what about attempts to downgrade the protocol earlier in the
   handshake?  Here TLS comes to the rescue: because the TLS
   Finished handshake message includes an authenticated digest of
   everything previously said during the handshake, an attacker
   can't replace the client's ciphersuite list (to trigger a
   downgrade to the v1 protocol) or the server's certificate [chain]
   (to trigger a downgrade to the v1 or v2 protocol).

7. Design considerations

   I previously considered adding our own certificate format in
   order to avoid the pain associated with X509, but decided instead
   to simply use X509 since a correct Tor implementation will
   already need to have X509 code to handle the other handshake
   versions and to use TLS.

   The trickiest part of the design here is deciding what to stick
   in the AUTHENTICATE cell.  Some of it is strictly necessary, and
   some of it is left there for security margin in case my other
   security arguments fail.  Because of the CID and SID elements
   you can't use an AUTHENTICATE cell for anything other than
   authenticating a client ID to a server with an appropriate
   server ID.  The SLOG and CLOG elements are there mostly to
   authenticate the VERSIONS cells and resist downgrade attacks
   once there are two versions of this.  Including the
   AUTH_CHALLENGE cell in the data that SLOG authenticates
   prevents replays and ensures that the AUTHENTICATE cell was
   really generated by somebody who is reading what the server is
   sending over the TLS connection.  The SCERT element is meant to
   prevent MITM attacks.  When the TLSSECRETS field is
   used, it should prevent the use of the AUTHENTICATE cell for
   anything other than the TLS connection the client had in mind.

   A signature of the TLSSECRETS element on its own should also be
   sufficient to prevent the attacks we care about.  The redundancy
   here should come in handy if I've made a mistake somewhere else in
   my analysis.

   If the client checks the server's certificates and matches them
   to the TLS connection link key before proceeding with the
   handshake, then signing the contents of the AUTH_CHALLENGE cell
   would be sufficient to authenticate the client.  But implementers
   of allegedly compatible Tor clients have in the past skipped
   certificate verification steps, and I didn't want a client's
   failure to verify certificates to mean that a server couldn't
   trust that he was really talking to the client.  To prevent this,
   I added the TLS link certificate to the authenticated data: even
   if the Tor client code doesn't check any certificates, the TLS
   library code will still check that the certificate used in the
   handshake contains a link key that matches the one used in the
   handshake.

8. Open questions:

  - May we cache which certificates we've already verified?  It
    might leak in timing whether we've connected with a given server
    before, and how recently.

  - With which TLS libraries is it feasible to yoink client_random,
    server_random, and the master secret?  If the answer is "All
    free C TLS libraries", great.  If the answer is "OpenSSL only",
    not so great.

  - Should we do anything to check the timestamp in the AUTHENTICATE
    cell?

  - Can we give some way for clients to signal "I want to use the
    V3 protocol if possible, but I can't renegotiate, so don't give
    me the V2"?  Clients currently have a fair idea of server
    versions, so they could potentially do the V3 handshake with
    servers that support it, and fall back to V1 otherwise.

  - What should servers that don't have TLS renegotiation do?  For
    now, I think they should just stick with V1.  Eventually we can
    deprecate the V2 handshake as we did with the V1 handshake.
    When that happens, servers can be V3-only.
Filename: 177-flag-abstention.txt
Title: Abstaining from votes on individual flags
Author: Nick Mathewson
Created: 14 Feb 2011
Status: Reserve
Target: 0.2.4.x

Overview:

   We should have a way for authorities to vote on flags in
   particular instances, without having to vote on that flag for all
   servers.

Motivation:

   Suppose that the status of some router becomes controversial, and
   an authority wants to vote for or against the BadExit status of
   that router.  Suppose also that the authority is not currently
   voting on the BadExit flag.  If the authority wants to say that
   the router is or is not "BadExit", it cannot currently do so
   without voting yea or nay on the BadExit status of all other
   routers.

   Suppose that an authority wants to vote "Valid" or "Invalid" on a
   large number of routers, but does not have an opinion on some of
   them.  Currently, it cannot do so: if it votes for the Valid flag
   anywhere, it votes for it everywhere.

Design:

   We add a new line "extra-flags" in directory votes, to appear
   after "known-flags".  It lists zero or more flags that an
   authority has occasional opinions on, but for which the authority
   will usually abstain.  No flag may appear in both extra-flags and
   known-flags.

   In the router-status section for each directory vote, we allow an
   optional "s2" line to appear after the "s" line.  It contains
   zero or more flag votes.  A flag vote is one of "+", "-", or "/"
   followed by the name of a flag: "+" denotes a yea vote, "-"
   denotes a nay vote, and "/" denotes an abstention.  Authorities
   may omit most abstentions, except as noted below.  No flag may
   appear in an s2 line unless it appears in the known-flags or
   extra-flags line.  We retain the rule that no flag may appear in
   an s line unless it appears in the known-flags line.
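   Below is a hypothetical Python parser for the proposed s2 line.
   Raising an error on a flag that appears in neither known-flags nor
   extra-flags is this sketch's choice.  The proposal only says that
   such flags may not appear.

```python
VOTE_PREFIXES = {"+": "yea", "-": "nay", "/": "abstain"}


def parse_s2_line(line, known_flags, extra_flags):
    """Parse e.g. "s2 +RunByWombats /Valid" into {flag: vote}."""
    words = line.split()
    if not words or words[0] != "s2":
        raise ValueError("not an s2 line")
    votes = {}
    for word in words[1:]:
        prefix, flag = word[:1], word[1:]
        if prefix not in VOTE_PREFIXES or not flag:
            raise ValueError("malformed flag vote: %r" % word)
        if flag not in known_flags and flag not in extra_flags:
            raise ValueError("flag %r is in neither known-flags nor extra-flags" % flag)
        votes[flag] = VOTE_PREFIXES[prefix]
    return votes
```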

   When using an appropriate consensus method to vote, we use these
   new rules to determine flags:

   A flag is listed in the consensus if it is in the known-flags
   section of at least one voter, and in the known-flags or
   extra-flags section of at least three voters (or half the
   authorities, whichever number is smaller).

   A single authority's vote for a given flag on a given router is
   interpreted as follows:

      - If the authority votes +Flag or -Flag or /Flag in the s2 line for
        that router, the vote is "yea" or "nay" or "abstain" respectively.
      - Otherwise, if the flag is listed on the "s" line for the
        router, then the vote is "yea".
      - Otherwise, if the flag is listed in the known-flags line,
        then the vote is "nay".
      - Otherwise, the vote is "abstain".

   A router is assigned a flag in the consensus iff the total "yeas"
   outnumber the total "nays".
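   The two rules above can be sketched as below: whether a flag is
   listed at all, and how each authority's vote is read.  The data
   shapes are hypothetical.  "Half the authorities" is taken as n/2
   without rounding, which is an assumption.

```python
def flag_in_consensus(flag, voter_flag_lists, n_authorities):
    """voter_flag_lists: one (known_flags, extra_flags) pair per voter."""
    in_known = sum(1 for known, _ in voter_flag_lists if flag in known)
    in_either = sum(1 for known, extra in voter_flag_lists
                    if flag in known or flag in extra)
    threshold = min(3, n_authorities / 2)  # assumption: no rounding
    return in_known >= 1 and in_either >= threshold


def authority_vote(flag, s_flags, s2_votes, known_flags):
    """Read one authority's vote on `flag` for one router."""
    if flag in s2_votes:            # explicit +Flag, -Flag or /Flag
        return s2_votes[flag]
    if flag in s_flags:
        return "yea"
    if flag in known_flags:
        return "nay"
    return "abstain"


def router_gets_flag(flag, authority_votes):
    """authority_votes: one (s_flags, s2_votes, known_flags) per authority."""
    votes = [authority_vote(flag, s, s2, known)
             for s, s2, known in authority_votes]
    return votes.count("yea") > votes.count("nay")  # ties go to "no"
```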

   As an exception, this proposal does not affect the behavior of
   the "Named" and "Unnamed" flags; these are still treated as
   before.  (An authority can already abstain from a single naming
   decision by not voting Named on any router with a given name.)

Examples:

   Suppose that it becomes important to know which Tor servers are
   operated by burrowing marsupials.  Some authority operators
   diligently research this question; others want to vote about
   individual routers on an ad hoc basis when they learn about a
   particular router's being e.g. located underground in New South
   Wales.

   If an authority usually has no opinions on the RunByWombats flag,
   it should list it in the "extra-flags" of its votes.  If it
   occasionally wants to vote that a router is (or is not) run by
   wombats, it should list "s2 +RunByWombats" or "s2 -RunByWombats"
   for the routers in question.  Otherwise it can omit the flag from
   its s and s2 lines entirely.

   If an authority usually has an opinion on the RunByWombats flag,
   but wants to abstain in some cases, it should list "RunByWombats"
   in the "known-flags" part of its votes, and include
   "RunByWombats" in the s line for every router that it believes is
   run by wombats. When it wants to vote that a router is not run
   by wombats, it should list the RunByWombats flag in neither the s
   nor the s2 line.  When it wants to abstain, it should list "s2
   /RunByWombats".

   In both cases, when the new consensus method is used, a router
   will get listed as "RunByWombats" if there are more authorities
   that say it is run by wombats than there are authorities saying
   it is not run by wombats.  (As now, "no" votes win ties.)


Filename: 178-param-voting.txt
Title: Require majority of authorities to vote for consensus parameters
Author: Sebastian Hahn
Created: 16-Feb-2011
Status: Closed
Implemented-In: 0.2.3.9-alpha

Overview:

The consensus that the directory authorities create may contain one or
more parameters (32-bit signed integers) that influence the behavior
of Tor nodes (see proposal 167, "Vote on network parameters in
consensus" for more details).

Currently (as of consensus method 11), a consensus will end up
containing a parameter if at least one directory authority votes for
that parameter. The value of the parameter will be the low-median of
all the votes for this parameter.

This proposal aims at changing this voting process to be more secure
against tampering by a small fraction of directory authorities.

Motivation:

To prevent a small fraction of the directory authorities from
influencing the value of a parameter unduly, a big enough fraction
of all directory authorities has to vote for that
parameter. This is not currently happening, and it is in fact not
uncommon for a single authority to govern the value of a consensus
parameter.

Design:

When the consensus is generated, the directory authorities ensure that
a param is only included in the list of params if at least three of the
authorities (or a simple majority, whichever is the smaller number)
votes for that param. The value chosen is the low-median of all the
votes. We don't mandate that the authorities have to vote on exactly
the same value for it to be included because some consensus parameters
could be the result of active measurements that individual authorities
make.
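The design rule can be sketched in Python as follows. The function
names are hypothetical; votes are assumed to arrive as one dictionary
per participating authority, and "simple majority" is read as more than
half of all authorities, not just those voting. For an even number of
votes, the low-median is the lower of the two middle values.

```python
from collections import defaultdict
from typing import Dict, List

def low_median(values: List[int]) -> int:
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]   # lower middle element for even lengths

def compute_params(votes: List[Dict[str, int]], n_authorities: int) -> Dict[str, int]:
    """votes: one {keyword: value} dict per authority that voted."""
    threshold = min(3, n_authorities // 2 + 1)  # three, or a simple majority if smaller
    collected: Dict[str, List[int]] = defaultdict(list)
    for vote in votes:
        for keyword, value in vote.items():
            collected[keyword].append(value)
    # Keep only keywords with enough votes; values need not agree.
    return {k: low_median(v) for k, v in sorted(collected.items()) if len(v) >= threshold}
```

With nine authorities, three votes of 1000, 500 and 800 for a sample
keyword "circwindow" yield 800, while a keyword that only one authority
voted on is dropped.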

Security implications:

This change is aimed at improving the security of Tor nodes against
attacks carried out by a small fraction of directory authorities. It
is possible that a consensus parameter that would be helpful to the
network is not included because not enough directory authorities
voted for it, but since clients are required to have sane defaults
in case the parameter is absent this does not carry a security risk.

This proposal makes a security vs coordination effort tradeoff. When
considering only the security of the design, it would be better to
require a simple majority of directory authorities to agree on
voting on a parameter, but it would involve requiring more
directory authority operators to coordinate their actions to set the
parameter successfully.

Specification:

dir-spec section 3.4 currently says:

     Entries are given on the "params" line for every keyword on which any
     authority voted.  The values given are the low-median of all votes on
     that keyword.

It is proposed that the above is changed to:

     Entries are given on the "params" line for every keyword on which a
     majority of authorities (total authorities, not just those
     participating in this vote) voted, or on which at least three
     authorities voted. The values given are the
     low-median of all votes on that keyword.

     For consensus methods 11 and earlier, entries are given on the "params"
     line for every keyword on which any authority voted, the value given
     being the low-median of all votes on that keyword.

The following should be added to the bottom of section 3.4.:

        * If consensus method 12 or later is used, only consensus
          parameters that more than half of the total number of
          authorities voted for are included in the consensus.

The following line should be added to the bottom of section 3.4.1.:

     "12" -- Params are only included if enough auths voted for them

Compatibility:

A sufficient number of directory authorities must upgrade to the new
consensus method used to calculate the params in the way this proposal
calls for, otherwise the old mechanism is used. Nodes that do not act
as directory authorities do not need to be upgraded and should
experience no change in behaviour.

Implementation:

An example implementation of this feature can be found in
https://gitweb.torproject.org/sebastian/tor.git, branch safer_params.

Filename: 179-TLS-cert-and-parameter-normalization.txt
Title: TLS certificate and parameter normalization
Author: Jacob Appelbaum, Gladys Shufflebottom
Created: 16-Feb-2011
Status: Closed
Target: 0.2.3.x


        Draft spec for TLS certificate and handshake normalization


                                    Overview

     STATUS NOTE:

     This document is implemented in part in 0.2.3.x, deferred in part, and
     rejected in part.  See indented bracketed comments in individual
     sections below for more information. -NM

Scope

This document proposes improvements to Tor's current TLS (Transport Layer
Security) certificates and handshake that will reduce the distinguishability
of Tor traffic from other encrypted traffic that uses TLS.  It also addresses
some of the fingerprinting attacks possible against the current Tor TLS
protocol setup process.

Motivation and history

Censorship is an arms race and this is a step forward in the defense
of Tor.  This proposal outlines ideas to make it more difficult to
fingerprint and block Tor traffic.

Goals

This proposal intends to normalize or remove easy-to-predict or static
values in the Tor TLS certificates and in the Tor TLS setup process.
These values can be used as criteria for the automated classification of
encrypted traffic as Tor traffic. Network observers should not be able
to trivially detect Tor merely by receiving or observing the certificate
used or advertised by a Tor relay. I also propose the creation of
a hard-to-detect covert channel through which a server can signal that it
supports the third version ("V3") of the Tor handshake protocol.

Non-Goals

This document is not intended to solve all of the possible active or passive
Tor fingerprinting problems. This document focuses on removing distinctive
and predictable features of TLS protocol negotiation; we do not attempt to
make guarantees about resisting other kinds of fingerprinting of Tor
traffic, such as fingerprinting techniques related to timing or volume of
transmitted data.

                                Implementation details


Certificate Issues

The CN or commonName ASN1 field

Tor generates certificates with a predictable commonName field; the
field is within a given range of values that is specific to Tor.
Additionally, the generated host names have other undesirable properties.
The host names typically do not resolve in the DNS because the domain
names referred to are generated at random. Although they are syntactically
valid, they usually refer to domains that have never been registered by
any domain name registrar.

An example of the current commonName field: CN=www.s4ku5skci.net
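To illustrate how predictable these names are, here is a Python sketch
of a classifier a censor could write. The pattern is inferred only from
the two example names in this section (www.s4ku5skci.net, and
www.vsbsvwu5b4soh4wg.net in the asn1parse output below): lowercase
base32 characters between "www." and ".net". The 8-to-20-character
length range and the function name are assumptions, not taken from
Tor's source.

```python
import re

# Assumed pattern, inferred from the two example commonNames only.
TOR_STYLE_CN = re.compile(r"www\.[a-z2-7]{8,20}\.net\Z")

def looks_like_tor_cn(common_name: str) -> bool:
    return TOR_STYLE_CN.match(common_name) is not None
```

Ordinary names can also match this pattern, so on its own it is only a
heuristic.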

An example of OpenSSL’s asn1parse over a typical Tor certificate:

   0:d=0  hl=4 l= 438 cons: SEQUENCE
    4:d=1  hl=4 l= 287 cons: SEQUENCE
    8:d=2  hl=2 l=   3 cons: cont [ 0 ]
   10:d=3  hl=2 l=   1 prim: INTEGER           :02
   13:d=2  hl=2 l=   4 prim: INTEGER           :4D3C763A
   19:d=2  hl=2 l=  13 cons: SEQUENCE
   21:d=3  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
   32:d=3  hl=2 l=   0 prim: NULL
   34:d=2  hl=2 l=  35 cons: SEQUENCE
   36:d=3  hl=2 l=  33 cons: SET
   38:d=4  hl=2 l=  31 cons: SEQUENCE
   40:d=5  hl=2 l=   3 prim: OBJECT            :commonName
   45:d=5  hl=2 l=  24 prim: PRINTABLESTRING   :www.vsbsvwu5b4soh4wg.net
   71:d=2  hl=2 l=  30 cons: SEQUENCE
   73:d=3  hl=2 l=  13 prim: UTCTIME           :110123184058Z
   88:d=3  hl=2 l=  13 prim: UTCTIME           :110123204058Z
  103:d=2  hl=2 l=  28 cons: SEQUENCE
  105:d=3  hl=2 l=  26 cons: SET
  107:d=4  hl=2 l=  24 cons: SEQUENCE
  109:d=5  hl=2 l=   3 prim: OBJECT            :commonName
  114:d=5  hl=2 l=  17 prim: PRINTABLESTRING   :www.s4ku5skci.net
  133:d=2  hl=3 l= 159 cons: SEQUENCE
  136:d=3  hl=2 l=  13 cons: SEQUENCE
  138:d=4  hl=2 l=   9 prim: OBJECT            :rsaEncryption
  149:d=4  hl=2 l=   0 prim: NULL
  151:d=3  hl=3 l= 141 prim: BIT STRING
  295:d=1  hl=2 l=  13 cons: SEQUENCE
  297:d=2  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
  308:d=2  hl=2 l=   0 prim: NULL
  310:d=1  hl=3 l= 129 prim: BIT STRING

I propose that we match OpenSSL's default self-signed certificates. I hypothesise
that they are the most common self-signed certificates. If this turns out not
to be the case, then we should use whatever form turns out to be most common.

Certificate serial numbers

Currently our generated certificate serial number is set to the number of
seconds since the epoch at the time of the certificate's creation. I propose
that we should ensure that our serial numbers are unrelated to the epoch,
since the generation methods are potentially recognizable as Tor-related.

Instead, I propose that we use a randomly generated number that is
subsequently hashed with SHA-512 and then truncated to eight bytes [1].
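A minimal Python sketch of the proposed serial-number generation,
assuming a 64-byte seed from the operating system's CSPRNG (the seed
size and the function names are hypothetical). The second helper shows
why the proposed certificate form later in this section lists the
8-byte serial DBF6B3B864FF7478 with l=9: DER encodes INTEGERs in two's
complement, so a positive value whose top bit is set gains a leading
zero byte.

```python
import hashlib
import secrets

def make_serial() -> int:
    """Random number, hashed with SHA-512, truncated to eight bytes."""
    seed = secrets.token_bytes(64)            # hypothetical seed size
    digest = hashlib.sha512(seed).digest()
    return int.from_bytes(digest[:8], "big")  # 0 <= serial < 2**64

def der_integer_content_length(n: int) -> int:
    """Octets in the DER INTEGER content for a non-negative n."""
    # One extra bit is needed for the sign, hence bit_length() // 8 + 1.
    return n.bit_length() // 8 + 1
```

If the seed already comes from a CSPRNG, the hash adds no randomness;
it is kept here only to mirror the proposal.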

Random sixteen-byte values appear to be the upper bound for serial numbers as
issued by Verisign and DigiCert.  RapidSSL serials appear to be three bytes long.
Other common lengths appear to be between one and four bytes. The default
OpenSSL certificates use eight bytes, and we should use this length with our
self-signed certificates.

This randomly generated serial number field may now serve as a covert channel
that signals to the client that the OR will not support TLS renegotiation; this
means that the client can expect to perform a V3 TLS handshake setup.
Otherwise, if the serial number is a reasonable time since the epoch, we should
assume the OR is using an earlier protocol version and hence that it expects
renegotiation.
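A sketch of the client-side check described above, in Python. What
counts as "a reasonable time since the epoch" is not specified; the
window used here (from 2002-01-01 until one day after the current time)
and the function name are assumptions. As a sanity check, the serial
4D3C763A in the first asn1parse output is 1295808058, i.e. 2011-01-23
18:40:58 UTC, exactly that certificate's notBefore time.

```python
import time
from typing import Optional

EARLIEST_PLAUSIBLE = 1009843200  # 2002-01-01T00:00:00Z (assumed lower bound)
SLACK = 24 * 60 * 60             # tolerate one day of clock skew (assumed)

def expects_renegotiation(serial: int, now: Optional[float] = None) -> bool:
    """True if the serial looks like a creation timestamp (older protocol
    that expects renegotiation); False if it looks random (V3 handshake)."""
    if now is None:
        now = time.time()
    return EARLIEST_PLAUSIBLE <= serial <= now + SLACK
```

A random 64-bit serial lands inside this window only with probability
on the order of 10^-11, so misclassifying a new-style relay is
negligible.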

We also have a need to signal properties with our certificates for a possible
v3 handshake in the future. Therefore I propose that we match OpenSSL default
self-signed certificates (a 64-bit random number), but reserve the two least-
significant bits for signaling. For the moment, these two bits will be zero.

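This means that an attacker may be able to distinguish Tor certificates from
default OpenSSL certificates with 75% probability: a random 64-bit serial
number has both of its two lowest bits equal to zero only one time in four.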
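A Python sketch of reserving the two low-order bits, with hypothetical
function names. The last usage check counts how many serials out of
1024 consecutive values an observer could not rule out as Tor's.

```python
SIGNAL_MASK = 0b11   # two least-significant bits reserved for signaling

def tag_serial(serial: int, signal: int = 0) -> int:
    if not 0 <= signal <= SIGNAL_MASK:
        raise ValueError("signal must fit in two bits")
    # Clear the reserved bits, then set the (currently zero) signal.
    return (serial & ~SIGNAL_MASK) | signal

def could_be_tor_serial(serial: int) -> bool:
    # While the signal is always zero, a serial with either low bit set
    # cannot have been produced by Tor.
    return (serial & SIGNAL_MASK) == 0
```

Exactly 256 of 1024 consecutive serials pass could_be_tor_serial, so
the remaining three quarters of default OpenSSL serials are ruled out.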

As a security note, care must be taken to ensure that supporting this
covert channel will not lead to an attacker having a method to downgrade client
behavior. This shouldn't be a risk because the TLS Finished message hashes over
all the bytes of the handshake, including the certificates.

     [Randomized serial numbers are implemented in 0.2.3.9-alpha. We probably
     shouldn't do certificate tagging by a covert channel in serial numbers,
     since doing so would mean we could never have an externally signed
     cert. -NM]

Certificate fingerprinting issues expressed as base64 encoding

It appears that all deployed Tor certificates have the following strings in
common:

MIIB
CCA
gAwIBAgIETU
ANBgkqhkiG9w0BAQUFADA
YDVQQDEx
3d3cu

As expected, these values correspond to specific ASN.1 OBJECT IDENTIFIER (OID)
properties (sha1WithRSAEncryption, commonName, etc.) of the way we generate
our certificates.

As an illustrated example of the bytes common to all certificates used within
the Tor network during a single one-hour window, I have replaced each value
that varies with a wildcard ('.') character here:

-----BEGIN CERTIFICATE-----
MIIB..CCA..gAwIBAgIETU....ANBgkqhkiG9w0BAQUFADA.M..w..YDVQQDEx.3
d3cu............................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
........................... <--- Variable length and padding
-----END CERTIFICATE-----

This fine ASCII art only illustrates the bytes that match exactly in all
cases.  Many of the remaining bytes are likely to take only a small subset of
their possible values.

Using the above strings, the EFF's certificate observatory may trivially
discover all known relays, known bridges and unknown bridges in a single SQL
query.  I propose that we test our certificates to make sure that they do not
have these kinds of statistical similarities, unless those similarities are
shared with a very large cross section of the Internet's certificates.
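A minimal Python sketch of such a self-test: it reports which of the
base64 substrings listed above occur in a PEM certificate. The function
names are hypothetical, and a real test would also compare against a
large corpus of non-Tor certificates. The first helper shows why "MIIB"
is so stable: a DER certificate starts with a SEQUENCE header 30 82
followed by a two-byte length, and for any length from 256 to 511 the
first three bytes are 30 82 01, which always base64-encode to "MIIB".
The 438-byte body in the first asn1parse dump falls in that range; the
513-byte body of the proposed form below encodes as "MIIC" instead.

```python
import base64
from typing import List

# Substrings reported above in every deployed Tor certificate.
TOR_MARKERS = ("MIIB", "CCA", "gAwIBAgIETU", "ANBgkqhkiG9w0BAQUFADA", "YDVQQDEx", "3d3cu")

def der_prefix_b64(content_len: int) -> str:
    """First four base64 characters of a DER SEQUENCE with a 2-byte length."""
    if not 256 <= content_len <= 0xFFFF:
        raise ValueError("two-byte long-form length expected")
    header = bytes([0x30, 0x82, content_len >> 8, content_len & 0xFF])
    return base64.b64encode(header).decode()[:4]

def tor_markers_present(pem: str) -> List[str]:
    # Join the base64 body lines so markers split across lines are found.
    body = "".join(line.strip() for line in pem.splitlines()
                   if not line.startswith("-----"))
    return [m for m in TOR_MARKERS if m in body]
```

Running tor_markers_present on the wildcard illustration above finds
all six markers, including "3d3cu", which spans a line break.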

Certificate dating and validity issues

TLS certificates found in the wild are generally found to be long-lived;
they are frequently old and often even expired. The current Tor certificate
validity time is a very small time window starting at generation time and
ending shortly thereafter, as defined in or.h by MAX_SSL_KEY_LIFETIME
(2*60*60).

I propose that the certificate validity period be extended to twelve Earth
months, possibly with a small random skew to be determined by the
implementer. Tor should set the start date to a random time in the past,
within some currently unspecified window before the current date. This would
more closely track the typical distribution of non-Tor TLS certificate
expiration times.
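A Python sketch of choosing the validity dates. The proposal leaves
both the backdating window and the skew unspecified; the defaults here
(up to 365 days of backdating, up to 48 hours of skew) and the function
name are assumptions, and twelve months is approximated as 365 days.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Tuple

VALIDITY = timedelta(days=365)   # "twelve Earth months", approximated

def choose_validity(now: datetime, max_backdate_days: int = 365,
                    max_skew_hours: int = 48) -> Tuple[datetime, datetime]:
    # Start date: a random point up to max_backdate_days in the past.
    backdate = timedelta(seconds=secrets.randbelow(max_backdate_days * 86400 + 1))
    not_before = now - backdate
    # Length: twelve months plus or minus a random skew of up to max_skew_hours.
    skew_limit = max_skew_hours * 3600
    skew = timedelta(seconds=secrets.randbelow(2 * skew_limit + 1) - skew_limit)
    return not_before, not_before + VALIDITY + skew
```

With a backdating window as long as the validity period, a freshly
generated certificate may already be expired, which matches the
observation that certificates in the wild are often expired.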

The certificate values, such as expiration, should not be used for anything
relating to security; for example, if the OR presents an expired TLS
certificate, this does not imply that the client should terminate the
connection (as would be appropriate for an ordinary TLS implementation).
Rather, I propose we use a TOFU-style (trust on first use) expiration policy:
the certificate should never be trusted for more than a two-hour window from
first sighting.
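A minimal in-memory sketch of this policy in Python. Keying the cache
by certificate digest and the class name are assumptions; persistence
and the interaction with consensus key rotation are not modeled. The
certificate's own notAfter date is deliberately never consulted.

```python
TRUST_WINDOW = 2 * 60 * 60  # two hours from first sighting

class TofuCertPolicy:
    """Trust-on-first-use expiry; the certificate's own notAfter is ignored."""

    def __init__(self) -> None:
        self._first_seen: dict = {}   # cert digest -> first sighting (seconds)

    def is_trusted(self, cert_digest: str, now: float) -> bool:
        # Record the first sighting, then trust only within the window.
        first = self._first_seen.setdefault(cert_digest, now)
        return now - first <= TRUST_WINDOW
```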

This policy should have two major impacts. The first is that an adversary will
have to perform a differential analysis of all certificates for a given IP
address rather than a single check. The second is that the server expiration
time is enforced by the client and confirmed by keys rotating in the consensus.

The expiration time should not be a fixed time that is simple for any Deep
Packet Inspection device to calculate, or it will become a new Tor TLS setup
fingerprint.

   [Deferred and needs revision; see proposal XXX. -NM]

Proposed certificate form

The following output from openssl asn1parse results from the proposed
certificate generation algorithm. It matches the results of generating a
default self-signed certificate:

    0:d=0  hl=4 l= 513 cons: SEQUENCE          
    4:d=1  hl=4 l= 362 cons: SEQUENCE          
    8:d=2  hl=2 l=   9 prim: INTEGER           :DBF6B3B864FF7478
   19:d=2  hl=2 l=  13 cons: SEQUENCE          
   21:d=3  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
   32:d=3  hl=2 l=   0 prim: NULL              
   34:d=2  hl=2 l=  69 cons: SEQUENCE          
   36:d=3  hl=2 l=  11 cons: SET               
   38:d=4  hl=2 l=   9 cons: SEQUENCE          
   40:d=5  hl=2 l=   3 prim: OBJECT            :countryName
   45:d=5  hl=2 l=   2 prim: PRINTABLESTRING   :AU
   49:d=3  hl=2 l=  19 cons: SET               
   51:d=4  hl=2 l=  17 cons: SEQUENCE          
   53:d=5  hl=2 l=   3 prim: OBJECT            :stateOrProvinceName
   58:d=5  hl=2 l=  10 prim: PRINTABLESTRING   :Some-State
   70:d=3  hl=2 l=  33 cons: SET               
   72:d=4  hl=2 l=  31 cons: SEQUENCE          
   74:d=5  hl=2 l=   3 prim: OBJECT            :organizationName
   79:d=5  hl=2 l=  24 prim: PRINTABLESTRING   :Internet Widgits Pty Ltd
  105:d=2  hl=2 l=  30 cons: SEQUENCE          
  107:d=3  hl=2 l=  13 prim: UTCTIME           :110217011237Z
  122:d=3  hl=2 l=  13 prim: UTCTIME           :120217011237Z
  137:d=2  hl=2 l=  69 cons: SEQUENCE          
  139:d=3  hl=2 l=  11 cons: SET               
  141:d=4  hl=2 l=   9 cons: SEQUENCE          
  143:d=5  hl=2 l=   3 prim: OBJECT            :countryName
  148:d=5  hl=2 l=   2 prim: PRINTABLESTRING   :AU
  152:d=3  hl=2 l=  19 cons: SET               
  154:d=4  hl=2 l=  17 cons: SEQUENCE          
  156:d=5  hl=2 l=   3 prim: OBJECT            :stateOrProvinceName
  161:d=5  hl=2 l=  10 prim: PRINTABLESTRING   :Some-State
  173:d=3  hl=2 l=  33 cons: SET               
  175:d=4  hl=2 l=  31 cons: SEQUENCE          
  177:d=5  hl=2 l=   3 prim: OBJECT            :organizationName
  182:d=5  hl=2 l=  24 prim: PRINTABLESTRING   :Internet Widgits Pty Ltd
  208:d=2  hl=3 l= 159 cons: SEQUENCE          
  211:d=3  hl=2 l=  13 cons: SEQUENCE          
  213:d=4  hl=2 l=   9 prim: OBJECT            :rsaEncryption
  224:d=4  hl=2 l=   0 prim: NULL              
  226:d=3  hl=3 l= 141 prim: BIT STRING        
  370:d=1  hl=2 l=  13 cons: SEQUENCE          
  372:d=2  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
  383:d=2  hl=2 l=   0 prim: NULL              
  385:d=1  hl=3 l= 129 prim: BIT STRING        

    [Rejected pending more evidence; this pattern is trivially detectable,
    and there is just not enough reason at the moment to think that this
    particular certificate pattern is common enough for sites that matter
    that the censors wouldn't be willing to block it. -NM]

Custom Certificates

It should be possible for a Tor relay operator to use a specifically supplied
certificate and secret key. This will allow a relay or bridge operator to use a
certificate signed by any member of any geographically relevant certificate
authority racket; it will also allow for any other user-supplied certificate.
This may be desirable in some kinds of filtered networks or when attempting to
avoid attracting suspicion by blending in with the TLS web server certificate
crowd.

    [Deferred; see proposal XXX]

Problematic Diffie–Hellman parameters

We currently send a static Diffie–Hellman parameter, prime p (or “prime p
outlaw”), as specified in RFC 2409, as part of the TLS Server Hello response.

The use of this prime in TLS negotiations may, as a result, be filtered and
effectively banned by certain networks. We do not have to use this particular
prime in all cases.

While it is amusing to have the power to turn specific prime numbers into a new
class of numbers (cf. imaginary, irrational, illegal [3]), our new friend prime
p outlaw is not required.

I propose that the function to initialize and generate DH parameters be
split into two functions.

First, init_dh_param() should be used only for OR-to-OR DH setup and
communication. Second, it is proposed that we create a new function
init_tls_dh_param() that will have a two-stage development process.

The first stage init_tls_dh_param() will use the same prime that
Apache2.x [4] sends (or “dh1024_apache_p”), and this change should be
made immediately. This is a known good and safe prime number ((p-1)/2
is also prime) that is currently not known to be blocked.

The second stage init_tls_dh_param() should randomly generate a new prime on a
regular basis; this is designed to make the prime difficult to outlaw or
filter.  Call this a shape-shifting or "Rakshasa" prime.  This should be added
to the 0.2.3.x branch of Tor. This prime can be generated at setup or execution
time and probably does not need to be stored on disk. Rakshasa primes only
need to be generated by Tor relays as Tor clients will never send them. Such
a prime should absolutely not be shared between different Tor relays nor
should it ever be static after the 0.2.3.x release.

As a security precaution, care must be taken to ensure that we do not generate
weak primes or known filtered primes. Both weak and filtered primes will
undermine the TLS connection security properties. OpenSSH solves this issue
dynamically with the group exchange of RFC 4419 [5], which may provide a
solution that works reasonably well for Tor. More research is needed in this
area, including on the applicability of the Miller-Rabin or AKS primality
tests [6]; such a test will probably need to be added to Tor.
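A minimal Python sketch of generating and checking a "Rakshasa" safe
prime with the Miller-Rabin test. The function names are hypothetical,
not Tor's; a relay would use a vetted library such as OpenSSL and at
least 1024 bits, whereas the usage check below uses 48 bits because
pure-Python generation at 1024 bits can take minutes.

```python
import secrets

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin with random bases; a composite passes with prob. <= 4**-rounds."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:            # cheap trial division first
        if n % p == 0:
            return n == p
    d, r = n - 1, 0                   # write n - 1 as d * 2**r with d odd
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2          # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                          # a witnesses that n is composite
    return True

def is_safe_prime(p: int) -> bool:
    """p is a safe prime if p and (p - 1) / 2 are both prime."""
    return p % 2 == 1 and is_probable_prime(p) and is_probable_prime((p - 1) // 2)

def generate_safe_prime(bits: int) -> int:
    """Fresh random safe prime of exactly `bits` bits (never stored or shared)."""
    while True:
        q = secrets.randbits(bits - 1) | (1 << (bits - 2)) | 1  # odd, bits-1 bits
        if is_probable_prime(q) and is_probable_prime(2 * q + 1):
            return 2 * q + 1
```

Checking that (p-1)/2 is prime rules out groups with small subgroups,
one kind of weak prime; it does nothing about "filtered" primes, which
the random choice is meant to address.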

      [Randomized DH groups are implemented in 0.2.3.9-alpha. -NM]

Practical key size

Currently we use a 1024-bit RSA modulus. I propose that we increase the
RSA key size to 2048 bits as an additional channel to signal support for the V3
handshake setup.  2048 bits appears to be the most common key size [0] above 1024.
Additionally, the increase in modulus size provides a reasonable security boost
with regard to key security properties.

The implementer should increase the 1024 bit RSA modulus to 2048 bits.

     [Deferred and needs performance analysis.  See proposal
     XXX. Additionally, DH group strength seems far more crucial. Still, this
     is out-of-scope for a "normalization" question. -NM]

Possible future filtering nightmares

At some point it may become cost-effective or politically feasible for a
network filter to simply block all signed or self-signed certificates without
a known valid CA trust chain. This would break many applications on the
Internet, and hopefully our option for custom certificates will ensure that
censors simply avoid this step.

The Rakshasa prime approach may cause censors to specifically allow only
certain known and accepted DH parameters.


Appendix: Other issues

What other obvious TLS certificate issues exist? What other static values are
present in the Tor TLS setup process?

[0] http://archives.seul.org/or/dev/Jan-2011/msg00051.html
[1] http://archives.seul.org/or/dev/Feb-2011/msg00016.html
[2] http://archives.seul.org/or/dev/Feb-2011/msg00039.html
[3] To be fair, this is hardly a new class of numbers. History is rife with
    similar examples of inane authoritarian attempts at mathematical secrecy.
    Probably the most dramatic example is the story of Hippasus of
    Metapontum, a pupil of the famous Pythagoras, who, legend goes, proved
    that the square root of 2 cannot be expressed as a fraction of whole
    numbers (it is now called an irrational number) and was assassinated
    for revealing this secret.  Further reading on the subject may be found
    on Wikipedia:
    http://en.wikipedia.org/wiki/Hippasus

[4] httpd-2.2.17/modules/ss/ssl_engine_dh.c
[5] http://tools.ietf.org/html/rfc4419
[6] http://archives.seul.org/or/dev/Jan-2011/msg00037.html

Filename: 180-pluggable-transport.txt
Title: Pluggable transports for circumvention
Author: Jacob Appelbaum, Nick Mathewson
Created: 15-Oct-2010
Status: Closed
Implemented-In: 0.2.3.x

Overview

  This proposal describes a way to decouple protocol-level obfuscation
  from the core Tor protocol in order to better resist client-bridge
  censorship.  Our approach is to specify a means to add pluggable
  transport implementations to Tor clients and bridges so that they can
  negotiate a superencipherment for the Tor protocol.

Scope

  This is a document about transport plugins; it does not cover
  discovery improvements or bridgedb improvements.  While these
  requirements might be solved by a program that also functions as a
  transport plugin, this proposal only covers the requirements and
  operation of transport plugins.

Motivation

  Frequently, people want to try a novel circumvention method to help
  users connect to Tor bridges.  Some of these methods are already
  pretty easy to deploy: if the user knows an unblocked VPN or open
  SOCKS proxy, they can just use that with the Tor client today.

  Less easy to deploy are methods that require participation by both the
  client and the bridge.  In order of increasing sophistication, we
  might want to support:

  1. A protocol obfuscation tool that transforms the output of a TLS
     connection into something that looks like HTTP as it leaves the
     client, and back to TLS as it arrives at the bridge.
  2. An additional authentication step that a client would need to
     perform for a given bridge before being allowed to connect.
  3. An information passing system that uses a side-channel in some
     existing protocol to convey traffic between a client and a bridge
     without the two of them ever communicating directly.
  4. A set of clients to tunnel client->bridge traffic over an existing
     large p2p network, such that the bridge is known by an identifier
     in that network rather than by an IP address.

  In theory, we could already support these reasonably well with Tor as
  it stands today: every Tor client can take a SOCKS proxy to use for
  its outgoing traffic, so a suitable client proxy could handle the
  client's traffic and connections on its behalf, while a corresponding
  program on the bridge side could handle the bridge's side of the
  protocol transformation.  Nevertheless, there are some reasons to add
  support for transport plugins to Tor itself:

  1. It would be good for bridges to have a standard way to advertise
     which transports they support, so that clients can have multiple
     local transport proxies, and automatically use the right one for
     the right bridge.

  2. There are some changes to our architecture that we'll need for a
     system like this to work.  For testing purposes, if a bridge blocks
     off its regular ORPort and instead has an obfuscated ORPort, the
     bridge authority has no way to test it.  Also, unless the bridge
     has some way to tell that the bridge-side proxy at 127.0.0.1 is not
     the origin of all the connections it is relaying, it might decide
     that there are too many connections from 127.0.0.1, and start
     paring them down to avoid a DoS.

  3. Censorship and anticensorship techniques often evolve faster than
     the typical Tor release cycle.  As such, it's a good idea to
     provide ways to test out new anticensorship mechanisms on a more
     rapid basis.

  4. Transport obfuscation is a relatively distinct problem
     from the other privacy problems that Tor tries to solve, and it
     requires a fairly distinct skill-set from hacking the rest of Tor.
     By decoupling transport obfuscation from the Tor core, we hope to
     encourage people working on transport obfuscation who would
     otherwise not be interested in hacking Tor.

  5. Finally, we hope that defining a generic transport obfuscation plugin
     mechanism will be useful to other anticensorship projects.

Non-Goals

  We're not going to talk about automatic verification of plugin
  correctness and safety via sandboxing, proof-carrying code, or
  whatever.

  We need to do more with discovery and distribution, but that's not
  what this proposal is about.  We're pretty convinced that the problems
  are sufficiently orthogonal that we should be fine so long as we don't
  preclude a single program from implementing both transport and
  discovery extensions.

  This proposal is not about what transport plugins are the best ones
  for people to write.  We do, however, make some general
  recommendations for plugin authors in an appendix.

  We've considered issues involved with completely replacing Tor's TLS
  with another encryption layer, rather than layering it inside the
  obfuscation layer.  We describe how to do this in an appendix to the
  current proposal, though we are not currently sure whether it's a good
  idea to implement.

  We deliberately reject any design that would involve linking the
  transport plugins into Tor's process space.

Design overview

  To write a new transport protocol, an implementer must provide two
  pieces: a "Client Proxy" to run at the initiator side, and a "Server
  Proxy" to run at the server side.  These two pieces may or may not be
  implemented by the same program.

  Each client may run any number of Client Proxies.  Each one acts like
  a SOCKS proxy that accepts connections on localhost.  Each one
  runs on a different port, and implements one or more transport
  methods.  If the protocol has any parameters, they are passed from Tor
  inside the regular username/password parts of the SOCKS protocol.

  Bridges (and maybe relays) may run any number of Server Proxies: these
  programs provide an interface like stunnel: they get connections from the
  network (typically by listening for connections on the network) and relay
  them to the Bridge's real ORPort.

  To configure one of these programs, it should be sufficient simply to
  list it in your torrc.  The program tells Tor which transports it
  provides.  The Tor consensus should carry a new approved version number that
  is specific to pluggable transports; this will allow Tor to know when a
  particular transport is known to be unsafe, safe, or non-functional.

  Bridges (and maybe relays) report in their descriptors which transport
  protocols they support.  This information can be copied into bridge
  lines.  Bridges using a transport protocol may have multiple bridge
  lines.

  Any methods that are wildly successful, we can bake into Tor.

Specifications: Client behavior

  We extend the bridge line format to allow you to say which method
  to use to connect to a bridge.

  The new format is:
     Bridge method address:port [[keyid=]id-fingerprint] [k=v] [k=v] [k=v]

  To connect to such a bridge, the Tor program needs to know which
  SOCKS proxy will support the transport called "method".  It
  then connects to this proxy, and asks it to connect to
  address:port.  If [id-fingerprint] is provided, Tor should expect
  the public identity key on the TLS connection to match the digest
  provided in [id-fingerprint].  If any [k=v] items are provided,
  they are configuration parameters for the proxy: Tor should
  separate them with semicolons and put them in the user and
  password fields of the request, splitting them across the fields
  as necessary.  If a key or value must contain a semicolon or
  a backslash, it is escaped with a backslash.

  Method names must be C identifiers.

  For reference, the old bridge format was
    Bridge address[:port] [id-fingerprint]
  where port defaults to 443 and the id-fingerprint is optional. The
  new format can be distinguished from the old one by checking if the
  first argument has any non-C-identifier characters. (Looking for a
  period should be a simple way.) Also, while the id-fingerprint could
  optionally include whitespace in the old format, whitespace in the
  id-fingerprint is not permitted in the new format.
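  A Python sketch of parsing both bridge-line formats. The class and
  function names are hypothetical. A bare token is treated as a
  fingerprint when it is exactly 40 hexadecimal characters, which is
  an assumption; old-style lines may split the fingerprint with spaces,
  so the remaining tokens are joined.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
HEX_FINGERPRINT = re.compile(r"[0-9A-Fa-f]{40}\Z")

@dataclass
class BridgeLine:
    method: Optional[str]            # None for an old-style line
    address: str
    port: int
    fingerprint: Optional[str] = None
    params: List[Tuple[str, str]] = field(default_factory=list)

def parse_bridge_line(line: str) -> BridgeLine:
    args = line.split()[1:]          # drop the "Bridge" keyword
    if not args:
        raise ValueError("empty Bridge line")
    if C_IDENTIFIER.match(args[0]):
        # New format: Bridge method address:port [[keyid=]id-fingerprint] [k=v]...
        if len(args) < 2:
            raise ValueError("missing address:port")
        method = args[0]
        host, _, port = args[1].rpartition(":")
        rest = args[2:]
        fingerprint = None
        if rest and (rest[0].startswith("keyid=") or HEX_FINGERPRINT.match(rest[0])):
            token = rest.pop(0)
            fingerprint = token[len("keyid="):] if token.startswith("keyid=") else token
        params = []
        for item in rest:
            key, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"expected k=v, got {item!r}")
            params.append((key, value))
        return BridgeLine(method, host, int(port), fingerprint, params)
    # Old format: Bridge address[:port] [id-fingerprint]; port defaults to 443.
    host, sep, port = args[0].partition(":")
    return BridgeLine(None, host, int(port) if sep else 443, "".join(args[1:]) or None)
```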

  Example: if the bridge line is "bridge trebuchet www.example.com:3333
     keyid=09F911029D74E35BD84156C5635688C009F909F9 rocks=20 height=5.6m"
     AND if the Tor client knows that the 'trebuchet' method is supported,
     the client should connect to the proxy that provides the 'trebuchet'
     method, ask it to connect to www.example.com, and provide the string
     "rocks=20;height=5.6m" as the username, the password, or split
     across the username and password.
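  A Python sketch of building the SOCKS argument string with the
  escaping rule described above. The function names are hypothetical.
  The 255-byte limit per field comes from the one-octet length fields
  of SOCKS5 username/password authentication (RFC 1929); filling the
  username first and spilling into the password is an assumed policy,
  and a real implementation would also have to decide how to handle an
  empty password field.

```python
from typing import List, Tuple

def escape_arg(text: str) -> str:
    # Escape backslashes first so the backslashes added for ';' are not doubled.
    return text.replace("\\", "\\\\").replace(";", "\\;")

def build_socks_args(params: List[Tuple[str, str]]) -> str:
    return ";".join(f"{escape_arg(k)}={escape_arg(v)}" for k, v in params)

def split_for_socks5(args: str, limit: int = 255) -> Tuple[bytes, bytes]:
    """Put up to `limit` bytes in the username and the rest in the password."""
    raw = args.encode("utf-8")
    if len(raw) > 2 * limit:
        raise ValueError("arguments do not fit in SOCKS5 username+password")
    return raw[:limit], raw[limit:]
```

  For the trebuchet example, build_socks_args yields
  "rocks=20;height=5.6m", which fits entirely in the username field.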

  There are two ways to tell Tor clients about protocol proxies:
  external proxies and managed proxies.  An external proxy is configured
  with
     ClientTransportPlugin <method> socks4 <address:port> [auth=X]
  or
     ClientTransportPlugin <method> socks5 <address:port> [username=X] [password=Y]
  as in
     "ClientTransportPlugin trebuchet socks5 127.0.0.1:9999".
  This example tells Tor that another program is already running to handle
  'trebuchet' connections, and Tor doesn't need to worry about it.

  A managed proxy is configured with
     ClientTransportPlugin <methods> exec <path> [options]
  as in
    "ClientTransportPlugin trebuchet exec /usr/libexec/trebuchet --managed".
  This example tells Tor to launch an external program to provide a
  SOCKS proxy for 'trebuchet' connections. The Tor client only
  launches one instance of each external program with a given set of
  options, even if the same executable and options are listed for
  more than one method.

  In managed proxies, <methods> can be a comma-separated list of
  pluggable transport method names, as in:
    "ClientTransportPlugin pawn,bishop,rook exec /bin/ptproxy --managed".

  If instead of a transport method, the torrc lists "*" for a managed
  proxy, Tor uses that proxy for all transport methods that the plugin
  supports. So "ClientTransportPlugin * exec /usr/libexec/tor/foobar"
  tells Tor to use the foobar plugin for every method that
  the proxy supports. See the "Managed proxy interface" section below
  for details on how Tor learns which methods a plugin supports.

  If two plugins support the same method, Tor should use whichever
  one is listed first.
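  A sketch in Python of how Tor might pick a plugin for each method. It
  assumes each managed proxy has already reported its methods (see the
  "Managed proxy interface" section). Plugins are identified by their
  torrc path for simplicity, and the function name is hypothetical.

```python
def resolve_methods(entries, supported):
    """Map each transport method to the plugin that should provide it.

    entries:   ClientTransportPlugin lines in torrc order, as
               (methods, plugin) where methods is a list of names or "*".
    supported: plugin -> set of methods the plugin reported it supports.
    """
    chosen = {}
    for methods, plugin in entries:
        offered = supported.get(plugin, set())
        if methods == "*":
            wanted = offered  # every method this plugin supports
        else:
            # Only methods both listed in torrc and reported by the plugin.
            wanted = [m for m in methods if m in offered]
        for method in wanted:
            chosen.setdefault(method, plugin)  # the first-listed plugin wins
    return chosen
```

  In this sketch a method listed in torrc but not reported by the plugin
  ("rook" below) is never used, matching the rule that both conditions
  must hold.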

  The same program can implement a managed or an external proxy: it just
  needs to take an argument saying which one to be.

Server behavior

  Server proxies are configured similarly to client proxies.  When
  launching a proxy, the server must tell it what ORPort it has
  configured, and what address (if any) it can listen on.  The
  server must tell the proxy which (if any) methods it should
  provide if it can; the proxy needs to tell the server which
  methods it is actually providing, and on what ports.

  When a client connects to the proxy, the proxy may need a way to
  tell the server some identifier for the client address.  It does
  this in-band.

  As before, the server lists proxies in its torrc.  These can be
  external proxies that run on their own, or managed proxies that Tor
  launches.

  An external server proxy is configured as
     ServerTransportPlugin <method> proxy <address:port> <param=val> ...
  as in
     "ServerTransportPlugin trebuchet proxy 127.0.0.1:999 rocks=heavy".
  The param=val pairs and the address are used to make the bridge
  configuration information that we'll tell users.

  A managed proxy is configured as
     ServerTransportPlugin <methods> exec </path/to/binary> [options]
  or
     ServerTransportPlugin * exec </path/to/binary> [options]

  When possible, Tor should launch only one binary of each binary/option
  pair configured.  So if the torrc contains

     ClientTransportPlugin foo exec /usr/bin/megaproxy --foo
     ClientTransportPlugin bar exec /usr/bin/megaproxy --bar
     ServerTransportPlugin * exec /usr/bin/megaproxy --foo

  then Tor will launch the megaproxy binary twice: once with the option
  --foo and once with the option --bar.
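  The launch rule above amounts to grouping exec lines by their
  (binary, options) pair. The Python sketch below assumes simple
  whitespace splitting of torrc lines with no quoting, and uses a
  hypothetical function name.

```python
def plan_launches(torrc_lines):
    """Group managed-proxy lines so each (binary, options) pair runs once.

    Returns a dict mapping (path, options) to the (directive, methods)
    entries it serves, in torrc order.
    """
    launches = {}
    for line in torrc_lines:
        parts = line.split()
        if len(parts) < 4 or parts[2] != "exec":
            continue  # external proxies are not launched by Tor
        directive, methods, path = parts[0], parts[1], parts[3]
        options = tuple(parts[4:])
        launches.setdefault((path, options), []).append((directive, methods))
    return launches
```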

Managed proxy interface

   When the Tor client or relay launches a managed proxy, it communicates
   via environment variables.  At a minimum, it sets (in addition to the
   normal environment variables inherited from Tor):

      {Client and server}

      "TOR_PT_STATE_LOCATION" -- A filesystem directory path where the
       proxy should store state if it wants to.  This directory is not
       required to exist, but the proxy SHOULD be able to create it if
       it doesn't.  The proxy MUST NOT store state elsewhere.
      Example: TOR_PT_STATE_LOCATION=/var/lib/tor/pt_state/

      "TOR_PT_MANAGED_TRANSPORT_VER" -- To tell the proxy which
       versions of this configuration protocol Tor supports.  Future
       versions will give a comma-separated list.  Clients MUST accept
       comma-separated lists containing any version that they
       recognize, and MUST work correctly even if some of the versions
       they don't recognize are non-numeric.  Valid version characters
       are non-space, non-comma printing ASCII characters.
      Example: TOR_PT_MANAGED_TRANSPORT_VER=1,1a,2,4B

      {Client only}

      "TOR_PT_CLIENT_TRANSPORTS" -- A comma-separated list of which
       methods this client should enable, or * if all methods should
       be enabled.  The proxy SHOULD ignore methods that it doesn't
       recognize.
      Example: TOR_PT_CLIENT_TRANSPORTS=trebuchet,battering_ram,ballista

      {Server only}

      "TOR_PT_EXTENDED_SERVER_PORT" -- An <address>:<port> where tor
       should be listening for connections speaking the extended
       ORPort protocol (See the "The extended ORPort protocol" section
       below). If tor does not support the extended ORPort protocol,
       it MUST use the empty string as the value of this environment
       variable.
      Example: TOR_PT_EXTENDED_SERVER_PORT=127.0.0.1:4200

      "TOR_PT_ORPORT" -- Our regular ORPort in a form suitable
       for local connections, i.e. connections from the proxy to
       the ORPort.
      Example: TOR_PT_ORPORT=127.0.0.1:9001

      "TOR_PT_SERVER_BINDADDR" -- A comma-separated list of
       <key>-<value> pairs, where <key> is a transport name and
       <value> is the address:port on which it should listen for client
       proxy connections.
       The keys holding transport names must appear in the same order
       as they appear in TOR_PT_SERVER_TRANSPORTS.
       This might be the advertised address, or might be a local
       address that Tor will forward ports to.  It MUST be an address
       that will work with bind().
      Example:
        TOR_PT_SERVER_BINDADDR=trebuchet-127.0.0.1:1984,ballista-127.0.0.1:4891

      "TOR_PT_SERVER_TRANSPORTS" -- A comma-separated list of server
       methods that the proxy should support, or * if all methods
       should be enabled.  The proxy SHOULD ignore methods that it
       doesn't recognize.
      Example: TOR_PT_SERVER_TRANSPORTS=trebuchet,ballista
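  For a server, the variables above could be assembled as in the Python
  sketch below. The function name and argument layout are hypothetical.
  Keeping the transports and their bind addresses in one ordered list
  guarantees that TOR_PT_SERVER_BINDADDR lists them in the same order as
  TOR_PT_SERVER_TRANSPORTS.

```python
import os


def managed_proxy_env(state_dir, client_transports=None, server_bindaddrs=None,
                      orport=None, extended_port=""):
    """Build the environment for launching a managed proxy.

    client_transports: list of method names, or ["*"].
    server_bindaddrs:  ordered list of (transport, "address:port") pairs.
    extended_port:     "" when the extended ORPort is not supported.
    """
    env = dict(os.environ)  # inherit Tor's normal environment
    env["TOR_PT_STATE_LOCATION"] = state_dir
    env["TOR_PT_MANAGED_TRANSPORT_VER"] = "1"
    if client_transports is not None:
        env["TOR_PT_CLIENT_TRANSPORTS"] = ",".join(client_transports)
    if server_bindaddrs is not None:
        if orport is None:
            raise ValueError("a server proxy needs TOR_PT_ORPORT")
        env["TOR_PT_SERVER_TRANSPORTS"] = ",".join(n for n, _ in server_bindaddrs)
        env["TOR_PT_SERVER_BINDADDR"] = ",".join(
            "%s-%s" % (n, addr) for n, addr in server_bindaddrs)
        env["TOR_PT_ORPORT"] = orport
        env["TOR_PT_EXTENDED_SERVER_PORT"] = extended_port
    return env
```

  The result could be passed as the env argument of subprocess.Popen
  when starting the proxy, with stdout=subprocess.PIPE so that Tor can
  read the proxy's replies.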

  The transport proxy replies by writing NL-terminated lines to
  stdout.  The line metaformat is

      <Line> ::= <Keyword> <OptArgs> <NL>
      <Keyword> ::= <KeywordChar> | <Keyword> <KeywordChar>
      <KeywordChar> ::= <any US-ASCII alphanumeric, dash, and underscore>
      <OptArgs> ::= <Args>*
      <Args> ::= <SP> <ArgChar> | <Args> <ArgChar>
      <ArgChar> ::= <any US-ASCII character but NUL or NL>
      <SP> ::= <US-ASCII whitespace symbol (32)>
      <NL> ::= <US-ASCII newline (line feed) character (10)>

  Tor MUST ignore lines with keywords that it doesn't recognize.
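  A minimal Python sketch of how Tor might split a proxy's output line
  according to the metaformat above and skip unknown keywords. The set of
  known keywords is taken from this section. The function names are
  hypothetical, and splitting the arguments on single spaces is a
  simplification.

```python
import re

KEYWORD = re.compile(r"[A-Za-z0-9_-]+\Z")  # alphanumerics, dash, underscore
KNOWN_KEYWORDS = {
    "ENV-ERROR", "VERSION", "VERSION-ERROR",
    "CMETHOD", "CMETHOD-ERROR", "CMETHODS",
    "SMETHOD", "SMETHOD-ERROR", "SMETHODS",
}


def parse_line(line):
    """Split one NL-terminated line into (keyword, [args])."""
    if not line.endswith("\n"):
        raise ValueError("line is not NL-terminated")
    body = line[:-1]
    if not body.isascii() or "\x00" in body or "\n" in body:
        raise ValueError("invalid character in line")
    keyword, _, rest = body.partition(" ")
    if not KEYWORD.match(keyword):
        raise ValueError("bad keyword %r" % keyword)
    return keyword, (rest.split(" ") if rest else [])


def handle_line(line):
    keyword, args = parse_line(line)
    if keyword not in KNOWN_KEYWORDS:
        return None  # Tor MUST ignore unrecognized keywords
    return keyword, args
```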

  First, if there's an error parsing the environment variables, the
  proxy should write:
    ENV-ERROR <errormessage>
  and exit.

  If the environment variables were correctly formatted, the proxy
  should write:
    VERSION <configuration protocol version>
  to say that it supports this configuration protocol version (example
  "VERSION 1"). It must pick a version that Tor listed in
  TOR_PT_MANAGED_TRANSPORT_VER; if it supports none of them, it should
  instead say:
     VERSION-ERROR no-version
  and exit.
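  On the proxy side, the version handshake could look like the Python
  sketch below. SUPPORTED_VERSIONS, the function names and the exit
  status are assumptions; the text only says the proxy should exit. A
  proxy would call main() at startup.

```python
import os
import sys

SUPPORTED_VERSIONS = ["1"]  # versions this proxy implements, most preferred first


def choose_version(offered_value):
    """Return the line to print for a TOR_PT_MANAGED_TRANSPORT_VER value."""
    offered = offered_value.split(",")  # entries may be non-numeric, e.g. "1a"
    for version in SUPPORTED_VERSIONS:
        if version in offered:
            return "VERSION " + version
    return "VERSION-ERROR no-version"


def main():
    value = os.environ.get("TOR_PT_MANAGED_TRANSPORT_VER")
    if not value:
        print("ENV-ERROR TOR_PT_MANAGED_TRANSPORT_VER is not set", flush=True)
        sys.exit(1)
    reply = choose_version(value)
    print(reply, flush=True)
    if reply.startswith("VERSION-ERROR"):
        sys.exit(1)
```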

  The proxy should then open its ports.  If running as a client
  proxy, it should not use fixed ports; instead it should autoselect
  ports to avoid conflicts.  A client proxy should by default only
  listen on localhost for connections.

  A server proxy SHOULD try to listen at a consistent port, though it
  SHOULD pick a different one if the port it last used is now allocated.

  A client or server proxy should then tell Tor which methods it has
  made available, and how.  It does this by printing zero or more
  CMETHOD and SMETHOD lines to its stdout.  These lines look like:

   CMETHOD <methodname> socks4/socks5 <address:port> [ARGS=arglist] \
        [OPT-ARGS=arglist]

  as in

   CMETHOD trebuchet socks5 127.0.0.1:19999 ARGS=rocks,height \
              OPT-ARGS=tensile-strength

  The ARGS field lists mandatory parameters that must appear in
  every bridge line for this method. The OPT-ARGS field lists
  optional parameters.  If no ARGS or OPT-ARGS field is provided,
  Tor should not check the parameters in bridge lines for this
  method.
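  A Python sketch of parsing a CMETHOD line and checking a bridge line's
  parameters against it. The function names are hypothetical. Rejecting
  parameters that appear in neither ARGS nor OPT-ARGS is an assumption;
  the text only says that the ARGS parameters are mandatory.

```python
def parse_cmethod(args):
    """Parse the tokens after the CMETHOD keyword."""
    if len(args) < 3:
        raise ValueError("CMETHOD needs a name, a protocol and an address")
    name, protocol, address = args[0], args[1], args[2]
    if protocol not in ("socks4", "socks5"):
        raise ValueError("unknown proxy protocol %r" % protocol)
    method = {"name": name, "protocol": protocol, "address": address,
              "required": set(), "optional": set(), "checked": False}
    for field in args[3:]:
        key, _, value = field.partition("=")
        names = {a for a in value.split(",") if a}
        if key == "ARGS":
            method["required"], method["checked"] = names, True
        elif key == "OPT-ARGS":
            method["optional"], method["checked"] = names, True
    return method


def bridge_params_ok(method, params):
    """Check the K=V parameters of a bridge line for this method."""
    if not method["checked"]:
        return True  # no ARGS/OPT-ARGS: Tor does not check parameters
    missing = method["required"] - set(params)
    unknown = set(params) - method["required"] - method["optional"]
    return not missing and not unknown
```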

  The proxy should print a single "CMETHODS DONE" line after it is
  finished telling Tor about the client methods it provides.  If it
  tries to supply a client method but can't for some reason, it
  should say:
    CMETHOD-ERROR <methodname> <errormessage>

  A proxy should also tell Tor about the server methods it is providing
  by printing zero or more SMETHOD lines.  These lines look like:

    SMETHOD <methodname> <address:port> [options]

  If there's an error setting up a configured server method, the
  proxy should say:
    SMETHOD-ERROR <methodname> <errormessage>
  as in
    SMETHOD-ERROR trebuchet could not setup 'trebuchet' method

  The 'address:port' part of an SMETHOD line is the address to put
  in the bridge line.  The Options part is a list of space-separated
  K:V flags that Tor should know about.  Recognized options are:

      - FORWARD:1

        If this option is set (for example, because address:port is not
        a publicly accessible address), then Tor needs to forward some
        other address:port to address:port via upnp-helper. Tor would
        then advertise that other address:port in the bridge line instead.

      - ARGS:K=V,K=V,K=V

        If this option is set, the K=V arguments are added to Tor's
        extrainfo document.

      - DECLARE:K=V,...

        If this option is set, the K=V options should be added as
        extension entries to the router descriptor, so clients and other
        relays can make use of it. See ideas/xxx-triangleboy-transport.txt
        for an example situation where the plugin would want to declare
        parameters to other Tors.

      - USE-EXTENDED-PORT:1

        If this option is set, the server plugin is planning to connect
        to Tor's extended server port.
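  The Options component of an SMETHOD line could be parsed as in the
  Python sketch below. The result layout and function names are
  hypothetical. Silently skipping unrecognized options is an assumption,
  not something this section specifies.

```python
def parse_kv_list(text):
    """Parse 'K=V,K=V' into a dict."""
    result = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected K=V, got %r" % item)
        result[key] = value
    return result


def parse_smethod(args):
    """Parse the tokens after the SMETHOD keyword."""
    if len(args) < 2:
        raise ValueError("SMETHOD needs a name and an address:port")
    name, address = args[0], args[1]
    opts = {"forward": False, "args": {}, "declare": {},
            "use_extended_port": False}
    for option in args[2:]:
        key, sep, value = option.partition(":")  # options are K:V flags
        if not sep:
            raise ValueError("expected K:V option, got %r" % option)
        if key == "FORWARD":
            opts["forward"] = value == "1"
        elif key == "ARGS":
            opts["args"] = parse_kv_list(value)
        elif key == "DECLARE":
            opts["declare"] = parse_kv_list(value)
        elif key == "USE-EXTENDED-PORT":
            opts["use_extended_port"] = value == "1"
        # Any other option is skipped (an assumption; see above).
    return name, address, opts
```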

  SMETHOD and CMETHOD lines may be interspersed, to allow the proxies to
  report methods as they become available, even when some methods may
  require probing your network, connecting to some kind of peers, etc.,
  before they are set up. After the final SMETHOD line, the proxy says
  "SMETHODS DONE".

  The proxy SHOULD NOT tell Tor about a server or client method
  unless it is actually open and ready to use.

  Tor SHOULD NOT use any method from a client proxy, or advertise any
  method from a server proxy, UNLESS it is listed as a possible method
  for that proxy in torrc, and it is listed by the proxy as a method it
  supports.

  Proxies should respond to a single INT signal by closing their
  listener ports and not accepting any new connections, but keeping
  all connections open, then terminating when connections are all
  closed.  Proxies should respond to a second INT signal by shutting
  down cleanly.
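  The two-stage INT handling can be modelled as a small state machine.
  In this Python sketch the proxy supplies the two callbacks. The class
  and method names are hypothetical, and only install() touches the real
  signal module.

```python
import signal


class TwoStageShutdown:
    """First INT: stop listening, keep connections. Second INT: exit."""

    def __init__(self, close_listeners, shut_down):
        self.sigints = 0
        self.close_listeners = close_listeners  # callback supplied by the proxy
        self.shut_down = shut_down              # callback supplied by the proxy

    def handle_sigint(self, signum=None, frame=None):
        self.sigints += 1
        if self.sigints == 1:
            self.close_listeners()  # refuse new connections only
        else:
            self.shut_down()        # second INT: shut down cleanly

    def connection_closed(self, remaining):
        # After the first INT, terminate once every connection has closed.
        if self.sigints >= 1 and remaining == 0:
            self.shut_down()

    def install(self):
        signal.signal(signal.SIGINT, self.handle_sigint)
```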

  The managed proxy configuration protocol version defined in this
  section is "1".
  So, for example, if tor supports this configuration protocol it
  should set the environment variable:
    TOR_PT_MANAGED_TRANSPORT_VER=1

The Extended ORPort protocol

  The Extended ORPort protocol is described in proposal 196.

Advertising bridge methods

  Bridges advertise each server method with a 'transport' line in their
  extra-info documents:

     transport SP <transportname> SP <address:port> [SP arglist] NL

  The address:port are as returned from an SMETHOD line (unless they are
  replaced by the FORWARD: directive).  The arglist is a K=V,... list as
  returned in the ARGS: part of the SMETHOD line's Options component.

  If the SMETHOD line includes a DECLARE: part, the router descriptor gets
  a new line:

     transport-info SP <transportname> [SP arglist] NL
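  A Python sketch of how a bridge might turn the data from an SMETHOD
  line into these lines. The function name and arguments are
  hypothetical. The forwarded argument stands for the address obtained
  when FORWARD:1 is in effect, and the returned lines omit the trailing
  NL.

```python
def advertise(name, address, args, declare, forwarded=None):
    """Return (extra-info 'transport' line, descriptor line or None)."""
    public = forwarded if forwarded else address  # FORWARD: replaces address
    transport = "transport %s %s" % (name, public)
    if args:  # from ARGS:K=V,...
        transport += " " + ",".join("%s=%s" % kv for kv in args.items())
    info = None
    if declare:  # from DECLARE:K=V,...
        info = "transport-info %s %s" % (
            name, ",".join("%s=%s" % kv for kv in declare.items()))
    return transport, info
```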

Bridge authority behavior

  We need to specify a way to test different transport methods that
  bridges claim to support.  We should test as many as possible.  We
  should NOT require that we have a way to test every possible
  transport method before we allow its use: the point of this design
  is to remove bottlenecks in transport deployment.

Bridgedb behavior

  Bridgedb can, given a set of router descriptors and their
  corresponding extrainfo documents, generate a set of bridge lines
  for each bridge.  Bridgedb may want to avoid handing out
  methods that seem to get bridges blocked quickly.

Implementation plan

  First, we should implement per-bridge proxies via the "external
  proxy" method described in "Specifications: Client behavior".  Also,
  we'll want to build the extended-server-port mechanism.  This will
  let bridges run transport proxies such that they can generate bridge
  lines to give to clients for testing, so long as the user configures
  and launches their proxies on their own.

  Once that's done, we can see if we need any managed proxies, or if
  the whole idea there is silly.

  If we do, the next most important part seems to be getting
  the client-side automation part written.  And once that's done, we
  can evaluate how much of the server side is easy for people to do
  and how much is hard.

  The "obfsproxy" obfuscating proxy is a likely candidate for an
  initial transport (trac entry #2760), as is Steven Murdoch's http
  thing (trac entry #2759) or something similar.

Notes on plugins to write

   We should ship a couple of null plugin implementations in one or two
   popular, portable languages so that people get an idea of how to
   write the stuff.

   1. We should have one that's just a proof of concept that does
      nothing but transfer bytes back and forth.

   2. We should implement DNS or HTTP using other software (as Geoff Goodell
      did years ago with DNS) as an example of wrapping existing code into
      our plugin model.

   3. The obfuscated-ssh superencipherment is pretty trivial and pretty
      useful.  It makes the protocol stringwise unfingerprintable.

   4. If we do a raw-traffic proxy, openssh tunnels would be the logical
      choice.

Appendix: recommendations for transports

  Be free/open-source software.  Also, if you think your code might
  someday do so well at circumvention that it should be implemented
  inside Tor, it should use the same license as Tor.

  Tor already uses OpenSSL, Libevent, and zlib.  Before you go and decide
  to use crypto++ in your transport plugin, ask yourself whether OpenSSL
  wouldn't be a nicer choice.

  Be portable: most Tor users are on Windows, and most Tor developers
  are not, so designing your code for just one of these platforms will
  make it either get a small userbase, or poor auditing.

  Think secure: if your code is in a C-like language, and it's hard to
  read it and become convinced it's safe, then it's probably not safe.

  Think small: we want to minimize the bytes that a Windows user needs
  to download for a transport client.

  Avoid security-through-obscurity if possible.  Specify.

  Resist trivial fingerprinting: There should be no good string or regex
  to search for to distinguish your protocol from protocols permitted by
  censors.

  Imitate a real profile: There are many ways to implement most
  protocols -- and in many cases, most possible variants of a given
  protocol won't actually exist in the wild.

Filename: 181-optimistic-data-client.txt
Title: Optimistic Data for Tor: Client Side
Author: Ian Goldberg
Created: 2-Jun-2011
Status: Closed
Implemented-In: 0.2.3.3-alpha

Overview:

This proposal (as well as its already-implemented sibling concerning the
server side) aims to reduce the latency of HTTP requests in particular
by allowing:
1. SOCKS clients to optimistically send data before they are notified
    that the SOCKS connection has completed successfully
2. OPs to optimistically send DATA cells on streams in the CONNECT_WAIT
    state
3. Exit nodes to accept and queue DATA cells while in the
    EXIT_CONN_STATE_CONNECTING state

This particular proposal deals with #1 and #2.

For more details (in general and for #3), see the sibling proposal 174
(Optimistic Data for Tor: Server Side), which has been implemented in
0.2.3.1-alpha.

Motivation:

This change will save one OP<->Exit round trip (down to one from two).
There are still two SOCKS Client<->OP round trips (negligible time) and
two Exit<->Server round trips.  Depending on the ratio of the
Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will
decrease the latency by 25 to 50 percent.  Experiments validate these
predictions. [Goldberg, PETS 2010 rump session; see
https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]

Design:

Currently, data arriving on the SOCKS connection to the OP on a stream
in AP_CONN_STATE_CONNECT_WAIT is queued, and transmitted when the state
transitions to AP_CONN_STATE_OPEN.  Instead, when data arrives on the
SOCKS connection to the OP on a stream in AP_CONN_STATE_CONNECT_WAIT
(connection_edge_process_inbuf):

- Check to see whether optimistic data is allowed at all (see below).
- Check to see whether the exit node for this stream supports optimistic
  data (according to tor-spec.txt section 6.2, this means that the
  exit node's version number is at least 0.2.3.1-alpha).  If you don't
  know the exit node's version number (because it's not in your
  hashtable of fingerprints, for example), assume it does *not* support
  optimistic data.
- If both are true, transmit the data on the stream.

Also, when a stream transitions *to* AP_CONN_STATE_CONNECT_WAIT
(connection_ap_handshake_send_begin), do the above checks, and
immediately send any already-queued data if they pass.
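The exit-version check could be sketched in Python as follows. Comparing
only the numeric part of the version (and ignoring tags such as "-alpha")
is a simplification of Tor's real version ordering, and the function
names are hypothetical.

```python
import re

MIN_OPTIMISTIC_VERSION = (0, 2, 3, 1)  # 0.2.3.1-alpha


def parse_tor_version(text):
    """Extract (major, minor, micro, patchlevel) from e.g. 'Tor 0.2.3.5-alpha'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def exit_supports_optimistic_data(version_text):
    if version_text is None:
        return False  # unknown version: assume no support
    version = parse_tor_version(version_text)
    return version is not None and version >= MIN_OPTIMISTIC_VERSION


def may_send_optimistic_data(allowed, exit_version_text):
    """Both checks from the list above must pass."""
    return allowed and exit_supports_optimistic_data(exit_version_text)
```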

SOCKS clients (e.g. polipo) will also need to be patched to take
advantage of optimistic data.  The simplest solution would seem to be to
just start sending data immediately after sending the SOCKS CONNECT
command, without waiting for the SOCKS server reply.  When the SOCKS
client starts reading data back from the SOCKS server, it will first
receive the SOCKS server reply, which may indicate success or failure.
If success, it just continues reading the stream as normal.  If failure,
it does whatever it used to do when a SOCKS connection failed.
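Seen from the SOCKS client, the change amounts to writing the CONNECT
request and the application data in one go, and only then reading the
reply. The Python sketch below assumes SOCKS4a (which Tor's SOCKS port
accepts) and a plain HTTP payload; the function names are hypothetical.

```python
import socket
import struct


def socks4a_connect_request(host, port, userid=b""):
    # VN=4, CD=1 (CONNECT), DSTPORT, then DSTIP 0.0.0.1, which tells the
    # server that a hostname follows the NUL-terminated USERID (SOCKS4a).
    return (struct.pack("!BBH", 4, 1, port) + bytes([0, 0, 0, 1])
            + userid + b"\x00" + host.encode("ascii") + b"\x00")


def optimistic_request(host, port, payload):
    """CONNECT request immediately followed by the application data."""
    return socks4a_connect_request(host, port) + payload


def socks4_granted(reply):
    if len(reply) < 8:
        raise ValueError("short SOCKS4 reply")
    return reply[1] == 0x5A  # 0x5A: request granted


def fetch(socks_addr, host, port, payload):
    """Send everything optimistically, then read the SOCKS reply and data."""
    with socket.create_connection(socks_addr) as sock:
        sock.sendall(optimistic_request(host, port, payload))
        reply = b""
        while len(reply) < 8:  # the 8-byte SOCKS reply arrives first
            chunk = sock.recv(8 - len(reply))
            if not chunk:
                raise ConnectionError("connection closed during SOCKS reply")
            reply += chunk
        if not socks4_granted(reply):
            raise ConnectionError("SOCKS request failed")  # old failure path
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
```

For example, fetch(("127.0.0.1", 9050), "www.example.com", 80,
b"GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n") would send the request
through a local Tor SOCKS port without waiting for the CONNECT reply.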

Security implications:

ORs (certainly the Exit, and possibly others by watching the
pattern of packets), as well as possibly end servers, will be able to
tell that a particular client is using optimistic data.  This of course
has the potential to fingerprint clients, dividing the anonymity set.
The usual kind of solution is suggested:

- There is a boolean consensus parameter UseOptimisticData.
- There is a 3-state (-1, 0, 1) configuration parameter
  UseOptimisticData (or give it a distinct name if you like)
  defaulting to -1.
- If the configuration parameter is -1, the OP obeys the consensus
  value; otherwise, it obeys the configuration parameter.
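The decision rule above in a few lines of Python (the function name is
hypothetical):

```python
def use_optimistic_data(config_value, consensus_value):
    """config_value: -1 (follow consensus), 0 or 1; consensus_value: bool."""
    if config_value not in (-1, 0, 1):
        raise ValueError("UseOptimisticData must be -1, 0 or 1")
    if config_value == -1:
        return bool(consensus_value)  # default: obey the consensus
    return config_value == 1          # explicit local override
```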

It may be wise to set the consensus parameter to 1 at the same time as
similar other client protocol changes are made (for example, a new
circuit construction protocol) in order to not further subdivide the
anonymity set.

Specification:

The current tor-spec has already been updated by proposal 174 to handle
optimistic data.  It says, in part:

    If the exit node does not support optimistic data (i.e. its version
    number is before 0.2.3.1-alpha), then the OP MUST wait for a
    RELAY_CONNECTED cell before sending any data.  If the exit node
    supports optimistic data (i.e. its version number is 0.2.3.1-alpha
    or later), then the OP MAY send RELAY_DATA cells immediately after
    sending the RELAY_BEGIN cell (and before receiving either a
    RELAY_CONNECTED or RELAY_END cell).

Should the "MAY" be more specific, referring to the consensus
parameters?  Or does the existence of the configuration parameter
override mean it's really "MAY", regardless?

Compatibility:

There are compatibility issues, as mentioned above.  OPs MUST NOT send
optimistic data to Exit nodes whose version numbers predate
0.2.3.1-alpha.  OPs MAY send optimistic data to Exit nodes whose version
numbers match or follow that value.

Implementation:

My git diff is 42 lines long (+17 lines, -1 line), changing only the two
functions mentioned above (connection_edge_process_inbuf and
connection_ap_handshake_send_begin).  This diff does not, however,
handle the configuration options, or check the version number of the
exit node.

I have patched a command-line SOCKS client (webfetch) to use optimistic
data.  I have not attempted to patch polipo, but I have looked at it a
bit, and it seems pretty straightforward.  (Of course, if and when
polipo is deprecated, whatever else speaks SOCKS to the OP should take
advantage of optimistic data.)

Performance and scalability notes:

OPs may queue a little more data, if the SOCKS client pushes it faster
than the OP can write it out.  But that's also true today after the
SOCKS CONNECT returns success, right?
Filename: 182-creditbucket.txt
Title: Credit Bucket
Author: Florian Tschorsch and Björn Scheuermann
Created: 22 Jun 2011
Status: Obsolete

Note: Obsolete because we no longer have a once-per-second bucket refill.

Overview:

  The following proposal targets the reduction of queuing times in onion
  routers. In particular, we focus on the token bucket algorithm in Tor and
  point out that current usage unnecessarily locks cells for long time spans.
  We propose a non-intrusive change in Tor's design which overcomes the
  deficiencies.

Motivation and Background:

  Cell statistics from the Tor network [1] reveal that cells reside in
  individual onion routers' cell queues for up to several seconds. These
  queuing times increase the end-to-end delay very significantly and are
  apparently the largest contributor to overall cell latency in Tor.

  In Tor there exist multiple token buckets on different logical levels. They 
  all work independently. They are used to limit the up- and downstream of an
  onion router. All token buckets are refilled every second with a constant
  amount of tokens that depends on the configured bandwidth limits. For
  example, the so-called RelayedTokenBucket limits relay traffic only. All
  read data of incoming connections are bound to a dedicated read token
  bucket. An analogous mechanism exists for written data leaving the onion
  router. We were able to identify the specific usage and implementation of
  the token bucket algorithm as one cause for very high (and unnecessary)
  queuing times in an onion router.

  We observe that the token buckets in Tor are (surprisingly, at first
  glance) allowed to take on negative fill levels. This is justified by the
  TLS connections between onion routers where whole TLS records need to be
  processed. The token bucket on the incoming side (i.e., the one which
  determines at which rate it is allowed to read from incoming TCP
  connections) in particular often runs into non-negligible negative fill
  levels. As a consequence of this behavior, sometimes slightly more data is
  read than would be admissible under a strict interpretation of the token
  bucket concept.

  However, the token bucket for limiting the outgoing rate does not take on
  negative fill levels equally often. Consequently, it regularly happens
  that somewhat more data is read on the incoming side than the outgoing
  token bucket lets through during the same cycle, even if their
  configured data rates are the same. The excess cells are therefore not
  allowed to leave the onion router immediately; they must be queued at
  least until the token bucket on the outgoing side is refilled. The refill
  interval currently is, as mentioned before, one second -- so these cells
  are delayed for a very substantial time. In summary, the two buckets, on
  the incoming and outgoing side, work like a double door system and
  frequently lock cells for a full token bucket refill interval.

General Design:

  In order to overcome the described problem, we propose the following 
  changes related to the token bucket algorithm.

  We observe that the token bucket on the outgoing connections, in its
  current design, is counterproductive with respect to queuing times. We
  therefore propose modifications to the token bucket algorithm that will
  eliminate the "double door effect" discussed above.

  Let us start from Tor's current approach: we have a regular token
  bucket on the reading side with a certain rate and a certain burst size.
  Let x denote the current amount of tokens in the bucket. On the outgoing 
  side we need something appropriate that monitors and constrains the 
  outgoing rate, but at the same time avoids holding back cells (cf. double 
  door effects) whenever possible.

  Here we propose something that adopts the role of a token bucket, but 
  realizes this functionality in a slightly different way. We call it a 
  "credit bucket". Like a token bucket, the credit bucket also has a current 
  fill level, denoted by y. However, the credit bucket is refilled in a 
  different way.

  To understand how it works, let us look at the possible operations:

  As said, x is the fill level of a regular token bucket on the incoming
  side and thus gets incremented periodically according to the configured
  rate. No changes here.

  If x<=0, we are obviously not allowed to read. If x>0, we are allowed to 
  read up to x bytes of incoming data. If k bytes are read (k<=x), then we 
  update x and y as follows:

    x = x - k        (1)
    y = y + k        (2)

  (1) is the standard token bucket operation on the incoming side. Whenever 
  data is admitted in, though, an additional operation is performed: (2) 
  allocates the same number of bytes on the outgoing side, which will later 
  on allow the same number of bytes to leave the onion router without any 
  delays.

  If y + x > -M, we are allowed to write up to y + x + M bytes on the 
  outgoing side, where M is a positive constant. M specifies a burst size for
  the outgoing side. M should be higher than the number of tokens that get
  refilled during a refill interval; we would suggest having M on the order
  of a few seconds' "worth" of data. Now if k bytes are written on the
  outgoing side, we proceed as follows:

    If k <= y then y = y - k

  In this case we use "saved" credits, previously allocated on the incoming 
  side when incoming data has been processed.

    If k > y then x = x - (k-y) and y = 0

  In this case the onion router has generated additional traffic, so that
  more data is to be sent than has been read (the credit is not
  sufficient). We therefore "steal" tokens from the token bucket on the
  incoming side to compensate for the additionally generated data. This
  will result in correspondingly less data being read on the incoming side
  subsequently. As a result of such an operation, the token bucket fill
  level x on the incoming side may become negative (but it can never fall
  below -M).

  If y + x <= -M then outgoing data will be held back. This may lead to 
  double-door effects, but only in extreme cases where the outgoing traffic 
  largely exceeds the incoming traffic, so that the outgoing burst size M is
  exceeded.

  Aside from short-term bursts of configurable size (as with every token 
  bucket), this procedure guarantees that the configured rate may never be 
  exceeded (on the application layer, that is; as with the current 
  implementation, an attacker may easily cause the onion router to 
  arbitrarily exceed the limits on the lower layers). Over time, we never 
  send more data than the configured rate: every sent byte needs a 
  corresponding token on the incoming side; this token must either have been
  consumed by an incoming byte before (it then became a "credit"), or it is 
  "stolen" from the incoming bucket to compensate for data generated within 
  the onion router.
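  The operations above can be collected into a small Python class. This
  is a sketch, not Tor code. The class name, the refill() helper and the
  cap of x at a burst size are assumptions, since the text only says the
  incoming bucket works as a regular token bucket. In the "stealing" case,
  x must be reduced by k - y before y is reset to zero.

```python
class CreditBucket:
    """Incoming token bucket (x) plus outgoing credit bucket (y)."""

    def __init__(self, rate, burst, M):
        self.rate = rate    # tokens added to x per refill
        self.burst = burst  # upper bound for x (regular token bucket)
        self.M = M          # outgoing burst size, a positive constant
        self.x = burst      # incoming fill level
        self.y = 0          # credit allocated for the outgoing side

    def refill(self):
        self.x = min(self.x + self.rate, self.burst)

    def read_limit(self):
        return max(self.x, 0)  # no reading while x <= 0

    def on_read(self, k):
        if not 0 <= k <= self.read_limit():
            raise ValueError("read exceeds limit")
        self.x -= k  # (1) standard token bucket operation
        self.y += k  # (2) allocate the same amount of outgoing credit

    def write_limit(self):
        return max(self.y + self.x + self.M, 0)  # held back if y + x <= -M

    def on_write(self, k):
        if not 0 <= k <= self.write_limit():
            raise ValueError("write exceeds limit")
        if k <= self.y:
            self.y -= k  # pay with saved credit
        else:
            self.x -= k - self.y  # steal the shortfall from the incoming side
            self.y = 0
```

  For instance, with rate = burst = 100 and M = 300, reading 100 bytes
  moves 100 tokens from x into y; a following 150-byte write spends the
  100 credits and steals 50 tokens, leaving x = -50 and y = 0.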

Specific Design Changes: 

  In the following we briefly point out the specific changes that need to
  be made in Tor's source code. This shows how non-intrusive our
  modifications are.
  
  First we need to address the bucket increment and decrement operations. 
  According to the described logic above, this should be done in the methods 
  connection_bucket_refill and connection_buckets_decrement respectively. In
  particular allocating, saving and "stealing" of tokens need to be 
  considered here. 
  
  Second, the rate limiting, i.e. the amount we are allowed to write
  (connection_bucket_write_limit), needs to be adapted in line with the
  credit bucket logic. That is, in order to avoid the unnecessary queuing
  of cells identified here, we need to consider the new burst parameter M.
  Here we also need to take non-rate-limited connections, such as those
  from localhost, into account. The rate limiting on the reading side
  remains the same.

  Finally, we need to find good values/ratios for the parameter M such that
  the trade-off between avoiding "double door effects" and maintaining
  strict rate limits works as expected. As future work, once we have
  insights into the performance gain of the proposal described here, we
  need to find a way to implement this both using bufferevent rate
  limiting with libevent 2.3.x and Tor's rate limiting code.

Conclusion:

  This proposal can be implemented with moderate effort and requires changes 
  only at the points where currently the token bucket operations are 
  performed.

  We feel that this is not the be-all and end-all solution, because it again 
  introduces a feedback loop between the incoming and the outgoing side. We 
  therefore still hope that we will be able to arrive at a design that is
  both simpler and more effective in the future. However, we believe that what we
  proposed here is a good compromise between avoiding double-door effects to 
  the furthest possible extent, strictly enforcing an application-layer data 
  rate, and keeping the extent of changes to the code small.

  Feedback is highly appreciated.

References:

  [1] Karsten Loesing. Analysis of Circuit Queues in Tor. August 25, 2009.
  [2] https://trac.torproject.org/projects/tor/wiki/sponsors/SponsorD/June2011
Filename: 183-refillintervals.txt
Title: Refill Intervals
Author: Florian Tschorsch and Björn Scheuermann
Created: 03-Dec-2010
Status: Closed
Implemented-In: 0.2.3.5-alpha

Overview:

  In order to avoid additional queuing and bursty traffic, the refill 
  interval of the token bucket algorithm should be shortened. Thus we 
  propose a configurable parameter that sets the refill interval 
  accordingly. 

Motivation and Background:

  In Tor there exist multiple token buckets on different logical levels. They 
  all work independently. They are used to limit the up- and downstream of an
  onion router. All token buckets are refilled every second with a constant
  amount of tokens that depends on the configured bandwidth limits. The very
  coarse-grained refill interval of one second has detrimental effects. 

  First, consider an onion router with multiple TLS connections over which 
  cells arrive. If there is high activity (i.e., many incoming cells in
  total), then the coarse refill interval will cause unfairness. Consider a
  circuit C and assume (just for simplicity) that C doesn't share its TLS
  connection with any other circuit. Moreover, assume that C hasn't
  transmitted any data for some time (e.g., due to a typical bursty HTTP
  traffic pattern). Consequently, there are
  no cells from this circuit in the incoming socket buffers. When the buckets
  are refilled, the incoming token bucket will immediately spend all its
  tokens on other incoming connections. Now assume that cells from C arrive
  soon after. For fairness' sake, these cells should be serviced timely --
  circuit C hasn't received any bandwidth for a significant time before.
  However, it will take a very long time (one refill interval) before the
  current implementation will fetch these cells from the incoming TLS
  connection, because the token bucket will remain empty for a long time. Just
  because the cells happened to arrive at the "wrong" point in time, they must
  wait. Such situations may occur even though the configured admissible
  incoming data rate is not exceeded by incoming cells: the long refill
  intervals often lead to an operational state where all the cells that were
  admissible during a given one-second period are queued until the end of this
  second, before the onion router even starts processing them. This
  results in unnecessary, long queuing delays in the incoming socket buffers.
  These delays are not visible in the Tor circuit queue delay statistics [1]. 

  Finally, the coarse-grained refill intervals result in a very bursty outgoing
  traffic pattern at the onion routers (one large chunk of data once per
  second, instead of smooth transmission progress). This is undesirable, since
  such a traffic pattern can interfere with TCP's control mechanisms and can
  be the source of suboptimal TCP performance on the TLS links between onion
  routers.  

Specific Changes: 

  The token buckets should be refilled more often, with a correspondingly 
  smaller amount of tokens. For instance, the buckets might be refilled every
  10 milliseconds with one-hundredth of the amount of data admissible per 
  second. This will help to overcome the problem of unfairness when reading 
  from the incoming socket buffers. At the same time it smooths the traffic
  leaving the onion routers. We are aware that this latter change has 
  apparently been discussed before [2]; we are not sure why this change has
  not been implemented yet.

  In particular, we need to change the current implementation in Tor, which
  always triggers refilling after exactly one second. Instead, the refill
  event should fire more frequently, and the number of tokens added at each
  refill must be scaled down in proportion to the shorter interval.
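  As a minimal sketch of the scaling described above (class and parameter
  names are hypothetical; this is not Tor's token-bucket code), a bucket
  can derive its per-refill increment from the per-second rate and a
  configurable refill interval.  The fractional remainder is carried from
  one refill to the next, so that 100 refills at 10 ms add exactly one
  second's worth of tokens even when the rate is not a multiple of 100.

```python
class TokenBucket:
    """Toy token bucket refilled every `refill_interval_ms` milliseconds.

    Hypothetical names; illustrates scaling the refill amount only.
    """

    def __init__(self, rate_per_sec, burst, refill_interval_ms=1000):
        # Hypothetical sanity bound on the configurable interval.
        if not 1 <= refill_interval_ms <= 1000:
            raise ValueError("refill interval must be between 1 and 1000 ms")
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.refill_interval_ms = refill_interval_ms
        self.tokens = burst
        self._remainder = 0  # sub-token leftovers carried between refills

    def refill(self):
        """Run once per refill interval; add a proportional share of tokens."""
        total = self.rate_per_sec * self.refill_interval_ms + self._remainder
        increment, self._remainder = divmod(total, 1000)
        self.tokens = min(self.burst, self.tokens + increment)
        return increment

    def consume(self, n):
        """Spend up to n tokens; return how many were actually spent."""
        spent = min(n, self.tokens)
        self.tokens -= spent
        return spent
```

  With refill_interval_ms=10, a relay limited to 1,000,000 bytes per
  second receives 10,000 tokens per refill instead of a single
  1,000,000-token refill once per second.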
  
  With libevent 2.x and bufferevents enabled, smaller refill intervals are
  already used, but they are hard-coded.  This interval should become a
  configurable parameter, too.

Conclusion:

  This proposal can be implemented with moderate effort and requires changes 
  only at the points where the token bucket operations are currently
  performed.
  
  This change will also be a good starting point for further enhancements
  to improve queuing times in Tor; that is, it will pave the way for other
  approaches that tackle this problem.

  Feedback is highly appreciated.

References:

  [1] Karsten Loesing. Analysis of Circuit Queues in Tor. August 25, 2009.
  [2] https://trac.torproject.org/projects/tor/wiki/sponsors/SponsorD/June2011
  
Filename: 184-v3-link-protocol.txt
Title: Miscellaneous changes for a v3 Tor link protocol
Author: Nick Mathewson
Created: 19-Sep-2011
Status: Closed
Target: 0.2.3.x

Overview:

  When proposals 176 and 179 are implemented, Tor will have a new
  link protocol.  I propose two simple improvements for the v3 link
  protocol: a cleaner partition of which cell types indicate
  variable-length cells, and a better way to handle link padding if
  and when we come up with a decent scheme for it.

Motivation:

  We're getting a new link protocol in 0.2.3.x, thanks (again) to
  TLS fingerprinting concerns.  When we do, it'd be nice to take
  care of some small issues that require a link protocol version
  increment.

  First, our system for introducing new variable-length cell types
  has required a protocol increment for each one.  Unlike
  fixed-length (512 byte) cells, we can't add new variable-length
  cells in the existing link protocols and just let older clients
  ignore them, because unless the recipient knows which cells are
  variable-length, it will treat them as 512-byte cells and discard
  too much of the stream or too little.  In the past, it's been
  useful to be able to introduce new cell types without having to
  increment the link protocol version.

  Second, once we have our new TLS handshake in place, we will want
  a good way to address the remaining fingerprinting opportunities.
  Some of those will likely involve traffic volume.  We can't fix
  that easily with our existing PADDING cell type, since PADDING
  cells are fixed-length, and wouldn't be so easy to use to break up
  our TLS record sizes.

Design: Indicating variable-length cells.

  Beginning with the v3 link protocol, we specify that all cell
  types in the range 128..255 indicate variable-length cells.
  Cell types in the range 0..127 are still used for 512-byte
  cells, except that the VERSIONS cell type (7) also indicates a
  variable-length cell (for backward compatibility).
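  To illustrate why the recipient must know which commands are
  variable-length, here is a minimal Python sketch of how a reader might
  size the next cell in its input buffer under these rules.  It assumes
  the v3 layout: 2-byte circuit IDs, 512-byte fixed-length cells, and a
  2-byte length field after the command byte of a variable-length cell.
  The function names are illustrative only.

```python
import struct

VERSIONS = 7          # VERSIONS stays variable-length for compatibility
CIRCID_LEN = 2        # circuit IDs are 2 bytes in link protocol v3
FIXED_CELL_LEN = 512  # total size of a fixed-length cell
VAR_HEADER_LEN = CIRCID_LEN + 1 + 2  # circID, command byte, 2-byte length


def is_var_cell_v3(command: int) -> bool:
    """Return True if `command` denotes a variable-length cell under v3 rules."""
    if not 0 <= command <= 255:
        raise ValueError("cell command must fit in one byte")
    return command == VERSIONS or command >= 128


def next_cell_size(buf: bytes):
    """Return the total size of the first cell in `buf`, or None if the
    header is still incomplete."""
    if len(buf) < CIRCID_LEN + 1:
        return None  # the command byte hasn't arrived yet
    command = buf[CIRCID_LEN]
    if not is_var_cell_v3(command):
        return FIXED_CELL_LEN
    if len(buf) < VAR_HEADER_LEN:
        return None  # the length field hasn't arrived yet
    (length,) = struct.unpack_from("!H", buf, CIRCID_LEN + 1)
    return VAR_HEADER_LEN + length
```

  A recipient that guessed wrong here would consume 512 bytes for a short
  variable-length cell, or stop early inside a long one, and lose framing
  for the rest of the stream.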

  As before, all Tor instances must ignore cells with types that
  they don't recognize.

Design: Variable-length padding.

  We add a new variable-length cell type, "VPADDING", to be used for
  padding.  All Tor instances may send a VPADDING cell at any point that
  a VERSIONS cell is not required; a VPADDING cell's body may be any
  length; the body of a VPADDING cell MAY have any content.  Upon
  receiving a VPADDING cell, the recipient should drop it, as with a
  PADDING cell.

  (This does not give a way to send fewer than 5 bytes of padding.
  We could add this in the future, in a new link protocol.)

  Implementations SHOULD fill the content of all padding cells
  randomly.
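  A minimal sketch of building such a cell, assuming the v3
  variable-length layout (2-byte circuit ID, 1-byte command, 2-byte
  length) and assuming command number 128 for VPADDING, which is the
  value later assigned in tor-spec rather than one given in this
  proposal.  The 5-byte header also explains the note above: the
  smallest possible VPADDING cell, with an empty body, is 5 bytes long.

```python
import os
import struct

VPADDING = 128  # assumption: the command number tor-spec later assigned


def make_vpadding_cell(body_len: int, circ_id: int = 0) -> bytes:
    """Build a v3 VPADDING cell carrying `body_len` random bytes."""
    if not 0 <= body_len <= 0xFFFF:
        raise ValueError("body length must fit in the 2-byte length field")
    # 5-byte header: 2-byte circuit ID, 1-byte command, 2-byte body length.
    header = struct.pack("!HBH", circ_id, VPADDING, body_len)
    # Body content is arbitrary; random bytes follow the SHOULD above.
    return header + os.urandom(body_len)
```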

A note on padding:

  We do not specify any situation in which a node ought to generate
  a VPADDING cell; that's left for future work.  Implementors should
  be aware that many schemes have been proposed for link padding
  that do not in fact work as well as one would expect.  We
  recommend that no mainstream implementation produce padding
  in an attempt to resist traffic analysis without real research
  showing that it helps.

Interaction with proposal 176:

  Proposal 176 says that during the v3 handshake, no cells other
  than VERSIONS, AUTHENTICATE, AUTH_CHALLENGE, CERT, and NETINFO are
  allowed, and those are only allowed in their standard order.  If
  this proposal is accepted, then VPADDING cells should also be
  allowed in the handshake at any point after the VERSIONS cell.
  They should be included when computing the "SLOG" and "CLOG"
  handshake-digest fields of the AUTHENTICATE cell.

Notes on future-proofing:

  It may be that in the future we will need a new cell format that is neither the
  original 512-byte format nor the variable-length format.  If we
  do, we can just increment the link protocol version number again.

  Right now we have 10 cell types; with this proposal and proposal
  176, we will have 14.  It's unlikely that we'll run out any time
  soon, but if we start to approach the number 64 with fixed-length
  cell types or 196 with var-length cell types, we should consider
  tweaking the link protocol to have a variable-length cell type
  encoding.

Filename: 185-dir-without-dirport.txt
Title: Directory caches without DirPort
Author: Nick Mathewson
Created: 20-Sep-2011
Status: Superseded
Superseded-by: 237

Overview:

  Exposing a directory port is no longer necessary for running as a
  directory cache.  This proposal suggests that we eliminate that
  requirement, and describes how.

Motivation:

  Now that we tunnel directory connections by default, it is no
  longer necessary to have a DirPort to be a directory cache.  In
  fact, bridges act as directory caches but do not actually have a
  DirPort exposed.  It would be nice and tidy to expand that
  property to the rest of the network.

Configuration:

  Add a new torrc option, "DirCache".  Its values can be "0", "1",
  and "auto".  If it is 0, we never act as a directory cache, even
  if DirPort is set.  If it is 1, then we act as a directory cache
  according to the same rules as those used for nodes that set a
  DirPort.  If it is "auto", then Tor decides whether to act as a
  directory cache based on some future intelligent algorithm. "Auto"
  should be the new default.

Advertising cache status:

  Nodes that are running as a directory cache should set the entry
  "dir-cache 1" in their router descriptors.  If they do not have a
  DirPort set, or do not have a working DirPort, they should give
  their directory port as 0 in their router lines.  (Nodes that have
  a working directory port advertise it as usual, and also include a
  "dir-cache" line.  Nodes that do not serve directory information
  should set their directory port to 0, and not include any
  dir-cache line.  Implementations should accept and ignore
  dir-cache lines with values other than "dir-cache 1".)
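  The advertising rules above can be summarized in a short Python sketch.
  The function names are hypothetical, and how a node decides whether to
  act as a cache (including the "auto" setting) is left to the caller.

```python
def dir_cache_fields(acts_as_cache, dirport, dirport_works):
    """Return (dirport_to_advertise, extra_descriptor_lines)."""
    if not acts_as_cache:
        # Not serving directory info: advertise DirPort 0, no dir-cache line.
        return 0, []
    # A cache advertises its DirPort only if one is set and working.
    advertised = dirport if dirport and dirport_works else 0
    return advertised, ["dir-cache 1"]


def advertises_dir_cache(line):
    """Only an exact "dir-cache 1" counts; other values are accepted but ignored."""
    return line.split() == ["dir-cache", "1"]
```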

Consensus:

  Authorities should assign a "DirCache" flag to all nodes running
  as a directory cache.

  This does not require a new version of the consensus algorithm.

Filename: 186-multiple-orports.txt
Title: Multiple addresses for one OR or bridge
Author: Nick Mathewson
Created: 19-Sep-2011
Supersedes: 118
Status: Closed
Target: 0.2.4.x+

Status:

  This proposal is partially implemented to the extent needed to allow nodes
  to have one IPv4 and one IPv6 address.

Overview:

  This document is a proposal for servers to advertise multiple
  address/port combinations for their ORPort.

  It supersedes proposal 118.

Motivation:

  Sometimes servers want to support multiple ports for incoming
  connections, either in order to support multiple address families
  (i.e., to add IPv6 support), to better use multiple interfaces, or
  to support a variety of FascistFirewallPorts settings.  This is
  easy to set up now, but there's no way to advertise it to clients.

Configuring additional addresses and ports:

  In consonance with our changes to the (Socks|Trans|NATD|DNS)Port
  options made in 0.2.3.x for proposal 171, I make a corresponding
  change to allow multiple ORPort options and deprecate
  ORListenAddress.

  The new syntax will be:

      "ORPort" PortDescription Option*

      Option = "NoAdvertise" | "NoListen" | "AllAddrs" | "IPV4Only"
          | "IPV6Only"

      PortDescription = PORTLIST |
                        ADDRESS ":" PORTLIST |
                        Hostname ":" PORTLIST

      (PORTLIST and ADDRESS are defined below.)

  The 'NoAdvertise' option performs the function of the old
  ORListenAddress option.  If it is set, we bind a port, but
  don't put it in our descriptor.

  The 'NoListen' option tells Tor to advertise an address, but not
  bind to it.  The operator needs to use some other mechanism to
  ensure that ports are redirected to ports that _are_ listened on.

  The 'AllAddrs' option tells Tor that if no address is given in the
  PortDescription part, we should bind/advertise every one of our
  publicly visible unicast addresses; and that if a hostname address
  is given in the PortDescription, we should bind/advertise every
  publicly visible unicast address that the hostname resolves to.
  (Q: Should this be on by default?)   The 'IPv4Only' and 'IPv6Only'
  options tell Tor to interpret such situations as applying only to
  IPv4 addresses or to IPv6 addresses.

  As with the client *Port options, either the old format or the new
  format may be used, but not both: either a single numeric ORPort and
  zero or more ORListenAddress options, or a set of one or more
  ORPorts in the new extended format.

  In current operating systems (unless we get into crazy nonportable
  tricks) we need to use one socket for every address:port that Tor
  binds on.  As a sanity check, we can limit the number of such sockets
  we use to, say, something between 8 and 64.  If you want to bind lots
  of address:port combinations, you'll want to do it at the
  firewall/routing level.

  Example: We want to bind on 0.0.0.0:9001

     ORPort 9001

  Example: Our firewall is redirecting ports 80, 443, and 7000
  on all hosts in 18.244.2.0 onto our port 2929.

     ORPort 2929 noadvertise
     ORPort 18.244.2.0:80,443,7000 nolisten

  Example: We have a dynamic DNS provider that maps
  tornode.example.com to our current external IPv4 and IPv6
  addresses.  Our firewall forwards port 443 on those addresses to our
  port 1337.

     ORPort 1337 noadvertise alladdrs
     ORPort tornode.example.com:443 nolisten alladdrs
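  A minimal Python parser for the extended syntax, as a sketch only: it
  assumes option keywords are case-insensitive (as the lowercase examples
  suggest), requires IPv6 addresses in square brackets (as in the
  descriptor syntax below), treats any other non-numeric address as a
  hostname, and rejects option keywords not in the list above.  The
  function names are illustrative, not Tor's.

```python
import ipaddress

OPTIONS = {"noadvertise", "nolisten", "alladdrs", "ipv4only", "ipv6only"}


def parse_portlist(text):
    """Parse a comma-separated PORTLIST into a list of ints in 1..65535."""
    ports = [int(item) for item in text.split(",")]
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ports


def parse_orport_line(line):
    """Split an extended ORPort line into (address, ports, options).

    address is None (no address given), an IPv4Address/IPv6Address,
    or a hostname string.
    """
    words = line.split()
    if len(words) < 2 or words[0].lower() != "orport":
        raise ValueError("not an ORPort line")
    desc = words[1]
    options = {word.lower() for word in words[2:]}
    unknown = options - OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    host, sep, portlist = desc.rpartition(":")
    if not sep:
        return None, parse_portlist(desc), options  # bare PORTLIST
    if not host:
        raise ValueError("empty address before ':'")
    if host.startswith("[") and host.endswith("]"):
        ip6 = ipaddress.IPv6Address(host[1:-1])
        return ip6, parse_portlist(portlist), options
    try:
        address = ipaddress.IPv4Address(host)
    except ValueError:
        address = host  # not a dotted quad: treat it as a hostname
    return address, parse_portlist(portlist), options
```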

Self-testing:

  Right now, Tor nodes need to check every port that they advertise
  before they declare themselves reachable.  If a Tor node has
  a lot of advertised ports, that could be prohibitive.
  Instead, it should try a sample of ports for each address.  It should
  not advertise any given ORPort line until it has tried
  extending to or connecting to a sample of the address/port
  combinations.

  It will now be possible for a Tor node to find that some addresses
  work and others do not.  In this case, the node should only advertise
  ORPort lines that have been checked.  (As a consequence, the node
  should not advertise any address unless at least one ORPort without
  nolisten has been specified.)

  {Until support is added for extend cells to IPv6 addresses, it
  will only be possible to test IPv6 addresses by connecting
  directly.  We might want to just skip self-testing those until we
  have IPv6 extend support.}

New descriptor syntax:

  We add a new line in the router descriptor, "or-address".  This line
  can occur zero, one, or multiple times.  Its format is:

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IPV6ADDR | IPV4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT
      PORT = a number between 1 and 65535 inclusive.

  [This is the regular format for specifying sets of addresses and
  ports in Tor.]

  A descriptor should not include an or-address line that does
  nothing but duplicate the address:port pair from its "router"
  line.

  A node must not list more than 8 or-address lines.

  A PORTLIST must have no more than 16 PORTSPEC entries, and its entries must
  be disjoint.

  (Q: Any reason to allow more than 2?  Multiple interfaces, I guess.)
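  A sketch of a validator for these constraints, in Python.  The names
  are hypothetical, and for simplicity it treats the "should not
  duplicate the router line" rule as a hard error, which is stricter
  than the text requires.

```python
import ipaddress

MAX_OR_ADDRESS_LINES = 8
MAX_PORTSPECS = 16


def parse_or_address(line):
    """Parse 'or-address ADDRESS:PORTLIST' into (ip_address, [ports])."""
    keyword, _, value = line.partition(" ")
    if keyword != "or-address":
        raise ValueError("not an or-address line")
    addr, sep, portlist = value.rpartition(":")
    if not sep:
        raise ValueError("missing ':PORTLIST'")
    if addr.startswith("[") and addr.endswith("]"):
        ip = ipaddress.IPv6Address(addr[1:-1])  # IPv6 must be bracketed
    else:
        ip = ipaddress.IPv4Address(addr)  # otherwise a dotted quad
    ports = [int(p) for p in portlist.split(",")]
    if len(ports) > MAX_PORTSPECS:
        raise ValueError("more than 16 PORTSPEC entries")
    if len(set(ports)) != len(ports):
        raise ValueError("PORTSPEC entries must be disjoint")
    if any(not 1 <= p <= 65535 for p in ports):
        raise ValueError("port out of range")
    return ip, ports


def check_or_addresses(router_addr, router_port, lines):
    """Check a descriptor's or-address lines against the limits above."""
    if len(lines) > MAX_OR_ADDRESS_LINES:
        raise ValueError("more than 8 or-address lines")
    entries = [parse_or_address(line) for line in lines]
    router_ip = ipaddress.ip_address(router_addr)
    for ip, ports in entries:
        if ip == router_ip and ports == [router_port]:
            raise ValueError("or-address line only duplicates the router line")
    return entries
```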

New authority behavior:

  The same rationale applies as for self-testing.  An authority
  needs to test the main address:port from the router line, and
  every or-address line.  For or-address lines that contain
  multiple ports, it needs to test all of them if they are few, or a
  sample if they are not.
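  The proposal leaves "few" and "a sample" unspecified.  In the following
  sketch, max_exhaustive and sample_size are hypothetical placeholders
  for those values, not numbers taken from Tor.

```python
import random


def ports_to_test(ports, max_exhaustive=4, sample_size=4, rng=random):
    """Choose which ports of one or-address line to probe.

    max_exhaustive and sample_size are hypothetical tuning knobs.
    """
    ports = list(ports)
    if len(ports) <= max_exhaustive:
        return ports  # few ports: test every one
    # Many ports: test a random sample without repeats.
    return rng.sample(ports, min(sample_size, len(ports)))
```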

  An authority shouldn't list a node as Running unless every
  or-address line it advertises looks like it will work.

Consensus directories and microdescriptors:

  We introduce a new line type for microdescriptors and consensuses,
  "a".  Each "a" line has the same format as an or-address line.
  The "a" lines (if any) appear immediately after the "r" line for a
  router in the consensus, and immediately after the "onion-key"
  entry in a microdescriptor.

  Clients that use microdescriptors should consider a node's
  addresses to be the address:port listed in the "r" line of a
  consensus, plus all "a" lines for that node in the consensus, plus
  all "a" lines for that node in its microdescriptor.  Clients
  that use full descriptors should consider a node's addresses to be
  everything listed in its descriptor.

  We will have to define a new voting algorithm version; when using
  this version or later, votes should include a single "a" line for
  every relay that has an IPv6 address, containing the first IPv6
  or-address line from its descriptor.  (If there are no IPv6 or-address lines, then
  they shouldn't include any "a" lines.)  The remaining or-address
  lines will turn into "a" lines in the microdescriptor.
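  A Python sketch of that split, assuming a descriptor's or-address
  entries are given in order as (address, ports) pairs; the helper names
  are hypothetical.

```python
import ipaddress


def format_a_line(addr, ports):
    """Render an address and port list in or-address/"a" line syntax."""
    ip = ipaddress.ip_address(addr)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return "a " + host + ":" + ",".join(str(p) for p in ports)


def split_or_addresses(or_addresses):
    """Return (vote_a_lines, microdesc_a_lines) for one descriptor."""
    vote_lines, microdesc_lines = [], []
    for addr, ports in or_addresses:
        line = format_a_line(addr, ports)
        if not vote_lines and ipaddress.ip_address(addr).version == 6:
            vote_lines.append(line)  # first IPv6 entry goes in the vote
        else:
            microdesc_lines.append(line)  # everything else: microdescriptor
    return vote_lines, microdesc_lines
```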

  As with other data in the vote derived from the descriptor, the
  consensus will include whichever set of "a" lines are given by the
  most authorities who voted for the descriptor digest that will be
  used for the router.

Directory authorities with more addresses:

  We need a way for a client to configure a TrustedDirServer as
  having multiple OR addresses, specifically so that we can give at
  least one default authority an IPv6 address for bootstrapping
  purposes.

  (Q: Do any of the current authorities have stable IPv6 addresses?)

  We will want to allow the address in a "dir-source" line in a vote
  to contain an IPv6 address, and/or allow voters to list themselves
  with more addresses in votes/consensuses.  But right now, nothing
  actually uses the addresses listed for voters in dir-source lines
  for anything besides log messages.

Client behavior:

  I propose that initially we shouldn't change client behavior too
  much here.

  (Q: Is there any advantage to having a client choose a random
  address?  If so we can do it later.  If not, why list any more
  than one IPv4 and one IPv6 address?)

  Tor clients not running with bridges, and running with IPv4
  support, should still use the address and ORPort as advertised in
  the "router" or "r" line of the appropriate directory object.

  Tor clients not running with bridges, and running without IPv4
  support, should use the first listed IPv6 address for a node,
  using the lowest-numbered listed port for that address.  They
  should only connect to nodes with an IPv6 address.
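  For a client without IPv4 support, the selection rule above might look
  like this (a sketch with a hypothetical function name; entries are
  (address, ports) pairs in descriptor order):

```python
import ipaddress


def pick_ipv6_orport(or_addresses):
    """Return (first listed IPv6 address, its lowest port), or None."""
    for addr, ports in or_addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6:
            return ip, min(ports)
    return None  # no IPv6 address: an IPv6-only client skips this node
```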

  Clients should accept Bridge lines with IPv6 addresses, and
  address:port sets, in addition to the lines they currently accept.

  Clients, for now, should only use the address:port from the router
  line when making EXTEND cells; see below.

Nodes without IPv4 addresses:

  Currently Tor requires every node or bridge to have an IPv4
  address.  We will want to maintain this property for the
  foreseeable future, but we should define how a node without an IPv4
  address would advertise itself.

  Right now, there's no way to do that: if anything but an IPv4
  address appears in a router line of a routerdesc, or the "r" line of
  a consensus, then it won't parse.  If something that looks like an
  IPv4 address appears there, clients will (I believe) try to
  connect to it.

  We can make this work, though: let's allow nodes to list themselves
  with a magic IPv4 address (say, 127.1.1.1) if they have
  or-address entries containing only IPv6 addresses.  We could give
  these nodes a new flag other than Running to indicate that they're
  up, and not give them the Running flag.  That way, old clients
  would never try to use them, but new clients could know to treat
  the new flag as indicating that the node is running, and know not
  to connect to a node listed with address 127.1.1.1.

Interaction with EXTEND and NETINFO:

  Currently, EXTEND cells only support IPv4 addresses, so we should
  use only those.  There is a proposal draft to support more address
  types.

  A server's NETINFO cells must list all configured addresses for a
  server.

Why not extend DirPort this way too?

  Because clients are all using BEGINDIR these days.

  That is, clients tunnel their directory requests inside OR
  connections, and don't generally connect to DirPorts at all.

Why not have address and port ranges?

  Earlier drafts of this proposal suggested that servers should provide
  ranges of addresses, specified with bitmasks.  That's a neat idea for
  circumvention, but if we did that, you wouldn't want to advertise
  publicly that you have an entire address range.

  Port ranges are out because I don't think they would actually get used
  much, and they add a fair bit of complexity.

Coding impact:

  In addition to the obvious changes, we need to audit everything
  that looks up or compares OR connections and nodes by address:port
  under the assumption that each node has only a single address or
  ORPort.

TODO:

  * Make it so that authorities can vote on which addresses are working
    somehow.

  * Specify some way to say "I only want to connect to v4/v6 addresses".

  * Come up with a better alternative to running6 for the long term?

Filename: 187-allow-client-auth.txt
Title: Reserve a cell type to allow client authorization
Author: Nick Mathewson
Created: 16-Oct-2011
Status: Closed
Target: 0.2.3.x

Overview:

  Proposals 176 and 184 introduce a new "v3" handshake, coupled with
  a new version 3 link protocol.  This is a good time to introduce
  other stuff we might need.

  One thing we might want is a scanning resistance feature for
  bridges.  This proposal suggests a change we should make right
  away to enable us to deploy such a feature in future versions of
  Tor.

Motivation:

  If an adversary has a suspected bridge address/port combination,
  the easiest way for them to confirm or disconfirm their suspicion
  is to connect to the address and see whether they can do a Tor
  handshake.  The easiest way to fix this problem seems to be to
  give out bridge addresses along with some secret that clients
  should know, but which an adversary shouldn't be able to learn
  easily.  The client should prove to the bridge that it's
  authorized to know about the bridge, before the bridge acts like a
  bridge.  If the client doesn't show knowledge of the proper
  secret, the bridge should act like an HTTPS server or a BitTorrent
  tracker or something.

  This proposal *does not* specify a way for clients to authorize
  themselves at bridges; rather, it specifies changes that we should
  make now in order to allow this kind of authorization in the
  future.

Design:

  Now that proposal 176 is implemented, if a server provides a
  certificate that indicates a v3 handshake, and the client
  understands how to do a v3 handshake, we specify that the
  client's first cell must be a VERSIONS cell.

  Instead, we make the following specification changes:

  We reserve a new variable-length cell type, "AUTHORIZE".

  We specify that any number of PADDING or VPADDING or AUTHORIZE
  cells may be sent by the client before it sends a VERSIONS cell.
  Servers that do not require client authorization MUST ignore such
  cells, except to include them when calculating the digest that will
  appear in the CLOG part of a client's AUTHENTICATE cell.

  We still specify that clients SHOULD send VERSIONS as their first
  cell; only in some future version of Tor will an AUTHORIZE cell be sent
  first.
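  A sketch of the server-side rule for a server that does not require
  authorization.  The command numbers (PADDING 0, VERSIONS 7, VPADDING
  128, AUTHORIZE 132) are the values later listed in tor-spec and are an
  assumption here, as is modeling the CLOG input as a running SHA-256
  digest of every byte the client sends; the class name is hypothetical.

```python
import hashlib

# Assumed command numbers (from tor-spec, not from this proposal).
PADDING, VERSIONS, VPADDING, AUTHORIZE = 0, 7, 128, 132
ALLOWED_BEFORE_VERSIONS = {PADDING, VPADDING, AUTHORIZE}


class ResponderHandshake:
    """Tracks the cells a server receives at the start of a v3 handshake."""

    def __init__(self):
        self.clog = hashlib.sha256()  # running digest of client-sent bytes
        self.got_versions = False

    def on_cell(self, command: int, raw: bytes) -> bool:
        """Record a raw cell; return True to process it, False to drop it."""
        self.clog.update(raw)  # dropped cells still count toward CLOG
        if self.got_versions or command == VERSIONS:
            self.got_versions = True
            return True
        if command in ALLOWED_BEFORE_VERSIONS:
            return False  # no authorization required here: ignore it
        raise ValueError(f"cell type {command} is not allowed before VERSIONS")
```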

Discussion:

  This change allows future versions of the Tor client to know that
  some bridges need authorization, and to send them authorization
  before sending them anything recognizably Tor-like.

  The authorization cell needs to be received before the server can
  send any Tor cells, so we can't just patch it in after the
  VERSIONS cell exchange: the server's VERSIONS cell is unsendable
  until after the AUTHORIZE has been accepted.

  Note that to avoid scanning attacks, it's not sufficient to wait
  for a single cell, and then either handle it as authorization or
  reject the connection.  Instead, we need to decide what kind of
  server we're impersonating, and respond once the client has
  provided *either* an authorization cell, *or* a recognizably valid
  or invalid command in the impersonated protocol.


Alternative design: Just use pluggable transports

  Pluggable transports can do this too, but in general, we want to
  avoid designing the Tor protocol so that any particular desirable
  feature can only be done with a pluggable transport.  That is, any
  feature that *every* bridge should want, should be doable in Tor
  proper.

  Also, as of 16 Oct 2011, pluggable transports aren't in general
  use.  Past experience IMO suggests that we shouldn't offload
  architectural responsibilities to our chickens until they've
  hatched.

Alternative design: Out-of-TLS authorization

  There are features (like port-knocking) designed to allow a client
  to show that it's authorized to use a bridge before the TLS
  handshake even happens.  These are appropriate for bunches of
  applications, but they're trickier with an adversary who is
  MITMing the client.

Alternative design: Just use padding.

  Arguably, we could only add the "VPADDING" cell type to the list
  of those allowed before VERSIONS cells, and say that any client
  authorization we specify later on will be sent as a VPADDING
  cell.  But that design is kludgy: padding should be padding, not
  semantically significant.  Besides, cell types are still fairly
  plentiful.

Counterargument: specify it later

  We could, later on, say that if a client learns that a bridge
  needs authorization, it should send an AUTHORIZE cell.  So long as
  a client never sends an AUTHORIZE to anything other than a bridge that
  needs authorization, it'll never violate the spec.

  But all things considered, it seems easier (just a few lines of
  spec and code) to let bridges eat unexpected authorization now
  than to have things fail later when clients think that a
  bridge needs authorization but it doesn't.

Counterargument: it's too late!

  We've already got the prop176 branch merged and running on a few
  servers.  But as of this writing, it isn't in any Tor version.

  Even if it *is* out in an alpha before we can get this proposal
  accepted and implemented, that's not a big disaster.  In the worst
  case, where future clients don't know which servers need
  authorization and so must send it to _all_ v3 servers, they will
  break their connections only to a couple of alpha versions, which
  one hopes will be long deprecated by then.

Filename: 188-bridge-guards.txt
Title: Bridge Guards and other anti-enumeration defenses
Author: Nick Mathewson, Isis Lovecruft
Created: 14 Oct 2011
Modified: 10 Sep 2015
Status: Reserve

   [NOTE: This proposal is marked as "reserve" because the enumeration
   technique it addresses does not currently seem to be in use. See
   ticket tor#7144 for more information. (2020 July 31)]


1. Overview

   Bridges are useful against censors only so long as the adversary
   cannot easily enumerate their addresses. I propose a design to make
   it harder for an adversary who controls or observes only a few
   nodes to enumerate a large number of bridges.

   Briefly: bridges should choose guard nodes, and use the Tor
   protocol's "Loose source routing" feature to re-route all extend
   requests from clients through an additional layer of guard nodes
   chosen by the bridge.  This way, only a bridge's guard nodes can
   tell that it is a bridge, and the attacker needs to run many more
   nodes in order to enumerate a large number of bridges.

   I also discuss other ways to avoid enumeration, recommending some.

   These ideas are due to a discussion at the 2011 Tor Developers'
   Meeting in Waterloo, Ontario.  Practically none of the ideas here
   are mine; I'm just writing up what I remember.

2. History and Motivation

   Under the current bridge design, an attacker who runs a node can
   identify bridges by seeing which "clients" make a large number of
   connections to it, or which "clients" make connections to it in the
   same way clients do.  This has been a known attack since early
   versions {XXXX check} of the design document; let's try to fix it.

2.1. Related idea: Guard nodes

   The idea of guard nodes isn't new: since 0.1.1, Tor has used guard
   nodes (first designed as "Helper" nodes by Wright et al in {XXXX})
   to make it harder for an adversary who controls a smaller number of
   nodes to eavesdrop on clients.  The rationale was: an adversary who
   controls or observes only one entry and one exit will have a low
   probability of correlating any single circuit, but over time, if
   clients choose a random entry and exit for each circuit, such an
   adversary will eventually see some circuits from each client with a
   probability approaching 1, thereby building a statistical profile of the
   client's activities.  Therefore, let each client choose its entry
   node only from among a small number of client-selected "guard"
   nodes: the client is still correlated with the same probability as
   before, but now the client has a nonzero chance of remaining
   unprofiled.
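   The argument can be made concrete with a simplified model (an
   assumption, not part of the original analysis): each circuit is
   independently compromised with probability p, and each of a client's
   g guards is adversarial with probability c.  Without guards, the
   chance of at least one compromised circuit among n is 1 - (1 - p)^n,
   which tends to 1; with g guards chosen once, a client whose guards
   are all honest, which happens with probability (1 - c)^g, is never
   observed at the entry.

```python
def p_profiled_without_guards(p: float, n: int) -> float:
    """Chance that at least one of n independent circuits is compromised."""
    return 1 - (1 - p) ** n


def p_unobserved_with_guards(c: float, g: int) -> float:
    """Chance that none of g guards, each bad with probability c, is bad."""
    return (1 - c) ** g
```

   For p = 0.01 and n = 1000 circuits, the first function gives about
   0.99996; with c = 0.1 and g = 3 guards, the second gives 0.729.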

2.2. Related idea: Loose source routing

   Since the earliest versions of Onion Routing, the protocol has
   provided "loose source routing".  In strict source routing, the
   source of a message chooses every hop on the message's path.  In
   loose source routing, by contrast, the message traverses the
   selected nodes but may traverse other nodes as well.  In other
   words, the client selects nodes N_a, N_b, and N_c, but the message
   may in fact traverse any sequence of nodes N_1...N_j, so long as
   N_1=N_a, N_x=N_b, and N_y=N_c, for 1 < x < y <= j.
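   The definition amounts to a subsequence check anchored at the first
   hop.  A Python sketch (hypothetical function name; nodes are compared
   by name), which requires every selected hop to appear within the
   path, in order, at strictly increasing positions:

```python
def is_loose_route(selected, path):
    """True if `path` is a loose-source routing of the `selected` hops."""
    if not selected or not path or path[0] != selected[0]:
        return False  # N_1 must equal the first selected node
    pos = 1
    for hop in selected[1:]:
        try:
            # Find the next selected hop at or after position `pos`.
            pos = path.index(hop, pos) + 1
        except ValueError:
            return False
    return True
```

   For example, the selected hops Bob, Charlie, Deidra are loosely
   routed along Bob, Guillaume, Charlie, Deidra, the path used in §3.1.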

   Tor has retained this feature, but has not yet made use of it.

3. Design

   Every bridge currently chooses a set of guard nodes for its
   circuits.  Bridges should also re-route client circuits through
   these guards.

   Specifically, when a bridge receives a request from a client to
   extend a circuit, it should first create a circuit to its guard,
   and then relay that extend cell through the guard.  The bridge
   should add an additional layer of encryption to outgoing cells on
   that circuit corresponding to the encryption that the guard will
   remove, and remove a layer of encryption on incoming cells on that
   circuit corresponding to the encryption that the guard will add.
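   To show the layering concretely, here is a Python sketch that uses a
   toy SHA-256 counter-mode keystream in place of Tor's real relay
   cipher (Tor uses AES in counter mode; the toy is for illustration
   only and is not secure).  The key names follow KG_f and KG_b from
   §3.1.1.  Because each side advances its keystream in step, Bob's
   apply() on an outgoing cell is undone by Guillaume's apply() with
   KG_f, and Guillaume's layer on incoming cells is removed by Bob's
   apply() with KG_b.

```python
import hashlib


class ToyStreamCipher:
    """Toy keystream: SHA-256(key || counter). NOT Tor's AES-CTR cipher."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0
        self.buffer = b""

    def _more(self):
        # Generate one more 32-byte block of keystream.
        block = hashlib.sha256(self.key + self.counter.to_bytes(8, "big")).digest()
        self.counter += 1
        self.buffer += block

    def apply(self, data: bytes) -> bytes:
        """XOR data with the next len(data) keystream bytes.

        The same call both adds and removes a layer.
        """
        while len(self.buffer) < len(data):
            self._more()
        stream, self.buffer = self.buffer[:len(data)], self.buffer[len(data):]
        return bytes(a ^ b for a, b in zip(data, stream))
```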

3.1. Loose-Source Routed Circuit Construction

   Alice, an OP, is using a bridge, Bob, and she has chosen the
   following path through the network:

       Alice -> Bob -> Charlie -> Deidra

   However, Bob has decided to take advantage of the loose-source
   routing circuit characteristic (for example, in order to use a bridge
   guard), and Bob has chosen N additional loose-source routed hop(s),
   through which he will transparently relay cells.

   NOTE: For the purposes of bridge guards, N is always 1.  However, for
   completeness' sake, the following details of the circuit construction
   are generalized to include N > 1.  Additionally, the following steps
   should hold for a hop at any position in Alice's circuit that has
   decided to take advantage of the loose-source routing feature, not
   only for bridge ORs.

   From Alice's perspective, her circuit path matches the one diagrammed
   above.  However, the overall path of the circuit is:

       Alice -> Bob -> Guillaume -> Charlie -> Deidra

   From Bob's perspective, the circuit's path is:

       Alice -> Bob -> Guillaume -> Charlie -> UNKNOWN

   Interestingly, because Bob's behaviour towards Guillaume and choices
   of cell types is that of a normal OP, Guillaume's perspective of the
   circuit's path is:

       Bob -> Guillaume -> Charlie -> UNKNOWN

   That is, to Guillaume, Bob appears (for the most part) to be a
   normally connecting client.  (See §4.1 for more detailed analysis.)

3.1.1. Detailed Steps of Loose-Source Routed Circuit Construction

   1. Connection from OP

      Alice has connected to Bob, and she has sent to Bob either a
      CREATE/CREATE_FAST or CREATE2 cell.

   2. Loose-Source Path Selection

      In anticipation of Alice's first RELAY_EARLY cell (which will
      contain an EXTEND cell to Alice's next hop), Bob begins
      constructing a loose-source routed circuit.  To do so, Bob chooses
      N additional hop(s):

      2.a. For the first additional hop, H_1, Bob chooses a suitable
           entry guard node, Guillaume, using the same algorithm as OPs.
           See "§5 Guard nodes" of path-spec.txt for additional
           information on the selection algorithm.

      2.b. Each additional hop, [H_2, ..., H_N], is chosen at random
           from a list of suitable, non-excluded ORs.

   3. Loose-Source Routed Circuit Extension and Cell Types

      Bob now follows the same procedure as OPs use to complete the key
      exchanges with his chosen additional hop(s).

      While performing the following substeps, Bob SHOULD continue to
      proceed with Step 4, below, in parallel, as an optimization for
      speeding up circuit construction.

      3.a. Create Cells

           Bob sends the appropriate type of create cell to Guillaume.
           For ORs new enough to support the NTor handshake (nearly all
           of them at this point), Bob sends a CREATE2 cell.  Otherwise,
           for ORs which only support the older TAP handshake, Bob sends
           either a CREATE or CREATE_FAST cell, using the same
           decision-making logic as OPs.

           See §4.1 for more information on the distinguishability of
           bridges based upon whether they use CREATE versus
           CREATE_FAST.  Also note that the CREATE2 cell has become
           ubiquitous since this proposal was originally drafted.
           Thus, because we prefer ORs which use loose-source routing to
           behave (as much as possible) like OPs, we now prefer to use
           CREATE2.

      3.b. Created Cells

           Later, when Bob receives a corresponding CREATED/CREATED_FAST
           or CREATED2 cell from Guillaume, Bob extracts key material
           for the shared forward and reverse keys, KG_f and KG_b,
           respectively.

      3.c. Extend Cells

           When N > 1, for each additional hop, H_i, in [H_2, ..., H_N],
           Bob chooses the appropriate type of extend cell for H_i, and
           sends this extend cell to H_(i-1), who transforms it into a
           create cell in order to perform the extension.  To choose
           which type of extend cell to send, Bob uses the same
           algorithm as an OP to determine whether to use EXTEND or
           EXTEND2.  As with the CREATE* cells above, for most modern
           ORs this will very likely mean an EXTEND2 cell.
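
      Steps 3.a and 3.c both reduce to picking a cell type per hop.  The
      helpers below are a simplified sketch: they assume a single
      hypothetical supports_ntor flag per relay, and they treat the
      CREATE versus CREATE_FAST choice, which the text delegates to the
      OP's existing logic, as a caller-supplied flag rather than
      reproducing that logic.

```python
from enum import Enum


class CreateCell(Enum):
    CREATE = "CREATE"
    CREATE_FAST = "CREATE_FAST"
    CREATE2 = "CREATE2"


class ExtendCell(Enum):
    EXTEND = "EXTEND"
    EXTEND2 = "EXTEND2"


def create_cell_for_first_hop(supports_ntor: bool, use_create_fast: bool) -> CreateCell:
    """Step 3.a: prefer CREATE2; fall back to TAP-era cells otherwise."""
    if supports_ntor:
        return CreateCell.CREATE2
    # TAP-only fallback; use_create_fast stands in for the OP's own decision.
    return CreateCell.CREATE_FAST if use_create_fast else CreateCell.CREATE


def extend_cell_for_hop(supports_ntor: bool) -> ExtendCell:
    """Step 3.c (simplified assumption): EXTEND2 for ntor-capable relays."""
    return ExtendCell.EXTEND2 if supports_ntor else ExtendCell.EXTEND
```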

      3.d. Extended Cells

           When a corresponding EXTENDED/EXTENDED2 cell is received for
           an additional hop, H_i, Bob extracts the shared forward and
           reverse keys, Ki_f and Ki_b, respectively.

   4. Responding to the OP

      Now that the additional hops in Bob's loose-source routed circuit
      are chosen, and construction of the loose-source routed circuit
      has begun, Bob answers Alice's original CREATE/CREATE_FAST or
      CREATE2 cell (from Step 1) by sending the corresponding created
      cell type.

      Alice has now built a circuit through Bob, and the two share the
      negotiated forward and reverse keys, KB_n and KB_p, respectively.
      (The example in §3.1.2 calls these keys KB_f and KB_b.)

      Note that Bob SHOULD do this step in tandem with the loose-source
      routed circuit construction procedure outlined in Step 3, above.

   5. OP Circuit Extension

      Alice then wants to extend the circuit to node Charlie.  She makes
      a hybrid-encrypted onionskin, encrypted to Charlie's public key,
      containing her chosen g^x value.  She puts this in an extend cell:
      "Extend (Charlie's address) (Charlie's OR Port) (Onionskin)
      (Charlie's ID)".  She encrypts this with KB_n and sends it as a
      RELAY_EARLY cell to Bob.

      Bob's behaviour is now dependent on whether the loose-source
      routed circuit construction steps (as outlined in Step 3, above)
      have already completed.

      5.a. The Loose-Source Routed Circuit Construction is Incomplete

           If Bob has not yet finished the loose-source routed circuit
           construction, then Bob MUST store the first outgoing
           (i.e. exitward) RELAY_EARLY cell received from Alice until
           the loose-source routed circuit construction has been
           completed.

           If any incoming (i.e. toward the OP) RELAY* cell is received
           while the loose-source routed circuit is not fully
           constructed, Bob MUST drop the cell.

           If Bob has already stored Alice's first RELAY_EARLY cell, and
           Alice sends any additional RELAY* cell, then Bob SHOULD mark
           the entire circuit for close with END_CIRC_REASON_TORPROTOCOL.

      5.b. The Loose-Source Routed Circuit Construction is Completed

           Later, when the loose-source routed circuit is fully
           constructed, Bob MUST send any stored cells from Alice
           outward by following the procedure described in Step 6.a.
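
      The buffering rules of Step 5 amount to a small state machine.  A
      minimal sketch with hypothetical class and method names, assuming
      that the first outbound cell Bob sees before construction
      completes is Alice's RELAY_EARLY cell:

```python
from enum import Enum, auto
from typing import List, Optional


class Action(Enum):
    STORE = auto()    # hold Alice's first exitward RELAY_EARLY (5.a)
    DROP = auto()     # inbound RELAY* cell before construction completes (5.a)
    CLOSE = auto()    # mark circuit for close, END_CIRC_REASON_TORPROTOCOL (5.a)
    FORWARD = auto()  # normal Step 6 processing


class PendingLooseCircuit:
    def __init__(self) -> None:
        self.construction_complete = False
        self.stored: Optional[bytes] = None

    def on_outbound(self, cell: bytes) -> Action:
        if self.construction_complete:
            return Action.FORWARD
        if self.stored is None:
            self.stored = cell
            return Action.STORE
        # A second outbound RELAY* cell while one is already stored.
        return Action.CLOSE

    def on_inbound(self, _cell: bytes) -> Action:
        return Action.FORWARD if self.construction_complete else Action.DROP

    def mark_complete(self) -> List[bytes]:
        """5.b: return stored cells; the caller sends them outward via Step 6.a."""
        self.construction_complete = True
        flushed = [] if self.stored is None else [self.stored]
        self.stored = None
        return flushed
```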

   6. Relay Cells

      When receiving a RELAY* cell in either direction, Bob MAY keep
      statistics on the number of relay cells encountered, as well as
      the number of relay cells relayed.

      6.a. Outgoing Relay Cells

           Bob decrypts the RELAY* cell with KB_n.  If the cell becomes
           recognized, Bob should now follow the relay command checks
           described in Step 6.c.

           Bob MUST encrypt the relay cell's underlying payload to each
           additional hop in the loose-source routed circuit, in
           reverse: for each additional hop, H_i, in [H_N, ..., H_1],
           Bob encrypts the relay cell payload to Ki_f, the shared
           forward key for the hop H_i.

           Bob MUST update the forward digest, DG_f, of the relay cell,
           regardless of whether or not the cell is recognized.  See
           6.c. for additional information on recognized cells.

           Bob now sends the cell outwards through the additional hops.
           At each hop, H_i, the hop removes a layer of encryption by
           decrypting the cell with Ki_f, and then hop H_i forwards the
           cell to the next additional hop, H_(i+1).  When the final
           additional hop, H_N, receives the cell, the OP's cell
           command and payload should be processed by H_N in the normal
           manner for an OR.

      6.b. Incoming Relay Cells

           Bob MUST decrypt the relay cell's underlying payload from
           each additional hop in the loose-source routed circuit (in
           forward order, this time): For each additional hop, H_i, in
           [H_1, ..., H_N], Bob decrypts the relay cell payload with
           Ki_b, the shared backward key for the hop H_i.

           If the cell has become recognized after all decryptions, Bob
           should now follow the relay command checks described in Step
           6.c.

           Bob MUST update the backward digest, DG_b, of the relay cell,
           regardless of whether or not the cell is recognized.  See
           6.c. for additional information on recognized cells.

           Bob encrypts the cell towards the OP with KB_p, and sends the
           cell inwards.
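
      The layering in Steps 6.a and 6.b can be checked end to end with a
      toy model.  The sketch below substitutes a SHA-256 counter-mode
      keystream for Tor's actual relay cipher and uses a per-cell nonce
      instead of a running stream state, so it illustrates only the
      order in which layers are added and removed.  All names are
      hypothetical, and the digest updates (DG_f, DG_b) are omitted.

```python
import hashlib
from typing import List


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks (NOT Tor's cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Apply or remove one stream-cipher layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


def bob_outbound(payload: bytes, forward_keys: List[bytes], nonce: bytes) -> bytes:
    """6.a: payload already has Bob's KB_n layer removed; add H_N..H_1 layers."""
    for key in reversed(forward_keys):  # forward_keys = [K1_f, ..., KN_f]
        payload = xor_layer(key, nonce, payload)
    return payload


def hops_peel_outbound(cell: bytes, forward_keys: List[bytes], nonce: bytes) -> bytes:
    """H_1, ..., H_N each strip their own layer with Ki_f as the cell moves out."""
    for key in forward_keys:
        cell = xor_layer(key, nonce, cell)
    return cell


def hops_wrap_inbound(cell: bytes, backward_keys: List[bytes], nonce: bytes) -> bytes:
    """H_N, ..., H_1 each add a layer with Ki_b as the reply moves inward."""
    for key in reversed(backward_keys):
        cell = xor_layer(key, nonce, cell)
    return cell


def bob_inbound(cell: bytes, backward_keys: List[bytes], nonce: bytes) -> bytes:
    """6.b: remove layers in forward order, K1_b..KN_b, before applying KB_p."""
    for key in backward_keys:
        cell = xor_layer(key, nonce, cell)
    return cell
```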

      6.c. Recognized Cells

           If a relay cell, either incoming or outgoing, becomes
           recognized (i.e. Bob sees that the cell was intended for him)
           after decryption, and there is no stream attached to the
           circuit, then Bob SHOULD mark the circuit for close if the
           relay command contained within the cell is any of the
           following types:

               - RELAY_BEGIN
               - RELAY_CONNECTED
               - RELAY_END
               - RELAY_RESOLVE
               - RELAY_RESOLVED
               - RELAY_BEGIN_DIR

           Apart from the above checks, Bob SHOULD essentially treat
           every cell as "unrecognized" by following the en-/de-cryption
           procedures in Steps 6.a. and 6.b. regardless of whether the
           cell is actually recognized or not.  That is, since this is a
           loose-source routed circuit, Bob SHOULD relay cells not
           intended for him *and* cells intended for him through the
           leaky pipe, no matter what the cell's underlying payload and
           command are.
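
      The Step 6.c check itself is small.  This sketch compares relay
      commands by name rather than by their numeric codes, and its
      function name is hypothetical:

```python
CLOSE_IF_RECOGNIZED_WITHOUT_STREAM = frozenset({
    "RELAY_BEGIN", "RELAY_CONNECTED", "RELAY_END",
    "RELAY_RESOLVE", "RELAY_RESOLVED", "RELAY_BEGIN_DIR",
})


def should_mark_for_close(recognized: bool, has_attached_stream: bool,
                          command: str) -> bool:
    """Step 6.c: every cell not matched here is relayed as if unrecognized."""
    return (recognized
            and not has_attached_stream
            and command in CLOSE_IF_RECOGNIZED_WITHOUT_STREAM)
```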

3.1.2. Example Loose-Source Circuit Construction

   For example, given the following circuit path chosen by Alice:

       Alice -> Bob -> Charlie -> Deidra

   when Alice wishes to extend to node Charlie, and Bob, the bridge, is
   using only one additional loose-source routed hop, Guillaume, as his
   bridge guard, the following steps are taken:

       - Alice packages the extend into a RELAY_EARLY cell and encrypts
         the RELAY_EARLY cell with KB_f to Bob.

       - Bob receives the RELAY_EARLY cell from Alice, and he follows
         the procedure (outlined in §3.1.1. Step 6.a.) by:

           * Decrypting the cell with KB_f,
           * Encrypting the cell to the forward key, KG_f, which Bob
             shares with his guard node, Guillaume,
           * Updating the cell forward digest, DG_f, and
           * Sending the cell as a RELAY_EARLY cell to Guillaume.

       - When Guillaume receives the cell from Bob, he processes it by:

           * Decrypting the cell with KG_f.  Guillaume now sees that it
             is a RELAY_EARLY cell containing an extend cell "intended"
             for him, containing: "Extend (Charlie's address) (Charlie's
             OR Port) (Onionskin) (Charlie's ID)".
           * Performing the circuit extension to the specified node,
             Charlie, by acting accordingly: creating a connection to
             Charlie if he doesn't have one, ensuring that the ID is as
             expected, and then sending the onionskin in a create cell
             on that connection.  Note that Guillaume is behaving
             exactly as a regular node would upon receiving an Extend
             cell.
           * Now the handshake finishes.  Charlie receives the onionskin
             and sends Guillaume "CREATED g^y,KH".
           * Making an extended cell for Bob which contains
             "E(KG_b, EXTENDED g^y KH)", and
           * Sending the extended cell to Bob.  Note that Charlie and
             Guillaume are both still behaving in a manner identical to
             regular ORs.

       - Bob receives the extended cell from Guillaume, and he follows
         the procedure (outlined in §3.1.1. Step 6.b.) by:

           * Decrypting the cell with KG_b,
           * Encrypting the cell to Alice with KB_b,
           * Updating the cell backward digest, DG_b, and
           * Sending the cell to Alice.

       - Alice receives the cell, and she decrypts it with KB_b, just
         as she would have if Bob had extended to Charlie directly.
         She then processes the extended cell contained within to
         extract shared keys with Charlie.  Note that Alice's behaviour
         is identical to regular OPs.

3.2. Additional Notes on the Construction

   Note that this design does not require that our stream cipher
   operations be commutative, even though they are.

   Note also that this design requires no change in behavior from any
   node other than Bob, and as we can see in the above example in §3.1.2
   for Alice's circuit extension, Alice, Guillaume, and Charlie behave
   identically to a normal OP and normal ORs.

   Finally, observe that even though the circuit is N hops longer than
   it would be otherwise, no relay's count of permissible RELAY_EARLY
   cells falls lower than it otherwise would.  This is because the extra
   hops that Bob adds are built with RELAY_EARLY cells, and he then
   continues to relay Alice's cells as RELAY_EARLY until the appropriate
   maximum number of RELAY_EARLY cells is reached.  Afterwards, further
   RELAY_EARLY cells from Alice are repackaged by Bob as normal RELAY
   cells.
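
   The RELAY_EARLY accounting above can be sketched as one per-circuit
   counter that Bob consults for both his own EXTEND cells and Alice's
   relayed cells.  The default of 8 reflects the per-circuit
   RELAY_EARLY limit in tor-spec.txt but is a parameter here; the class
   name is hypothetical.

```python
class RelayEarlyBudget:
    """Per-circuit count of RELAY_EARLY cells Bob has sent toward H_1."""

    def __init__(self, limit: int = 8) -> None:
        self.limit = limit
        self.sent = 0

    def outbound_command(self, command: str) -> str:
        """Return the command Bob should actually use for an outbound cell."""
        if command != "RELAY_EARLY":
            return command
        if self.sent < self.limit:
            self.sent += 1
            return "RELAY_EARLY"
        # Budget exhausted: repackage as an ordinary RELAY cell.
        return "RELAY"
```

   Bob would call outbound_command("RELAY_EARLY") for each of his own
   extensions to H_2, ..., H_N and then for each of Alice's cells.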

4. Alternative designs

4.1. Client-enforced bridge guards

   What if Tor didn't have loose source routing?  We could have
   bridges tell clients what guards to use by advertising those guards
   in their descriptors, and then refusing to extend circuits to any
   other nodes.  This change would require all clients to upgrade in
   order to be able to use the newer bridges, and would quite possibly
   cause a fair amount of pain along the way.

   Fortunately, we don't need to go down this path.  So let's not!

4.2. Separate bridge-guards and client-guards

   In the design above, I specify that bridges should use the same
   guard nodes for extending client circuits as they use for their own
   circuits.  It's not immediately clear whether this is a good idea
   or not.  Having separate sets would seem to make the two kinds of
   circuits more easily distinguishable (even though we already assume
   they are distinguishable).  Having different sets of guards would
   also seem like a way to keep the nodes who guard our own traffic
   from learning that we're a bridge... but another set of nodes will
   learn that anyway, so it's not clear what we'd gain.

   One good reason to keep separate guard lists is to prevent the
   *client* of the bridge from being able to enumerate the guards that
   the bridge uses to protect its own traffic (by extending a circuit
   through the bridge to a node it controls, and finding out where the
   extend request arrives from).

5. Additional bridge enumeration methods and protections

   In addition to the design above, there are more ways to try to
   prevent enumeration.

   Right now, there are multiple ways for the node after a bridge to
   distinguish a circuit extended through the bridge from one
   originating at the bridge.  (This lets the node after the bridge
   tell that a bridge is talking to it.)

5.1. Make it harder to tell clients from bridges

   When using the older TAP circuit handshake protocol, one of the
   giveaways is that the first hop in a circuit is created with
   CREATE_FAST cells, but all subsequent hops are created with CREATE
   cells.

   However, because nearly everything in the network now uses the newer
   NTor circuit handshake protocol, clients send CREATE2 cells to all
   hops, regardless of position.  Therefore, in the above design, it's
   no longer quite so simple to distinguish an OP connecting through a
   bridge from an actual OP, since all of the circuits that extend
   through a bridge now reach its guards through CREATE2 cells (whether
   the bridge originated them or not), and only as a fallback (e.g. if
   an additional node in the loose-source routed path does not support
   NTor) will the bridge ever use CREATE/CREATE_FAST.  (Additionally,
   when using the fallback method, the behaviour for choosing either
   CREATE or CREATE_FAST is identical to normal OP behaviour.)

   The CREATE/CREATE_FAST distinction is not the only way for a
   bridge's guard to tell bridges from ordinary clients, however.
   Most importantly, a busy bridge will open far more circuits than a
   client would.  More subtly, the response timing from the apparent
   client (the bridge) will be higher and more variable than it would
   be with an ordinary client.  I don't think we can make bridges
   behave wholly indistinguishably from clients: that's why we should
   go with guard nodes for bridges.

   [XXX For further research: we should study the methods by which a
   bridge guard can determine that they are acting as a guard for a
   bridge, rather than for a normal OP, and which methods are likely to
   be more accurate or efficient than others. -IL]

5.2. Bridge Reachability Testing

   Currently, a bridge's reachability is tested both by the bridge
   itself (called "self-testing") and by the BridgeAuthority.

5.2.1. Bridge Reachability Self-Testing

   Before a bridge uploads its descriptors to the BridgeAuthority, it
   creates a special type of testing circuit which ends at itself:

       Bob -> Guillaume -> Charlie -> Bob

   Thus, the benefit of going to all this trouble to later use
   loose-source routing in order to relay Alice's traffic through
   Guillaume (rather than connecting directly to Charlie, as Alice
   intended) is diminished by the fact that Charlie can still passively
   enumerate bridges by waiting to be asked to connect to a node which
   is not contained within the consensus.

   We could get around this problem by disabling self-testing for
   bridges entirely, by automatically setting "AssumeReachable 1" for
   all bridge relays, although I am not sure this is wise.

   Our best idea thus far, for bridge reachability self-testing, is to create
   a circuit like so:

       Bridge → Guard → Middle → OtherMiddle → Guard → Bridge

   While, clearly, that circuit is just a little bit insane, it must be that
   way because we cannot simply do:

       Bridge → Guard → Middle → Guard → Bridge

   because the Middle would refuse to extend back to the previous node
   (all ORs follow this rule).  Similarly, it would be inane to do:

       Bridge → Guard → Middle → OtherMiddle → Bridge

   because, obviously, that merely shifts the problem to OtherMiddle and
   accomplishes nothing.  [XXX Is there something smarter we could do? —IL]
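
   The rule that rules out the shorter self-test circuit can be
   expressed as a check over a path: the relay at position i received
   the circuit from position i-1 and must not be asked to extend back to
   that same relay.  A sketch with a hypothetical function name,
   treating relays as plain labels:

```python
from typing import Sequence


def violates_no_extend_back(path: Sequence[str]) -> bool:
    """True if some hop would be asked to extend back to the hop it came from."""
    # path[i] receives the circuit from path[i-1] and extends to path[i+1].
    return any(path[i + 1] == path[i - 1] for i in range(1, len(path) - 1))
```

   Applied to the three circuits above, only Bridge -> Guard -> Middle
   -> Guard -> Bridge is rejected, because Middle would be asked to
   extend back to Guard.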

5.2.2. Bridge Reachability Testing by the BridgeAuthority

   After receiving Bob's descriptors, the BridgeAuthority attempts to
   connect to Bob's ORPort by making a direct TLS connection to the
   bridge's advertised ORPort.

   Should we change this behaviour?  On the one hand, at least this
   does not enable any random OR in the entire network to enumerate
   bridges.  On the other hand, any adversary who can observe packets
   from the BridgeAuthority is capable of enumeration.

6. Other considerations

   What fraction of our traffic is bridge traffic?  Will this alter
   our circuit selection weights?

Filename: 189-authorize-cell.txt
Title: AUTHORIZE and AUTHORIZED cells
Author: George Kadianakis
Created: 04 Nov 2011
Status: Obsolete

1. Overview

   Proposal 187 introduced the concept of the AUTHORIZE cell, a cell
   whose purpose is to make Tor bridges resistant to scanning attacks.

   This is achieved by having the bridge and the client share a secret
   out-of-band and then use AUTHORIZE cells to validate that the
   client indeed knows that secret before proceeding with the Tor
   protocol.

   This proposal specifies the format of the AUTHORIZE cell and also
   introduces the AUTHORIZED cell, a way for bridges to announce to
   clients that the authorization process is complete and successful.

2. Motivation

   AUTHORIZE cells should be able to perform a variety of
   authorization protocols based on a variety of shared secrets. This
   forces the AUTHORIZE cell to have a dynamic format based on the
   authorization method used.

   AUTHORIZED cells are used by bridges to signal the end of a
   successful bridge client authorization and the beginning of the
   actual link handshake. AUTHORIZED cells have no other use and for
   this reason their format is very simple.

   Both AUTHORIZE and AUTHORIZED cells are to be used under censorship
   conditions and they should look innocuous to any adversary capable
   of monitoring network traffic.

   As an attack example, an adversary could passively monitor the
   traffic of a bridge host, looking at the packets directly after the
   TLS handshake and trying to deduce from their packet size if they
   are AUTHORIZE and AUTHORIZED cells. For this reason, AUTHORIZE and
   AUTHORIZED cells are padded with a random amount of padding before
   sending.

3. Design

3.1. AUTHORIZE cell

   The AUTHORIZE cell is a variable-sized cell.

   The generic AUTHORIZE cell format is:

         AuthMethod                       [1 octet]
         MethodFields                     [...]
         PadLen                           [2 octets]
         Padding                          ['PadLen' octets]

   where:

   'AuthMethod', is the authorization method to be used.

   'MethodFields', is dependent on the authorization method used. It's
                   a meta-field hosting an arbitrary number of fields.

   'PadLen', specifies the amount of padding in octets.
   Implementations SHOULD pick 'PadLen' to be a random integer from 1
   to 3141 inclusive.

   'Padding', is 'PadLen' octets of random content.
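
   A sketch of building an AUTHORIZE cell body, assuming network
   (big-endian) byte order for 'PadLen' as elsewhere in Tor; this
   proposal does not state the byte order.  The variable-length cell
   header (circuit ID, command, length) is omitted because no command
   number is assigned here, and the function name is hypothetical.

```python
import secrets
import struct
from typing import Optional

MAX_PADLEN = 3141


def build_authorize_body(auth_method: int, method_fields: bytes,
                         pad_len: Optional[int] = None) -> bytes:
    """AuthMethod | MethodFields | PadLen | Padding."""
    if not 0 <= auth_method <= 0xFF:
        raise ValueError("AuthMethod must fit in one octet")
    if pad_len is None:
        # SHOULD: uniform random integer in [1, 3141] inclusive.
        pad_len = 1 + secrets.randbelow(MAX_PADLEN)
    if not 0 <= pad_len <= 0xFFFF:
        raise ValueError("PadLen must fit in two octets")
    return (struct.pack("!B", auth_method) + method_fields
            + struct.pack("!H", pad_len) + secrets.token_bytes(pad_len))
```

   Passing an empty method_fields gives the layout of the AUTHORIZED
   cell in section 3.2.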

3.2. AUTHORIZED cell format

   The AUTHORIZED cell is a variable-sized cell.

   The AUTHORIZED cell format is:

         'AuthMethod'                       [1 octet]
         'PadLen'                           [2 octets]
         'Padding'                          ['PadLen' octets]

   where all fields have the same meaning as in section 3.1.

3.3. Cell parsing

   Implementations MUST ignore the contents of 'Padding'.

   Implementations MUST reject an AUTHORIZE or AUTHORIZED cell where
   the 'Padding' field is not 'PadLen' octets long.

   Implementations MUST reject an AUTHORIZE cell with an 'AuthMethod'
   they don't recognize.
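
   The parsing rules above can be sketched as follows.  Because
   'MethodFields' has no length prefix, this sketch assumes each
   recognized 'AuthMethod' has a fixed, known field length (as with the
   10-octet shared secret of proposal 190).  The length table, the
   exception class, and the function names are hypothetical, and
   big-endian 'PadLen' is assumed.

```python
import struct
from typing import Dict, Tuple


class CellRejected(Exception):
    """Raised when a cell violates the parsing rules of section 3.3."""


def _check_padding(body: bytes, padlen_offset: int) -> None:
    if len(body) < padlen_offset + 2:
        raise CellRejected("truncated cell")
    (pad_len,) = struct.unpack("!H", body[padlen_offset:padlen_offset + 2])
    if len(body) - (padlen_offset + 2) != pad_len:
        raise CellRejected("Padding is not PadLen octets long")
    # The Padding contents themselves are ignored.


def parse_authorize_body(body: bytes,
                         method_field_lengths: Dict[int, int]) -> Tuple[int, bytes]:
    """Return (AuthMethod, MethodFields) or raise CellRejected."""
    if not body:
        raise CellRejected("empty cell")
    method = body[0]
    if method not in method_field_lengths:
        raise CellRejected("unrecognized AuthMethod")
    fields_end = 1 + method_field_lengths[method]
    _check_padding(body, fields_end)
    return method, body[1:fields_end]


def parse_authorized_body(body: bytes) -> int:
    """Return the AuthMethod of an AUTHORIZED cell or raise CellRejected."""
    if not body:
        raise CellRejected("empty cell")
    _check_padding(body, 1)
    return body[0]
```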

4. Discussion

4.1. What's up with the [1,3141] padding bytes range?

   The upper limit is larger than the Ethernet MTU so that AUTHORIZE
   and AUTHORIZED cells are not always transmitted in a single
   packet. Other than that, it's indeed pretty much arbitrary.

4.2. Why not let the pluggable transports do the padding, like they
     are supposed to do for the rest of the Tor protocol?

   The arguments of section "Alternative design: Just use pluggable
   transports" of proposal 187 apply here as well:

   All bridges who use client authorization will also need padded
   AUTHORIZE and AUTHORIZED cells.

4.3. How should multiple round-trip authorization protocols be handled?

   Protocols that require multiple round trips between the client and
   the bridge should use AUTHORIZE cells for communication.

   The format of the AUTHORIZE cell is flexible enough to support
   messages from the client to the bridge and the reverse.

   At the end of a successful multiple-round-trip protocol, an
   AUTHORIZED cell must be issued from the bridge to the client.

4.4. AUTHORIZED seems useless. Why not use VPADDING instead?

   As noted in proposal 187, the Tor protocol uses VPADDING cells for
   padding; any other use of VPADDING makes the Tor protocol kludgy.

   In the future, and in the example case of a v3 handshake, a client
   can optimistically send a VERSIONS cell along with the final
   AUTHORIZE cell of an authorization protocol. That allows the
   bridge, in the case of successful authorization, to also process
   the VERSIONS cell and begin the v3 handshake promptly.

4.5. What should actually happen when a bridge rejects an AUTHORIZE
     cell?

   When a bridge detects a badly formed or malicious AUTHORIZE cell,
   it should assume that the other side is an adversary scanning for
   bridges. The bridge should then act accordingly to avoid detection.

   This proposal does not try to specify how a bridge can avoid
   detection by an adversary.

Filename: 190-shared-secret-bridge-authorization.txt
Title: Bridge Client Authorization Based on a Shared Secret
Author: George Kadianakis
Created: 04 Nov 2011
Status: Obsolete

Notes: This is obsoleted by pluggable transports.

1. Overview

   Proposals 187 and 189 introduced AUTHORIZE and AUTHORIZED cells.
   Their purpose is to make bridge relays scanning-resistant against
   censoring adversaries capable of probing hosts to observe whether
   they speak the Tor protocol.

   This proposal specifies a bridge client authorization scheme based
   on a shared secret between the bridge user and bridge operator.

2. Motivation

   A bridge client authorization scheme should only allow clients who
   show knowledge of a shared secret to talk Tor to the bridge.

3. Shared-secret-based authorization

3.1. Where do shared secrets come from?

   A shared secret is a piece of data known only to the bridge
   operator and the bridge client.

   It's meant to be automatically generated by the bridge
   implementation to avoid issues with insecure and weak passwords.

   Bridge implementations SHOULD create shared secrets by generating
   random data using a strong RNG or PRNG.

3.2. AUTHORIZE cell format

   In shared-secret-based authorization, the MethodFields field of the
   AUTHORIZE cell becomes:

       'shared_secret'               [10 octets]

   where:

   'shared_secret', is the shared secret between the bridge operator
                    and the bridge client.

3.3. Cell parsing

   Bridge implementations MUST reject any AUTHORIZE cells whose
   'shared_secret' field does not match the shared secret negotiated
   between the bridge operator and authorized bridge clients.
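
   A sketch of the check above, with hypothetical names.  The
   constant-time comparison (hmac.compare_digest) is an addition here,
   not a requirement of the proposal; it avoids leaking how many leading
   octets of a guess were correct.

```python
import hmac

SHARED_SECRET_LEN = 10  # octets, per section 3.2


def shared_secret_method_fields(shared_secret: bytes) -> bytes:
    """MethodFields for shared-secret authorization: just the 10-octet secret."""
    if len(shared_secret) != SHARED_SECRET_LEN:
        raise ValueError("shared secret must be exactly 10 octets")
    return shared_secret


def authorize_matches(method_fields: bytes, expected_secret: bytes) -> bool:
    """Bridge-side check: accept only if the presented secret matches."""
    if len(method_fields) != SHARED_SECRET_LEN:
        return False
    # Constant-time comparison (an addition; the proposal only says MUST reject).
    return hmac.compare_digest(method_fields, expected_secret)
```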

4. Tor implementation

4.1. Bridge side

   Tor bridge implementations MUST create the bridge shared secret by
   generating 10 octets of random data using a strong RNG or PRNG.

   Tor bridge implementations MUST store the shared secret in
   'DataDirectory/keys/bridge_auth_ss_key' in hexadecimal encoding.

   Tor bridge implementations MUST support the boolean
   'BridgeRequireClientSharedSecretAuthorization' configuration file
   option which enables bridge client authorization based on a shared
   secret.

   If 'BridgeRequireClientSharedSecretAuthorization' is set, bridge
   implementations MUST generate a new shared secret, if
   'DataDirectory/keys/bridge_auth_ss_key' does not already exist.
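
   A minimal sketch of the bridge-side key handling above, assuming it
   runs only when 'BridgeRequireClientSharedSecretAuthorization' is set.
   The owner-only file mode and the exclusive-create flag are
   assumptions added here, and the function name is hypothetical.

```python
import os
import secrets
from pathlib import Path


def load_or_create_shared_secret(data_directory: str) -> bytes:
    """Return the 10-octet bridge shared secret, generating it if absent."""
    path = Path(data_directory) / "keys" / "bridge_auth_ss_key"
    if path.exists():
        return bytes.fromhex(path.read_text().strip())
    path.parent.mkdir(parents=True, exist_ok=True)
    secret = secrets.token_bytes(10)  # drawn from the OS CSPRNG
    # O_EXCL refuses to overwrite a file that appeared in the meantime;
    # mode 0o600 keeps the secret owner-readable only (both are assumptions).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secret.hex())
    return secret
```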

4.2. Client side

   Tor client implementations must extend their Bridge line format to
   support bridge shared secrets. The new format is:
     Bridge [<method>] <address[:port]> [["keyid="]<id-fingerprint>] ["shared_secret="<shared_secret>]

   where <shared_secret> is the bridge shared secret in hexadecimal
   encoding.

   Tor clients who use bridges with shared-secret-based client
   authorization must specify the bridge's shared secret as in:
     Bridge 12.34.56.78 shared_secret=934caff420aa7852b855
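
   A simplified sketch of parsing the extended Bridge line.  It assumes
   IPv4 addresses only, a 40-hex-digit identity fingerprint, and a
   20-hex-digit (10-octet) shared secret; any first token that does not
   look like an address is treated as the transport method.  All names
   are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose patterns: IPv6 and other address forms are not handled here.
_ADDR_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}(?::\d{1,5})?$")
_FPR_RE = re.compile(r"^(?:keyid=)?([0-9A-Fa-f]{40})$")
_SECRET_RE = re.compile(r"^shared_secret=([0-9A-Fa-f]{20})$")


@dataclass
class BridgeLine:
    method: Optional[str]
    address: str
    fingerprint: Optional[str]
    shared_secret: Optional[bytes]


def parse_bridge_line(line: str) -> BridgeLine:
    tokens = line.split()
    if not tokens or tokens[0] != "Bridge":
        raise ValueError("not a Bridge line")
    rest = tokens[1:]
    method = None
    if rest and not _ADDR_RE.match(rest[0]):
        method, rest = rest[0], rest[1:]  # optional <method>
    if not rest or not _ADDR_RE.match(rest[0]):
        raise ValueError("missing <address[:port]>")
    address, rest = rest[0], rest[1:]
    fingerprint = None
    if rest:
        m = _FPR_RE.match(rest[0])
        if m:
            fingerprint, rest = m.group(1).upper(), rest[1:]
    shared_secret = None
    if rest:
        m = _SECRET_RE.match(rest[0])
        if m:
            shared_secret, rest = bytes.fromhex(m.group(1)), rest[1:]
    if rest:
        raise ValueError("unexpected token: " + rest[0])
    return BridgeLine(method, address, fingerprint, shared_secret)
```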

5. Discussion

5.1. What should actually happen when a bridge rejects an AUTHORIZE
     cell?

   When a bridge detects a badly formed or malicious AUTHORIZE cell,
   it should assume that the other side is an adversary scanning for
   bridges. The bridge should then act accordingly to avoid detection.

   This proposal does not try to specify how a bridge can avoid
   detection by an adversary.

6. Acknowledgements

   Thanks to Nick Mathewson and Robert Ransom for the help and
   suggestions while writing this proposal.

Filename: 191-mitm-bridge-detection-resistance.txt
Title: Bridge Detection Resistance against MITM-capable Adversaries
Author: George Kadianakis
Created: 07 Nov 2011
Status: Obsolete

1. Overview

   Proposals 187, 189 and 190 make the first steps toward scanning
   resistant bridges. They attempt to block attacks from censoring
   adversaries who provoke bridges into speaking the Tor protocol.

   An attack vector that hasn't been explored in those previous
   proposals is that of an adversary capable of performing Man In The
   Middle attacks against Tor clients. At the moment, Tor clients using the
   v3 link protocol have no way to detect such an MITM attack, and
   will gladly send a VERSIONS or AUTHORIZE cell to the MITMed
   connection, thereby revealing the Tor protocol and thus the bridge.

   This proposal introduces a way for clients to detect an MITMed SSL
   connection, allowing them to protect against the above attack.

2. Motivation

   When the v3 link handshake protocol is performed, Tor's SSL
   handshake is performed with the server sending a self-signed
   certificate and the client blindly accepting it. This allows the
   adversary to perform an MITM attack.

   A Tor client must detect the MITM attack before he initiates the
   Tor protocol by sending a VERSIONS or AUTHORIZE cell. A good
   moment to detect such an MITM attack is during the SSL handshake.

   To achieve that, bridge operators provide their bridge users with a
   hash digest of the public-key certificate their bridge is using for
   SSL. Bridge clients store that hash digest locally and associate it
   with that specific bridge. Bridge clients who have "pinned" a
   bridge to a certificate "fingerprint" can thereafter validate that
   their SSL connection peer is the intended bridge.

   Of course, the hash digest must be provided to users out-of-band
   and before the actual SSL handshake. Usually, the bridge operator
   gives the hash digest to her bridge users along with the rest of
   the bridge credentials, like the bridge's address and port.

3. Security implications

   Bridge clients who have pinned a bridge to a certificate
   fingerprint will be able to detect an MITMing adversary in time.
   If after detection they act as an innocuous Internet
   client, they can successfully remove suspicion from the SSL
   connection and subvert bridge detection.

   Pinning a certificate fingerprint and detecting an MITMing attacker
   does not automatically alleviate suspicions from the bridge or the
   client. Clients must have a behavior to follow after detecting the
   MITM attack so that they look like innocent Netizens. This proposal
   does not try to specify such a behavior.

   Implementation and use of this scheme does not render bridges and
   clients immune to scanning or DPI attacks. This scheme should be
   used along with bridge client authorization schemes like the ones
   detailed in proposal 190.

4. Tor Implementation

4.1. Certificate fingerprint creation

   The certificate fingerprints used in this scheme MUST be computed
   by applying the SHA256 cryptographic hash function upon the ASN.1
   DER encoding of a public-key certificate, then truncating the hash
   output to 12 bytes, encoding it to RFC4648 Base32 and omitting any
   trailing padding '='.
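
   A sketch of the fingerprint computation above and a pin check, with
   hypothetical function names.  Read literally, the procedure yields a
   20-character string (12 bytes in unpadded Base32), which is shorter
   than the 39-character example fingerprint in section 4.3.

```python
import base64
import hashlib
import hmac


def link_cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the DER certificate, truncated to 12 bytes, unpadded Base32."""
    truncated = hashlib.sha256(cert_der).digest()[:12]
    return base64.b32encode(truncated).decode("ascii").rstrip("=")


def matches_pin(cert_der: bytes, pinned_fpr: str) -> bool:
    """Compare a peer certificate against a pinned link_cert_fpr value."""
    # Accepting lowercase input is a choice made here, not by the proposal.
    return hmac.compare_digest(link_cert_fingerprint(cert_der), pinned_fpr.upper())
```

   In Python, the DER bytes of the peer certificate are available from
   a connected ssl.SSLSocket via getpeercert(binary_form=True).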

4.2. Bridge side implementation

   Tor bridge implementations SHOULD provide a command line option
   that exports a fully equipped Bridge line containing the bridge
   address and port, the link certificate fingerprint, and any other
   enabled Bridge options, so that bridge operators can easily send it
   to their users.

   In the case of expiring SSL certificates, Tor bridge
   implementations SHOULD warn the bridge operator a sensible amount
   of time before the expiration, so that she can warn her clients and
   potentially rotate the certificate herself.

4.3. Client side implementation

   Tor client implementations MUST extend their Bridge line format to
   support bridge SSL certificate fingerprints. The new format is:
     Bridge [<method>] <address[:port]> [["keyid="]<id-fingerprint>] \
       ["shared_secret="<shared_secret>] ["link_cert_fpr="<fingerprint>]

   where <fingerprint> is the bridge's SSL certificate fingerprint.

   Tor clients who use bridges and want to pin their SSL certificates
   must specify the bridge's SSL certificate fingerprint as in:
     Bridge 12.34.56.78 shared_secret=934caff420aa7852b855 \
         link_cert_fpr=GM4GEMBXGEZGKOJQMJSWINZSHFSGMOBRMYZGCMQ

4.4. Implementation prerequisites

   Tor bridges currently rotate their SSL certificates every 2
   hours. This not only acts as a fingerprint for the bridges, but it
   also acts as a blocker for this proposal.

   Tor trac ticket #4390 and proposal YYY were created to resolve this
   issue.

5. Other ideas

5.1. Certificate tagging using a shared secret

   Another idea worth considering is having the bridge use the shared
   secret from proposal 190 to embed a "secret message" in her
   certificate, which could only be understood by a client who knows
   that shared secret, essentially authenticating the bridge.

   Specifically, the bridge would "tag" the Serial Number (or any
   other covert field) of her certificate with the (potentially
   truncated) HMAC of her link public key, using the shared secret of
   proposal 190 as the key: HMAC(shared_secret, link_public_key).

   A client knowing the shared secret would be able to verify the
   'link_public_key' and authenticate the bridge, and since the Serial
   Number field is usually composed of random bytes a probing attacker
   would not notice the "tagging" of the certificate.
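
   A sketch of the tagging idea above, assuming HMAC-SHA256 (the
   proposal does not name a hash) truncated to 8 bytes to match the
   OpenSSL serial size discussed below.  How the tag would actually be
   encoded into the certificate's Serial Number is not shown, and the
   function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 8  # assumption: matches OpenSSL's 8-byte random serial


def serial_number_tag(shared_secret: bytes, link_public_key: bytes) -> bytes:
    """HMAC(shared_secret, link_public_key), truncated to TAG_LEN bytes."""
    return hmac.new(shared_secret, link_public_key, hashlib.sha256).digest()[:TAG_LEN]


def verify_serial_number_tag(serial: bytes, shared_secret: bytes,
                             link_public_key: bytes) -> bool:
    """Client side: does the certificate serial carry the expected tag?"""
    return hmac.compare_digest(serial, serial_number_tag(shared_secret, link_public_key))
```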

   Arguments for this scheme are that it:
   a) doesn't need extra bridge credentials apart from the shared secret
      of prop190.
   b) doesn't need any maintenance in case of certificate expiration.

   Arguments against this scheme are:
   a) In the case of self-signed certificates, OpenSSL creates an
      8-byte random serial number, and we would probably need
      something more than 8 bytes to tag. There are not many other
      covert fields in SSL certificates mutable by vanilla OpenSSL.
   b) It complicates the scheme, and if not implemented and researched
      wisely it might also make it fingerprintable.
   c) We most probably won't be able to tag CA-signed certificates.

6. Discussion

6.1. In section 4.1, why do you truncate the SHA256 output to 12 bytes?!

   Bridge credentials are frequently propagated by word of mouth or
   are physically written down, which renders the occult Base64
   encoding unsatisfactory. The 104-character Base32 encoding or the
   64-character hex representation of the SHA256 output would also be
   too much bloat.

   By truncating the SHA256 output to 12 bytes and encoding it with
   Base32, we get 39 characters of readable and easy to transcribe
   output, and sufficient security. Finally, dividing '39' by the
   golden ratio gives us about 24.10!

7. Acknowledgements

   Thanks to Robert Ransom for his great help and suggestions on
   devising this scheme and writing this proposal!

Filename: 192-store-bridge-information.txt
Title: Automatically retrieve and store information about bridges
Author: Sebastian Hahn
Created: 16-Nov-2011
Status: Obsolete
Target: 0.2.[45].x

Overview:
Currently, tor stores some information locally about the bridges it is
configured to use, but doesn't make great use of the stored data. This
data consists of the Tor configuration information about the bridge (IP
address, port, and optionally fingerprint), the bridge descriptor, which
gets stored along with the other descriptors a Tor client fetches, and
an "EntryGuard" line in the state file. That line includes the Tor
version we used to add the bridge and a slightly randomized timestamp
(up to a month before the real date). The descriptor data also includes
some more accurate timestamps about when the descriptor was fetched.

The information we give out about bridges via bridgedb currently only
includes the IP address and port, because giving out the fingerprint as
well might mean that Tor clients make direct connections to the bridge
authority, since we didn't design Tor's UpdateBridgesFromAuthority
behaviour correctly.

Motivation:

The only way to let Tor know about a change affecting the bridge (IP
address or port change) is to either ask the bridge authority directly,
or reconfigure Tor. The former requires making a non-anonymized direct
connection to the bridge authority Tonga and asking it for the current
descriptor of the bridge with a given fingerprint - this is unsafe and
also requires prior knowledge of the fingerprint. The latter requires
user intervention, first to learn that there was an update and second to
actually teach Tor about the change.

This is way too complicated for most users, and should be unnecessary
while the user has at least one bridge that remains working: Tonga can
give out bridge descriptors when asked for the descriptor for a certain
fingerprint, and Tor clients learn the fingerprint either from their
torrc file or from the first connection they make to a bridge.

For some users, however, this option is not what they want: They might
use private bridges or have special security concerns, which would make
them want to connect to the IP addresses specified in their
configuration only, and not tell Tonga about the set of bridges they
know about, even through a Tor circuit. Also see
https://blog.torproject.org/blog/different-ways-use-bridge for more
information about the different types of bridge users.

Design:

Tor should provide a new configuration option that allows bridge users
to indicate that they wish to contact Tonga anonymously and learn about
updates for the bridges that they know about, but can't currently reach.
Once those updates have been received, the clients would then hold on to
the new information in their state file, and use it across restarts for
connection attempts.

The option UpdateBridgesFromAuthority should be removed or recycled for
this purpose, as it is currently dangerous to set (it makes direct
connections to the bridge authority, thus leaking that a user is about
to use bridges). Recycling the option is probably the better choice,
because current users of the option get a surprising and never useful
behaviour. On the other hand, users who downgrade their Tors might get
the old behaviour by accident.

If configured with this option, tor would make an anonymized connection
to Tonga to ask for the descriptors of bridges that it cannot currently
connect to, once every few hours. Making more frequent requests would
likely not help, as bridge information doesn't typically change that
frequently, and may overload Tonga.
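A minimal Python sketch of the selection logic described above.
REFETCH_INTERVAL is an assumed value standing in for "once every few
hours", the `private` marker is the hypothetical per-bridge flag discussed
under Security implications, and all names are illustrative rather than
tor's actual code.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical value: the proposal only says "once every few hours".
REFETCH_INTERVAL = 3 * 60 * 60  # seconds


@dataclass
class BridgeEntry:
    fingerprint: Optional[str]   # from torrc or learned on first connection
    reachable: bool              # did our last connection attempt succeed?
    private: bool = False        # hypothetical "private bridge" marker
    last_authority_query: float = 0.0


def bridges_to_query(bridges: List[BridgeEntry], option_enabled: bool,
                     now: Optional[float] = None) -> List[BridgeEntry]:
    """Bridges whose descriptors we would request from Tonga over a circuit."""
    if not option_enabled:
        return []
    now = time.time() if now is None else now
    return [
        b for b in bridges
        if b.fingerprint is not None      # Tonga is queried by fingerprint
        and not b.reachable               # only bridges we can't reach now
        and not b.private                 # never leak private bridges
        and now - b.last_authority_query >= REFETCH_INTERVAL
    ]
```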

This information needs to be stored in the state file:

- An exact copy of the Bridge stanza in the torrc file, so that tor can
  detect when the bridge is unconfigured/the configuration is changed

- The IP address, port, and fingerprint we last used when making a
  successful connection to the bridge, if this differs from/supplements
  the configured data.

- The IP address, port, and fingerprint we learned from the bridge
  authority, if this differs from both the configured data and the data
  we used for the last successful connection.

We don't store more data in the state file to avoid leaking too much if
the state file falls into the hands of an adversary.
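The three items above can be pictured as one record per configured
bridge. The Python below is an illustrative sketch only: the proposal
does not define a state-file syntax, and `BridgeStateRecord`,
`make_record`, and `record_is_stale` are hypothetical names. It keeps the
learned endpoints only when they add something beyond the configured data,
in line with the goal of storing as little as possible.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (IP address, port, fingerprint); the fingerprint may be unknown.
Endpoint = Tuple[str, int, Optional[str]]


@dataclass
class BridgeStateRecord:
    torrc_line: str                            # exact copy of the Bridge line
    last_success: Optional[Endpoint] = None    # only if it differs from config
    from_authority: Optional[Endpoint] = None  # only if it differs from both


def make_record(torrc_line: str, configured: Endpoint,
                last_success: Optional[Endpoint],
                from_authority: Optional[Endpoint]) -> BridgeStateRecord:
    """Drop anything that the torrc line already says."""
    if last_success == configured:
        last_success = None
    if from_authority in (configured, last_success):
        from_authority = None
    return BridgeStateRecord(torrc_line, last_success, from_authority)


def record_is_stale(record: BridgeStateRecord, current_torrc_line: str) -> bool:
    """A changed or removed Bridge line invalidates the stored entry."""
    return record.torrc_line != current_torrc_line
```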

Security implications:

Storing sensitive data on disk is risky when the computer one uses gets
into the wrong hands, and state file entries can be used to identify
times the user was online. This is already a problem for the Bridge
lines in a user's configuration file, but by storing more information
about bridges some timings can be deduced.

Another risk is that this allows long-term tracking of users when the
set of bridges a user knows about is known to the attacker, and the set
is unique. This is not very hard to achieve for bridgedb, as users
typically make non-anonymized requests to it and bridgedb can
selectively pick bridges to report. By combining the data about
descriptor fetches on Tonga with this set of fingerprints, a usage
pattern can be established. Also, bridgedb could give out a made-up
fingerprint to a user that requested bridges, thus easily creating a
unique set.

Users of private bridges should not set this option, as it will leak the
fingerprints of their bridges to Tonga. This is not a huge concern, as
Tonga doesn't know about those descriptors, but private bridge users
will likely want to avoid leaking the existence of their bridge. We
might want to figure out a way to indicate that a bridge is private on
the Bridge line in the configuration, so fetching the descriptor from
Tonga is disabled for those automatically. This warrants more discussion
to find a solution that doesn't require bridge users to understand the
trade-offs of setting a configuration option.

One idea is to indicate that a bridge is private by a special flag in
its bridge descriptor, so clients can avoid leaking those to the bridge
authority automatically. Also, Bridge lines for private bridges
shouldn't include the fingerprint so that users don't accidentally leak
the fingerprint to the bridge authority before they have talked to the
bridge.

Specification:

No change/addition to the current specification is necessary, as the
data that gets stored at clients is not covered by the specification.
This document is supposed to serve as a basis for discussion and to
provide hints for implementors.

Compatibility:

Tonga is already set up to send out descriptors requested by clients, so
the bridge authority side doesn't need any changes. The new
configuration options governing the behaviour of Tor would be
incompatible with previous versions, so the torrc needs to be adapted.
The state file changes should not affect older versions.

Filename: 193-safe-cookie-authentication.txt
Title: Safe cookie authentication for Tor controllers
Author: Robert Ransom
Created: 2012-02-04
Status: Closed

Overview:

  Not long ago, all Tor controllers which automatically attempted
  'cookie authentication' were vulnerable to an information-disclosure
  attack.  (See https://bugs.torproject.org/4303 for slightly more
  information.)

  Now, some Tor controllers which automatically attempt cookie
  authentication are only vulnerable to an information-disclosure
  attack on any 32-byte files they can read.  But the Ed25519
  signature scheme (among other cryptosystems) has 32-byte secret
  keys, and we would like to not worry about Tor controllers leaking
  our secret keys to whatever can listen on what the controller thinks
  is Tor's control port.

  Additionally, we would like to not have to remodel Tor's innards and
  rewrite all of our Tor controllers to use TLS on Tor's control port
  this week (or deal with the many design issues which that would
  raise).

Design:

From af6bf472d59162428a1d7f1d77e6e77bda827414 Mon Sep 17 00:00:00 2001
From: Robert Ransom <rransom.8774@gmail.com>
Date: Sun, 5 Feb 2012 04:02:23 -0800
Subject: [PATCH] Add SAFECOOKIE control-port authentication method

---
 control-spec.txt |   59 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/control-spec.txt b/control-spec.txt
index 66088f7..3651c86 100644
--- a/control-spec.txt
+++ b/control-spec.txt
@@ -323,11 +323,12 @@
   For information on how the implementation securely stores authentication
   information on disk, see section 5.1.
 
-  Before the client has authenticated, no command other than PROTOCOLINFO,
-  AUTHENTICATE, or QUIT is valid.  If the controller sends any other command,
-  or sends a malformed command, or sends an unsuccessful AUTHENTICATE
-  command, or sends PROTOCOLINFO more than once, Tor sends an error reply and
-  closes the connection.
+  Before the client has authenticated, no command other than
+  PROTOCOLINFO, AUTHCHALLENGE, AUTHENTICATE, or QUIT is valid.  If the
+  controller sends any other command, or sends a malformed command, or
+  sends an unsuccessful AUTHENTICATE command, or sends PROTOCOLINFO or
+  AUTHCHALLENGE more than once, Tor sends an error reply and closes
+  the connection.
 
   To prevent some cross-protocol attacks, the AUTHENTICATE command is still
   required even if all authentication methods in Tor are disabled.  In this
@@ -949,6 +950,7 @@
       "NULL"           / ; No authentication is required
       "HASHEDPASSWORD" / ; A controller must supply the original password
       "COOKIE"         / ; A controller must supply the contents of a cookie
+      "SAFECOOKIE"       ; A controller must prove knowledge of a cookie
 
      AuthCookieFile = QuotedString
      TorVersion = QuotedString
@@ -970,9 +972,9 @@
   methods that Tor currently accepts.
 
   AuthCookieFile specifies the absolute path and filename of the
-  authentication cookie that Tor is expecting and is provided iff
-  the METHODS field contains the method "COOKIE".  Controllers MUST handle
-  escape sequences inside this string.
+  authentication cookie that Tor is expecting and is provided iff the
+  METHODS field contains the method "COOKIE" and/or "SAFECOOKIE".
+  Controllers MUST handle escape sequences inside this string.
 
   The VERSION line contains the Tor version.
 
@@ -1033,6 +1035,47 @@
 
   [TAKEOWNERSHIP was added in Tor 0.2.2.28-beta.]
 
+3.24. AUTHCHALLENGE
+
+  The syntax is:
+    "AUTHCHALLENGE" SP "AUTHMETHOD=SAFECOOKIE"
+                    SP "COOKIEFILE=" AuthCookieFile
+                    SP "CLIENTCHALLENGE=" 2*HEXDIG / QuotedString
+                    CRLF
+
+  The server will reject this command with error code 512, then close
+  the connection, if Tor is not using the file specified in the
+  AuthCookieFile argument as a controller authentication cookie file.
+
+  If the server accepts the command, the server reply format is:
+    "250-AUTHCHALLENGE"
+            SP "CLIENTRESPONSE=" 64*64HEXDIG
+            SP "SERVERCHALLENGE=" 2*HEXDIG
+            CRLF
+
+  The CLIENTCHALLENGE, CLIENTRESPONSE, and SERVERCHALLENGE values are
+  encoded/decoded in the same way as the argument passed to the
+  AUTHENTICATE command.
+
+  The CLIENTRESPONSE value is computed as:
+    HMAC-SHA256(HMAC-SHA256("Tor server-to-controller cookie authenticator",
+                            CookieString),
+                ClientChallengeString)
+  (with the HMAC key as its first argument)
+
+  After a controller sends a successful AUTHCHALLENGE command, the
+  next command sent on the connection must be an AUTHENTICATE command,
+  and the only authentication string which that AUTHENTICATE command
+  will accept is:
+    HMAC-SHA256(HMAC-SHA256("Tor controller-to-server cookie authenticator",
+                            CookieString),
+                ServerChallengeString)
+
+  [Unlike other commands besides AUTHENTICATE, AUTHCHALLENGE may be
+  used (but only once!) before AUTHENTICATE.]
+
+  [AUTHCHALLENGE was added in Tor FIXME.]
+
 4. Replies
 
   Reply codes follow the same 3-character format as used by SMTP, with the
-- 
1.7.8.3

Rationale:

  The weird inner HMAC was meant to ensure that whatever impersonates
  Tor's control port cannot even abuse a secret key meant to be used
  with HMAC-SHA256.

  Then I added the server-to-controller challenge-response
  authentication step, to ensure that the server can only use a
  controller as an HMAC oracle if it already knows the contents of the
  cookie file.  Now, the inner HMAC is just a not-very-efficient way
  to keep controllers from using the server as an oracle for its own
  challenges (it could be replaced with a hash function).
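  The two authenticators in the patch can be sketched with Python's
  standard `hmac` module. This follows the formulas as drafted above
  (inner HMAC keyed by the constant string, outer HMAC keyed by the inner
  result); the SAFECOOKIE method that eventually shipped in
  control-spec.txt may differ in detail, so treat this as an illustration
  of the draft, with hypothetical function names.

```python
import hashlib
import hmac

SERVER_TO_CONTROLLER = b"Tor server-to-controller cookie authenticator"
CONTROLLER_TO_SERVER = b"Tor controller-to-server cookie authenticator"


def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def client_response(cookie: bytes, client_challenge: bytes) -> bytes:
    """CLIENTRESPONSE that Tor sends back (32 bytes, 64 hex digits)."""
    return hmac_sha256(hmac_sha256(SERVER_TO_CONTROLLER, cookie),
                       client_challenge)


def authenticate_string(cookie: bytes, server_challenge: bytes) -> bytes:
    """The only value the following AUTHENTICATE command will accept."""
    return hmac_sha256(hmac_sha256(CONTROLLER_TO_SERVER, cookie),
                       server_challenge)


def controller_verifies_server(cookie: bytes, client_challenge: bytes,
                               clientresponse_hex: str) -> bool:
    """Controller checks Tor knows the cookie before answering its challenge."""
    expected = client_response(cookie, client_challenge)
    return hmac.compare_digest(expected, bytes.fromhex(clientresponse_hex))
```

  A controller would send CLIENTCHALLENGE, verify CLIENTRESPONSE with
  `controller_verifies_server`, and only then send the hex encoding of
  `authenticate_string(cookie, server_challenge)` in AUTHENTICATE.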

Filename: 194-mnemonic-urls.txt
Title: Mnemonic .onion URLs
Author: Sai, Alex Fink
Created: 29-Feb-2012
Status: Superseded

1. Overview

  Currently, canonical Tor .onion URLs consist of a naked 80-bit hash[1]. This
  is not something that users can even recognize for validity, let alone produce
  directly. It is vulnerable to partial-match fuzzing attacks[2], where a
  would-be MITM attacker generates a very similar hash and uses various social
  engineering, wiki poisoning, or other methods to trick the user into visiting
  the spoof site.

  This proposal gives an alternative method for displaying and entering .onion
  and other URLs, such that they will be easily remembered and generated by end
  users, and easily published by hidden service websites, without any dependency
  on a full domain name type system such as namecoin[3]. This makes it easier
  to implement (requiring only a change in the proxy).

  This proposal could equally be used for IPv4, IPv6, etc, if normal DNS is for
  some reason untrusted.

  This is not a petname system[4], in that it does not allow service providers
  or users[5] to associate a name of their choosing to an address[6]. Rather, it
  is a mnemonic system that encodes the 80-bit .onion address into a
  meaningful[7] and memorable sentence. A full petname system (based on
  registration of some kind, and allowing for shorter, service-chosen URLs) can
  be implemented in parallel[8].

  This system has the three properties of being secure, distributed, and
  human-meaningful — it just doesn't also have choice of name (except of course
  by brute force creation of multiple keys to see if one has an encoding the
  operator likes).

  This is inspired by Jonathan Ackerman's "Four Little Words" proposal[9] for
  doing the same thing with IPv4 addresses. We just need to handle 80+ bits, not
  just 32 bits.

  It is similar to Markus Jakobsson & Ruj Akavipat's FastWord system[10], except
  that it does not permit user choice of passphrase, does not know what URL a
  user will enter (vs verifying against a single stored password), and again has
  to encode significantly more data.

  This is also similar to RFC1751[11], RFC2289[12], and multiple other
  fingerprint encoding systems[13] (e.g. PGPfone[14] using the PGP
  wordlist[15], and Arturo Filatsò's OnionURL[16]), but we aim to make something
  that's as easy as possible for users to remember — and significantly easier
  than just a list of words or pseudowords, which we consider only useful as an
  active confirmation tool, not as something that can be fully memorized and
  recalled, like a normal domain name.

2. Requirements

2.1. encodes at least 80 bits of random data (preferably more, e.g. for a
checksum)

2.2. valid, visualizable English sentence — not just a series of words[17]

2.3. words are common enough that non-native speakers and bad spellers will have
minimum difficulty remembering and producing (perhaps with some spellcheck help)

2.4. not syntactically confusable (e.g. order should not matter)

2.5. short enough to be easily memorized and fully recalled at will, not just
recognized

2.6. no dependency on an external service

2.7. dictionary size small enough to be reasonable for end users to download as
part of the onion package

2.8. consistent across users (so that websites can e.g. reinforce their random
hash's phrase with a clever drawing)

2.9. does not create offensive sentences that service providers will reject

2.10. resistant against semantic fuzzing (e.g. by having uniqueness against
WordNet synsets[18])

3. Possible implementations

  This section is intentionally left unfinished; full listing of template
  sentences and the details of their parser and generating implementation is
  co-dependent on the creation of word class dictionaries fulfilling the above
  criteria. Since that's fairly labor-intensive, we're pausing at this stage for
  input first, to avoid wasting work.

3.1. Have a fixed number of template sentences, such as:

  1. Adj subj adv vtrans adj obj
  2. Subj and subj vtrans adj obj
  3. … etc

  For a 6-word sentence with 8 templates (3 bits), we need ~12-bit (4k-word)
  dictionaries for each word category.
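  As a rough check of this arithmetic (a sketch, not part of the
  proposal): choosing one of T templates and one word from each slot
  dictionary D_i encodes log2(T) + sum(log2 |D_i|) bits. With exactly
  4096-word dictionaries this comes to 75 bits, slightly under the 80-bit
  minimum of requirement 2.1, so the dictionaries would need to be nearer
  13 bits (about 8k words) each, or the extra bits found elsewhere, e.g.
  more templates.

```python
import math


def template_capacity_bits(num_templates: int, dict_sizes: list) -> float:
    """Bits encoded by choosing one template plus one word per slot."""
    return math.log2(num_templates) + sum(math.log2(n) for n in dict_sizes)


# 8 templates (3 bits) and six 4096-word (12-bit) slots:
print(template_capacity_bits(8, [4096] * 6))  # 75.0
# Six 8192-word (13-bit) slots clear the 80-bit floor of requirement 2.1:
print(template_capacity_bits(8, [8192] * 6))  # 81.0
```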

  If multiple words of the same category are used, they must either play
  different grammatical roles (eg subj vs obj, or adj on a different item), be
  chosen from different dictionaries, or there needs to be an order-agnostic way
  to join them at the bit level. Preferably this should be avoided, just to
  prevent users forgetting the order.
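  One way to realise 3.1 is to read the value as a mixed-radix number:
  the template index is one digit and each slot's word index is another.
  The Python sketch below assumes toy 4-word dictionaries and two
  hypothetical 4-slot templates so that it stays small; it illustrates the
  arithmetic only and makes no attempt at requirements 2.3 to 2.10. Note
  that this toy decoder relies on word order to identify the template,
  which is exactly the order-dependence the paragraph above warns about.

```python
from typing import Dict, List

# Toy, hypothetical dictionaries of 4 words (2 bits) each. Real ones would
# hold roughly 2^12 common words, and no word may appear in two classes so
# that the template can be recognised again when decoding.
DICTS: Dict[str, List[str]] = {
    "adj":    ["red", "happy", "tiny", "loud"],
    "subj":   ["camel", "robot", "baker", "whale"],
    "adv":    ["slowly", "boldly", "sadly", "gently"],
    "vtrans": ["paints", "juggles", "hugs", "buys"],
    "obj":    ["apples", "kites", "drums", "hats"],
}
# Hypothetical templates; every slot has the same radix, so all templates
# encode the same number of values.
TEMPLATES: List[List[str]] = [
    ["adj", "subj", "vtrans", "obj"],
    ["subj", "adv", "vtrans", "obj"],
]
RADIX = 4
SPACE = len(TEMPLATES) * RADIX ** 4  # 512 sentences, i.e. 9 bits


def encode(value: int) -> List[str]:
    """Template index is the lowest digit, then one base-RADIX digit per slot."""
    if not 0 <= value < SPACE:
        raise ValueError("value out of range")
    value, t = divmod(value, len(TEMPLATES))
    words = []
    for cls in TEMPLATES[t]:
        value, digit = divmod(value, RADIX)
        words.append(DICTS[cls][digit])
    return words


def decode(words: List[str]) -> int:
    """Inverse of encode(); raises if a word or class sequence is unknown."""
    classes = [next(c for c, ws in DICTS.items() if w in ws) for w in words]
    t = TEMPLATES.index(classes)
    value = 0
    for cls, w in reversed(list(zip(classes, words))):
        value = value * RADIX + DICTS[cls].index(w)
    return value * len(TEMPLATES) + t
```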

3.2. As 3.1, but treat sentence generation as decoding a prefix code, and have
  a Huffman code for each word class.

  We suppose it’s okay if the generated sentence has a few more words than it
  might, as long as they’re common lean words.  E.g., for adjectives, "good"
  might cost only six bits while "unfortunate" costs twelve.

  Choice between different sentence syntaxes could be worked into the prefix
  code as well, and potentially done separately for each syntactic constituent.
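  A toy Python sketch of the 3.2 idea: each word class gets its own
  complete prefix code (so any bit string decodes), short common words get
  short codewords, and the bits are decoded slot by slot through a fixed
  template. The codeword tables and template are hypothetical. When the
  bits run out mid-sentence, the sketch pads with zero bits, which selects
  the cheapest words, in line with the tolerance for a few extra words
  above.

```python
from typing import Dict, List

# Toy, hypothetical complete prefix codes: "good" costs 1 bit while
# "unfortunate" costs 3, echoing the example in the text.
CODES: Dict[str, Dict[str, str]] = {
    "adj":  {"0": "good", "10": "red", "110": "tiny", "111": "unfortunate"},
    "noun": {"0": "cat", "10": "dog", "11": "whale"},
    "verb": {"0": "sees", "1": "hugs"},
}
TEMPLATE = ["adj", "noun", "verb", "adj", "noun"]


def bits_to_words(bits: str) -> List[str]:
    """Decode bits into words, cycling the template until a sentence ends."""
    words: List[str] = []
    pos = slot = 0
    while pos < len(bits) or slot % len(TEMPLATE):
        table = CODES[TEMPLATE[slot % len(TEMPLATE)]]
        code = ""
        while code not in table:
            code += bits[pos] if pos < len(bits) else "0"  # zero padding
            pos += 1
        words.append(table[code])
        slot += 1
    return words


def words_to_bits(words: List[str], nbits: int) -> str:
    """Re-encode the words and drop the padding to recover the original bits."""
    inverse = {cls: {w: c for c, w in t.items()} for cls, t in CODES.items()}
    bits = "".join(inverse[TEMPLATE[i % len(TEMPLATE)]][w]
                   for i, w in enumerate(words))
    return bits[:nbits]
```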

4. Usage

  To form a mnemonic .onion URL, just join the words with dashes or underscores,
  stripping minimal words like 'a', 'the', 'and' etc., and append '.onion'. This
  can be readily distinguished from standard hash-style .onion URLs by form.
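  A small Python sketch of this rule and of telling the two forms apart.
  The stop-word set is an assumption (the text lists only 'a', 'the', 'and'
  "etc."), the function names are hypothetical, and the hash-style check
  assumes the 16-character base32 form that an 80-bit hash takes.

```python
import re

# Hypothetical stop-word list; the text names only 'a', 'the', 'and' "etc.".
MINIMAL_WORDS = {"a", "an", "the", "and"}
# An 80-bit hash in base32 is 16 characters from [a-z2-7].
HASH_ONION = re.compile(r"[a-z2-7]{16}\.onion")


def mnemonic_onion(sentence: str, sep: str = "-") -> str:
    words = [w for w in sentence.lower().split() if w not in MINIMAL_WORDS]
    return sep.join(words) + ".onion"


def is_hash_style(hostname: str) -> bool:
    return HASH_ONION.fullmatch(hostname.lower()) is not None
```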

  Translation should take place at the client — though hidden service servers
  should also be able to output the mnemonic form of hashes too, to assist
  website operators in publishing them (e.g. by posting an amusing drawing of
  the described situation on their website to reinforce the mnemonic).

  After the translation stage of name resolution, everything proceeds as normal
  for an 80-bit hash onion URL.

  The user should be notified of the mnemonic form of hash URL in some way, and
  have an easy way in the client UI to translate mnemonics to hashes and vice
  versa. For the purposes of browser URLs and the like though, the mnemonic
  should be treated on par with the hash; if the user enters a mnemonic URL they
  should not become redirected to the hash version. (If anything, the opposite
  may be true, so that users become used to seeing and verifying the mnemonic
  version of hash URLs, and gain the security benefits against partial-match
  fuzzing.)

  Ideally, inputs that don't validly resolve should have a response page served
  by the proxy that uses a simple spell-check system to suggest alternate domain
  names that are valid hash encodings. This could hypothetically be done inline
  in URL input, but that would require changes in the browser (normally domain
  names aren't subject to spellcheck); serving the page from the proxy avoids
  that implementation problem.

5. International support

  It is not possible for this scheme to support non-English languages without
  a) (usually) Unicode in domains (which is not yet well supported by browsers),
  and
  b) fully customized dictionaries and phrase patterns per language

  The scheme must not be used in an attempted 'translation' by simply replacing
  English words with glosses in the target language. Several of the necessary
  features would be completely mangled by this (e.g. other languages have
  different synonym, homonym, etc. groupings, not to mention completely different
  grammar).

  It is unlikely a priori that URLs constructed using a non-English
  dictionary/pattern setup would in any sense 'translate' semantically to
  English; more likely is that each language would have completely unrelated
  encodings for a given hash.

  We intend to only make an English version at first, to avoid these issues
  during testing.

________________

[1] https://trac.torproject.org/projects/tor/wiki/doc/HiddenServiceNames
https://gitweb.torproject.org/torspec.git/blob/HEAD:/address-spec.txt
[2] http://www.thc.org/papers/ffp.html
[3] http://dot-bit.org/Namecoin
[4] https://en.wikipedia.org/wiki/Zooko's_triangle
[5] https://addons.mozilla.org/en-US/firefox/addon/petname-tool/
[6] However, service operators can generate a large number of hidden service
descriptors and check whether their hashes result in a desirable phrasal
encoding (much like certain hidden services currently use brute force generated
hashes to ensure their name is the prefix of their raw hash). This won't get you
whatever phrase you want, but will at least improve the likelihood that it's
something amusing and acceptable.
[7] "Meaningful" here inasmuch as e.g. "Barnaby thoughtfully mangles simplistic
yellow camels" is an absurdist but meaningful sentence. Absurdness is a feature,
not a bug; it decreases the probability of mistakes if the scenario described is
not one that the user would try to fit into a template of things they have
previously encountered IRL. See research into linguistic schema for further
details.
[8] https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-onion-nyms.txt
[9] http://blog.rabidgremlin.com/2010/11/28/4-little-words/
[10] http://fastword.me/
[11] https://tools.ietf.org/html/rfc1751
[12] http://tools.ietf.org/html/rfc2289
[13] https://github.com/singpolyma/mnemonicode
http://mysteryrobot.com
https://github.com/zacharyvoase/humanhash
[14] http://www.mathcs.duq.edu/~juola/papers.d/icslp96.pdf
[15] http://en.wikipedia.org/wiki/PGP_word_list
[16] https://github.com/hellais/Onion-url
https://github.com/hellais/Onion-url/blob/master/dev/mnemonic.py
[17] http://www.reddit.com/r/technology/comments/ecllk
[18] http://wordnet.princeton.edu/wordnet/man2.1/wnstats.7WN.html

Filename: 195-TLS-normalization-for-024.txt
Title: TLS certificate normalization for Tor 0.2.4.x
Author: Jacob Appelbaum, Gladys Shufflebottom, Nick Mathewson, Tim Wilde
Created: 6-Mar-2012
Status: Dead
Target: 0.2.4.x


0. Introduction

   The TLS (Transport Layer Security) protocol was designed for security
   and extensibility, not for uniformity.  Because of this, it's not
   hard for an attacker to tell one application's use of TLS from
   another's.

   We propose improvements to Tor's current TLS certificates to
   reduce the distinguishability of Tor traffic.

0.1. History

   This draft is based on parts of Proposal 179, by Jacob Appelbaum
   and Gladys Shufflebottom, but removes some already implemented parts
   and replaces others.

0.2. Non-Goals

   We do not address making TLS harder to distinguish after the
   handshake is done.  We also do not discuss TLS improvements not
   related to distinguishability (such as increased key size, algorithm
   choice, and so on).

1. Certificate Issues

   Currently, Tor generates certificates according to a fixed pattern,
   where lifetime is fairly small, the certificate Subject DN is a
   single randomly generated CN, and the certificate Issuer DN is a
   different single randomly generated CN.

   We propose several ways to improve this below.

1.1. Separate initial certificate from link certificate

   When Tor is using the v2 or v3 link handshake (see tor-spec.txt), it
   currently presents, during the initial TLS handshake, a certificate
   authenticating the link key with the identity key.

   We propose instead that Tor should be able to present an arbitrary
   initial certificate (so long as its key matches the link key used in
   the actual TLS handshake), and then present the real certificate
   authenticating the link key during the Tor handshake.  (That is,
   during the v2 handshake's renegotiation step, or in the v3
   handshake's CERTS cell.)

   The TLS protocol and the Tor handshake protocol both allow this, and
   doing so will give us more freedom for the alternative certificate
   presentation ideas below.

1.2. Allow externally generated certificates

   It should be possible for a Tor relay operator to generate and
   provide their own certificate and secret key.  This will allow a relay or
   bridge operator to use a certificate signed by any member of the "SSL
   mafia,"[*] to generate their own self-signed certificate, and so on.

   For compatibility, we need to require that the key be an RSA secret
   key, of at least 1024 bits, generated with e=65537.

   As a proposed interface, let's require that the certificate be stored
   in ${DataDir}/tls_cert/tls_certificate.crt , that the secret key be
   stored in ${DataDir}/tls_cert/private_tls_key.key , and that they be
   used instead of generating our own certificate whenever the new
   boolean option "ProvidedTLSCert" is set to true.

   (Alternative interface: Allow the certificate and key to be stored
   anywhere, and have the user provide their respective locations with
   TLSCertificateFile and TLSCertificateKeyFile options.)

1.3. Longer certificate lifetimes

   Tor's current certificates aren't long-lived, which makes them
   different from most other certificates in the wild.

   Typically, certificates are valid for a year, so let's use that as
   our default lifetime.  [TODO: investigate whether "a year" for most
   CAs and self-signed certs have their validity dates running for a
   calendar year ending at the second of issue, one calendar year
   ending at midnight, or 86400*(365.5 +/- .5) seconds, or what.]

   There are two ways to approach this.  We could continue our current
   certificate management approach where we frequently generate new
   certificates (albeit with longer lifetimes), or we could make a cert,
   store it to disk, and use it for all or most of its declared
   lifetime.

   If we continue to use fairly short lifetimes for the _true_ link
   certificates (the ones presented during the Tor handshake), then
   presenting long-lived certificates doesn't hurt us much: in the event
   of a link-key-only compromise, the adversary still couldn't actually
   impersonate a server for long.[**]

   Using shorter-lived certificates with long nominal lifetimes doesn't
   seem to buy us much.  It would let us rotate link keys more
   frequently, but we're already getting forward secrecy from our use of
   Diffie-Hellman key agreement.  Further, it would make our behavior
   look less like regular TLS behavior, where certificates are typically
   used for most of their nominal lifetime.  Therefore, let's store and
   use certs and link keys for the full year.

1.4. Self-signed certificates with better DNs

   When we generate our own certificates, we currently set no DN fields
   other than the commonName.  This behavior isn't terribly common:
   users of self-signed certs usually/often set other fields too.
   [TODO: find out frequency.]

   Unfortunately, it appears that no particular other set of fields or
   way of filling them out _is_ universal for self-signed certificates,
   or even particularly common.  The most common schemas seem to be for
   things most censors wouldn't mind blocking, like embedded devices.
   Even the default openssl schema, though common, doesn't appear to
   represent a terribly large fraction of self-signed websites.  [TODO:
   get numbers here.]

   So the best we can do here is probably to reproduce the process that
   results in self-signed certificates originally: let bridge and relay
   operators pick the DN fields themselves.  This is an annoying
   interface issue, and wants a better solution.

1.5. Better commonName values

   Our current certificates set the commonName to a randomly generated
   field like www.rmf4h4h.net.  This is also a weird behavior: nearly
   all TLS certs used for web purposes will have a hostname that
   resolves to their IP.

   The simplest way to get a plausible commonName here would be to do a
   reverse lookup on our IP and try to find a good hostname.  It's not
   clear whether this would actually work out in practice, or whether
   we'd just get dynamic-IP-pool hostnames everywhere blocked when they
   appear in certificates.

   Alternatively, if we are told a hostname in our Torrc (possibly in
   the Address field), we could try to use that.

2. TLS handshake issues

2.1. Session ID.

   Currently we do not send an SSL session ID, as we do not support session
   resumption.  However, Apache (and likely other major SSL servers) do have
   this support, and do send a 32-byte SSLv3/TLSv1 session ID in their Server
   Hello cleartext.  We should do the same to avoid an easy fingerprinting
   opportunity.  It may be necessary to lie to OpenSSL to claim that we are
   tracking session IDs to cause it to generate them for us.

   (We should not actually support session resumption.)




[*] "Hey buddy, it's a nice website you've got there.  Sure would be a
    shame if somebody started poppin' up warnings on all your users'
    browsers, tellin' everybody that you're _insecure_..."

[**] Furthermore, a link-key-only compromise isn't very realistic atm;
     nearly any attack that would let an adversary learn a link key would
     probably let the adversary learn the identity key too.  The most
     plausible way would probably be an implementation bug in OpenSSL or
     something.

Filename: 196-transport-control-ports.txt
Title: Extended ORPort and TransportControlPort
Author: George Kadianakis, Nick Mathewson
Created: 14 Mar 2012
Status: Closed
Implemented-In: 0.2.5.2-alpha

1. Overview

  Proposal 180 defined Tor pluggable transports, a way to decouple
  protocol-level obfuscation from the core Tor protocol in order to
  better resist client-bridge censorship. This is achieved by
  introducing pluggable transport proxies, programs that obfuscate Tor
  traffic to resist DPI detection.

  Proposal 180 defined a way for pluggable transport proxies to
  communicate with local Tor clients and bridges, so as to exchange
  traffic. This document extends this communication protocol, so that
  pluggable transport proxies can exchange arbitrary operational
  information and metadata with Tor clients and bridges.

2. Motivation

  The communication protocol specified in Proposal 180 gives a way
  for transport proxies to announce the IP address of their clients
  to tor. Still, modern pluggable transports might have more (?)
  needs than this. For example:

  1. Tor might want to inform pluggable transport proxies on how to
     rate-limit incoming or outgoing connections.

  2. Server pluggable transport proxies might want to pass client
     information to an anti-active-probing system controlled by tor.

  3. Tor might want to temporarily stop a transport proxy from
     obfuscating traffic.

  To satisfy the above use cases, there must be real-time
  communication between the tor process and the pluggable transport
  proxy. To achieve this, this proposal refactors the Extended ORPort
  protocol specified in Proposal 180, and introduces a new port,
  TransportControlPort, whose sole role is the exchange of control
  information between transport proxies and tor.

  Specifically, transport proxies deliver each connection to the
  "Extended ORPort", where they provide metadata and agree on an
  identifier for each tunneled connection.  Once this handshake
  occurs, the OR protocol proceeds unchanged.

  Additionally, each transport maintains a single connection to Tor's
  "TransportControlPort", where it receives instructions from Tor
  about rate-limiting on individual connections.

3. The new port protocols

3.1. The new extended ORPort protocol

3.1.1. Protocol

  The extended server port protocol is as follows:

     COMMAND [2 bytes, big-endian]
     BODYLEN [2 bytes, big-endian]
     BODY [BODYLEN bytes]

     Commands sent from the transport proxy to the bridge are:

     [0x0000] DONE: There is no more information to give. The next
       bytes sent by the transport will be those tunneled over it.
       (body ignored)

     [0x0001] USERADDR: an address:port string that represents the
       client's address.

     [0x0002] TRANSPORT: a string of the name of the pluggable
       transport currently in effect on the connection.

     Replies sent from tor to the proxy are:

     [0x1000] OKAY: Send the user's traffic. (body ignored)

     [0x1001] DENY: Tor would prefer not to get more traffic from
       this address for a while. (body ignored)

     [0x1002] CONTROL: a NUL-terminated "identifier" string. The
       pluggable transport proxy must use the "identifier" to access
       the TransportControlPort. See the 'Association and identifier
       creation' section below.

  Parties MUST ignore command codes that they do not understand.

  If the server receives a recognized command that does not parse, it
  MUST close the connection to the client.
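  As an illustration only, the framing above can be sketched in Python.
  The constant and function names are hypothetical. The decoder stops
  at DONE, because the bytes that follow it are tunneled traffic rather
  than Extended ORPort messages.

```python
import struct

# Command codes from section 3.1.1 (names are illustrative).
EXT_OR_DONE = 0x0000
EXT_OR_USERADDR = 0x0001
EXT_OR_TRANSPORT = 0x0002
EXT_OR_OKAY = 0x1000
EXT_OR_DENY = 0x1001
EXT_OR_CONTROL = 0x1002

# COMMAND and BODYLEN are both 2-byte big-endian unsigned integers.
HEADER = struct.Struct(">HH")


def encode_ext_or_msg(command: int, body: bytes = b"") -> bytes:
    """Frame one message as COMMAND | BODYLEN | BODY."""
    if len(body) > 0xFFFF:
        raise ValueError("BODY is too long for a 2-byte BODYLEN")
    return HEADER.pack(command, len(body)) + body


def decode_ext_or_msgs(buf: bytes):
    """Split buf into (command, body) pairs.

    Returns the parsed messages plus the unconsumed bytes: either an
    incomplete trailing message, or the tunneled data that follows DONE.
    """
    msgs = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        command, bodylen = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + bodylen
        if end > len(buf):
            break  # incomplete message: wait for more data
        msgs.append((command, buf[offset + HEADER.size:end]))
        offset = end
        if command == EXT_OR_DONE:
            break  # everything after DONE is tunneled traffic
    return msgs, buf[offset:]
```

  A server-side proxy would typically send USERADDR and TRANSPORT
  followed by DONE. Unknown command codes are returned like any others,
  so the caller can ignore them as the text requires.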

3.1.2. Command descriptions

3.1.2.1. USERADDR

  An ASCII string holding the TCP/IP address of the client of the
  pluggable transport proxy. A Tor bridge SHOULD use that address to
  collect statistics about its clients.  Recognized formats are:
    1.2.3.4:5678
    [1:2::3:4]:5678

  (Current Tor versions may accept other formats, but this is a bug:
  transports MUST NOT send them.)

  The string MUST NOT be NUL-terminated.
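  A minimal sketch of a USERADDR parser that accepts only the two
  formats listed above. It assumes Python's standard `ipaddress` module
  is an acceptable stand-in for Tor's own address parsing; rejecting
  port 0 is likewise an assumption of this sketch.

```python
import ipaddress


def parse_useraddr(body: bytes):
    """Parse a USERADDR body: "1.2.3.4:5678" or "[1:2::3:4]:5678".

    Returns (address, port). Raises ValueError on anything else; per
    section 3.1.1 the bridge MUST then close the connection.
    """
    if body.endswith(b"\x00"):
        raise ValueError("USERADDR must not be NUL-terminated")
    text = body.decode("ascii")  # UnicodeDecodeError is a ValueError
    if text.startswith("["):
        host, sep, port = text[1:].partition("]:")
        parse_host = ipaddress.IPv6Address
    else:
        host, sep, port = text.rpartition(":")
        parse_host = ipaddress.IPv4Address
    # Port 0 is rejected: an assumption, since the text does not say.
    if not sep or not port.isdigit() or not 0 < int(port) <= 65535:
        raise ValueError(f"malformed USERADDR: {text!r}")
    # IPv4Address/IPv6Address raise a ValueError subclass on bad hosts.
    return parse_host(host), int(port)
```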

3.1.2.2. TRANSPORT

  An ASCII string holding the name of the pluggable transport used by
  the client of the pluggable transport proxy. A Tor bridge that
  supports multiple transports SHOULD use that information to collect
  statistics about the popularity of individual pluggable transports.

  The string MUST NOT be NUL-terminated.

  Pluggable transport names are C-identifiers and Tor MUST check them
  for correctness.
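  Reading "C-identifier" as the usual ASCII rule (a letter or
  underscore, followed by letters, digits, or underscores), the check
  can be sketched as follows. It also rejects the NUL terminator that
  the previous paragraph forbids.

```python
import re

# A letter or underscore, then any number of letters, digits, underscores.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_transport_name(body: bytes) -> bool:
    """Check a TRANSPORT body against the C-identifier rule."""
    try:
        name = body.decode("ascii")
    except UnicodeDecodeError:
        return False
    return C_IDENTIFIER.fullmatch(name) is not None
```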

3.2. The new TransportControlPort protocol

  The TransportControlPort protocol is as follows:

     CONNECTIONID [16 bytes, big-endian]
     COMMAND [2 bytes, big-endian]
     BODYLEN [2 bytes, big-endian]
     BODY [BODYLEN bytes]

     Commands sent from the transport proxy to the bridge:

     [0x0001] RATE_LIMITED: Message confirming that the rate limiting
       request of the bridge was carried out successfully (body
       ignored). See the 'Rate Limiting' section below.

     [0x0002] NOT_RATE_LIMITED: Message notifying that the transport
       proxy failed to carry out the rate limiting request of the
       bridge (body ignored). See the 'Rate Limiting' section below.

     Configuration commands sent from the bridge to the transport
     proxy are:

     [0x1001] NOT_ALLOWED: Message notifying that the CONNECTIONID
       could not be matched with an authorized connection ID. The
       bridge SHOULD shut down the connection.

     [0x1001] RATE_LIMIT: Carries information on how the pluggable
       transport proxy should rate-limit its traffic. See the 'Rate
       Limiting' section below.

  CONNECTIONID should carry the connection identifier described in the
  'Association and identifier creation' section.

  Parties should ignore command codes that they do not understand.
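  The TransportControlPort framing differs from the Extended ORPort
  framing only by the leading 16-byte CONNECTIONID. In the sketch below
  the names are illustrative; because CONNECTIONID is an opaque random
  value, "big-endian" has no practical effect on it, so the sketch
  copies it as raw bytes.

```python
import struct

# CONNECTIONID (16 raw bytes) | COMMAND (2 bytes) | BODYLEN (2 bytes)
CONTROL_HEADER = struct.Struct(">16sHH")

RATE_LIMITED = 0x0001
NOT_RATE_LIMITED = 0x0002


def encode_control_msg(connection_id: bytes, command: int,
                       body: bytes = b"") -> bytes:
    # struct's "16s" would silently pad or truncate, so check explicitly.
    if len(connection_id) != 16:
        raise ValueError("CONNECTIONID must be exactly 16 bytes")
    if len(body) > 0xFFFF:
        raise ValueError("BODY is too long for a 2-byte BODYLEN")
    return CONTROL_HEADER.pack(connection_id, command, len(body)) + body


def decode_control_msg(buf: bytes):
    """Parse one message from buf.

    Returns (connection_id, command, body, rest), or None if buf does
    not yet hold a complete message.
    """
    if len(buf) < CONTROL_HEADER.size:
        return None
    connection_id, command, bodylen = CONTROL_HEADER.unpack_from(buf)
    end = CONTROL_HEADER.size + bodylen
    if len(buf) < end:
        return None
    return connection_id, command, buf[CONTROL_HEADER.size:end], buf[end:]
```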

3.3. Association and identifier creation

  For Tor and a transport proxy to communicate using the
  TransportControlPort, an identifier must have already been negotiated
  using the 'CONTROL' command of Extended ORPort.

  The TransportControlPort identifier should not be predictable by a
  user who hasn't received a 'CONTROL' command from the Extended
  ORPort. For this reason, the TransportControlPort identifier should
  not be cryptographically weak or deterministically created.

  Tor MUST create its identifiers by generating 16 bytes of random
  data.
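  In Python, `secrets.token_bytes` draws from the operating system's
  CSPRNG and satisfies the 16-random-byte requirement. Comparing a
  received CONNECTIONID with `hmac.compare_digest` is an extra precaution
  assumed here, not something this proposal mandates: it keeps the
  comparison time from revealing how many leading bytes matched.

```python
import hmac
import secrets

IDENTIFIER_LEN = 16  # bytes, per section 3.3


def new_control_identifier() -> bytes:
    """Return a fresh identifier from the OS CSPRNG."""
    return secrets.token_bytes(IDENTIFIER_LEN)


def identifier_matches(received: bytes, issued: bytes) -> bool:
    """Constant-time comparison of a received CONNECTIONID."""
    return hmac.compare_digest(received, issued)
```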

4. Configuration commands

4.1. Rate Limiting

  A Tor relay should be able to inform a transport proxy in real-time
  about its rate-limiting needs.

  This can be achieved by using the TransportControlPort and sending a
  'RATE_LIMIT' command to the transport proxy.

  The body of the 'RATE_LIMIT' command should contain two integers,
  4 bytes each, in big-endian format. The two numbers should represent
  the bandwidth rate and bandwidth burst respectively in 'bytes per
  second' which the transport proxy must set as its overall
  rate-limiting setting.
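  The RATE_LIMIT body packs into 8 bytes. Treating both integers as
  unsigned, and enforcing them with a token bucket (rate as the refill
  speed, burst as the bucket size), are assumptions of this sketch; the
  proposal does not say which limiting algorithm a proxy must use.

```python
import struct

# Two 4-byte big-endian integers: rate, then burst (bytes per second).
RATE_LIMIT_BODY = struct.Struct(">II")


def encode_rate_limit(rate: int, burst: int) -> bytes:
    return RATE_LIMIT_BODY.pack(rate, burst)


def decode_rate_limit(body: bytes):
    if len(body) != RATE_LIMIT_BODY.size:
        raise ValueError("RATE_LIMIT body must be exactly 8 bytes")
    return RATE_LIMIT_BODY.unpack(body)


class TokenBucket:
    """Hypothetical overall limiter a proxy could apply after RATE_LIMIT."""

    def __init__(self, rate: int, burst: int):
        self.rate = rate            # tokens (bytes) added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full
        self.last = None            # time of the previous call

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if nbytes may be sent at time now."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```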

  When the transport proxy sets the appropriate rate limiting, it
  should send back a 'RATE_LIMITED' command. If it fails while setting
  up rate limiting, it should send back a 'NOT_RATE_LIMITED' command.

  After sending a 'RATE_LIMIT' command, the tor bridge MAY want to
  stop pushing data to the transport proxy until it receives a
  'RATE_LIMITED' command. If, instead, it receives a 'NOT_RATE_LIMITED'
  command, it MAY want to shut down its connections to the transport
  proxy.

5. Authentication

  To defend against cross-protocol attacks on the Extended ORPort,
  proposal 213 defines an authentication scheme that should be used to
  protect it.

  If the Extended ORPort is enabled, Tor should regenerate the cookie
  file of proposal 213 on startup and store it in
  $DataDirectory/extended_orport_auth_cookie.

  The location of the cookie can be overridden by using the
  configuration file parameter ExtORPortCookieAuthFile, which is
  defined as:

    ExtORPortCookieAuthFile <path>

  where <path> is a filesystem path.

  XXX should we also add an ExtORPortCookieFileGroupReadable torrc option?

6. Security Considerations

  Neither the Extended ORPort nor the TransportControlPort provides
  link confidentiality, authentication, or integrity. Sensitive data,
  like cryptographic material, should not be transferred through them.

  An attacker with superuser access is able to sniff network traffic
  and capture TransportControlPort identifiers and any data passed
  through those ports.

  Tor SHOULD issue a warning if the bridge operator tries to bind
  Extended ORPort or TransportControlPort to a non-localhost address.

  Pluggable transport proxies SHOULD issue a warning if they are
  instructed to connect to a non-localhost Extended ORPort or
  TransportControlPort.
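  A sketch of the localhost check behind both warnings. Treating the
  literal name "localhost" as local and every other hostname as
  non-local (rather than resolving it) is a simplifying assumption; the
  logger name is hypothetical.

```python
import ipaddress
import logging

log = logging.getLogger("transport-ports")


def is_localhost(host: str) -> bool:
    """True for loopback addresses and the literal name "localhost"."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are treated as non-local


def check_port_address(port_name: str, host: str) -> bool:
    """Warn, as section 6 recommends, when a port is not on localhost."""
    if is_localhost(host):
        return True
    log.warning("%s uses non-localhost address %s; this port provides no "
                "confidentiality, authentication, or integrity",
                port_name, host)
    return False
```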

7. Future

  In the future, we might have pluggable transports which require the
  _client_ transport proxy to use the TransportControlPort and exchange
  control information with the Tor client. The current proposal doesn't
  yet support this, but we should not add functionality that will
  prevent it from being possible.
Filename: 197-postmessage-ipc.txt
Title: Message-based Inter-Controller IPC Channel
Author: Mike Perry
Created: 16-03-2012
Status: REJECTED


Overview

  This proposal seeks to create a means for inter-controller
  communication using the Tor Control Port.

Motivation

  With the advent of pluggable transports, bridge discovery mechanisms,
  and tighter browser-Vidalia integration, we're going to have an
  increasing number of collaborating Tor controller programs
  communicating with each other. Rather than define new pairwise IPC
  mechanisms for each case, we will instead create a generalized
  message-passing mechanism through the Tor Control Port.

Control Protocol Specification Changes

  CONTROLLERNAME command

    Sent from the client to the server. The syntax is:

      "CONTROLLERNAME" SP ControllerID
        ControllerID = 1*(ALNUM / "_")

    Server returns "250 OK" and records the ControllerID to use for
    this control port connection for messaging information if successful,
    or "553 Controller name already in use" if the name is in use by
    another controller, or if an attempt is made to register the special
    names "all" or "unset".

    [CONTROLLERNAME need not be issued to send POSTMESSAGE commands,
     and CONTROLLERNAME may be unsupported by initial POSTMESSAGE
     implementations in Tor.]
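    The CONTROLLERNAME rules above can be sketched as a small registry.
    The "512" reply for a malformed name is borrowed from the control
    protocol's generic syntax-error reply and is an assumption, since
    this proposal defines only the 250 and 553 replies. Comparing names
    case-sensitively, and letting a connection hold only one name at a
    time, are further assumptions the proposal leaves open.

```python
import re

# ControllerID = 1*(ALNUM / "_")
CONTROLLER_ID = re.compile(r"[A-Za-z0-9_]+")
RESERVED_NAMES = frozenset({"all", "unset"})


class ControllerNames:
    """Hypothetical registry of ControllerIDs, one per control connection."""

    def __init__(self):
        self.by_name = {}  # ControllerID -> connection object

    def handle_controllername(self, conn, name: str) -> str:
        if CONTROLLER_ID.fullmatch(name) is None:
            return "512 Syntax error in command argument"  # assumed reply
        owner = self.by_name.get(name)
        if name in RESERVED_NAMES or (owner is not None and owner is not conn):
            return "553 Controller name already in use"
        # A connection keeps a single name: forget any earlier one.
        for old_name, old_conn in list(self.by_name.items()):
            if old_conn is conn:
                del self.by_name[old_name]
        self.by_name[name] = conn
        return "250 OK"
```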

  POSTMESSAGE command

    Sent from the client to the server. The syntax is:

      "POSTMESSAGE" SP "@" DestControllerID SP LineItem CRLF
         DestControllerID = "all" / 1*(ALNUM / "_")

    If DestControllerID is "all", the message will be posted to all
    controllers that have "SETEVENTS POSTMESSAGE" set. Otherwise, the
    message should be posted to the controller with the appropriate
    ControllerID.

    Server returns "250 OK" if successful, or "552 Invalid destination
    controller name" if the name is not registered.

    [Initial implementations may require DestControllerID always be
     "all"]

  POSTMESSAGE event

      "650" SP "POSTMESSAGE" SP MessageID SP SourceControllerID SP
                        "@" DestControllerID SP LineItem CRLF
         MessageID = 1*DIGIT
         SourceControllerID = "unset" / 1*(ALNUM / "_")
         DestControllerID = "all" / 1*(ALNUM / "_")

      MessageID is an incrementing integer identifier that uniquely
      identifies this message to all controllers.

      The SourceControllerID is the value from the sending
      controller's CONTROLLERNAME command, or "unset" if the
      CONTROLLERNAME command was not used or unimplemented.
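      A controller subscribed with "SETEVENTS POSTMESSAGE" could parse
      these events with a regular expression such as the one below.
      Treating LineItem as "any characters except CR and LF" is an
      assumption of the sketch. Because "unset" and "all" also match
      1*(ALNUM / "_"), no separate alternatives are needed for them.

```python
import re
from typing import NamedTuple, Optional

# "unset" and "all" are matched by the general name pattern as well.
_NAME = r"[A-Za-z0-9_]+"
POSTMESSAGE_EVENT = re.compile(
    rf"650 POSTMESSAGE (?P<msgid>[0-9]+) (?P<src>{_NAME}) "
    rf"@(?P<dest>{_NAME}) (?P<item>[^\r\n]*)\r\n"
)


class PostMessageEvent(NamedTuple):
    message_id: int
    source: str
    dest: str
    line_item: str


def parse_postmessage_event(line: str) -> Optional[PostMessageEvent]:
    """Parse one CRLF-terminated event line; return None if malformed."""
    m = POSTMESSAGE_EVENT.fullmatch(line)
    if m is None:
        return None
    return PostMessageEvent(int(m["msgid"]), m["src"], m["dest"], m["item"])
```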

  GETINFO commands
    "recent-messages" -- Retrieves messages
      sent to ControllerIDs that match the current controller
      in POSTMESSAGE event format. This list should be generated
      on the fly, to handle disconnecting controllers.

    "new-messages" -- Retrieves the last 10 "unread" messages
      sent to this controller, in POSTMESSAGE event format. If
      SETEVENTS POSTMESSAGE was set, this command should always return
      nothing.

    "list-controllers" -- Retrieves a list of all connected controllers
      with either their registered ControllerID or "unset".

Implementation plan

  The POSTMESSAGE protocol is designed to be incrementally deployable.
  Initial implementations are only expected to implement broadcast
  capabilities and SETEVENTS-based delivery. Neither CONTROLLERNAME
  nor non-"@all" POSTMESSAGE destinations need be supported.

  To support command-based controllers (which do not use SETEVENTS) such
  as Torbutton, at minimum the "GETINFO recent-messages" command is
  needed.  However, Torbutton does not have immediate need for this
  protocol.

Filename: 198-restore-clienthello-semantics.txt
Title: Restore semantics of TLS ClientHello
Author: Nick Mathewson
Created: 19-Mar-2012
Status: Closed
Target: 0.2.4.x

Status:

   Tor 0.2.3.17-beta implements the client-side changes, and no longer
   advertises openssl-supported TLS ciphersuites we don't have.

Overview:

   Currently, all supported Tor versions try to imitate an older version
   of Firefox when advertising ciphers in their TLS ClientHello.  This
   feature is intended to make it harder for a censor to distinguish a
   Tor client from other TLS traffic.  Unfortunately, it makes the
   contents of the ClientHello unreliable: a server cannot conclude that
   a cipher is really supported by a Tor client simply because it is
   advertised in the ClientHello.

   This proposal suggests an approach for restoring sanity to our use of
   ClientHello, so that we still avoid ciphersuite-based fingerprinting,
   but allow nodes to negotiate better ciphersuites than they are
   allowed to negotiate today.

Background reading:

   Section 2 of tor-spec.txt describes our current baroque link
   negotiation scheme.  Proposals 176 and 184 give more information
   about how it got that way.

   Bug 4744 is a big part of the motivation for this proposal: we want
   to allow Tors to advertise even more ciphers, some of which we would
   actually prefer to the ones we are using now.

   What you need to know about the TLS handshake is that the client
   sends a list of all the ciphersuites that it supports in its
   ClientHello message, and then the server chooses one and tells the
   client which one it picked.

Motivation and constraints:

   We'd like to use some of the ECDHE TLS ciphersuites, since they allow
   us to get better forward-secrecy at lower cost than our current
   DH-1024 usage.  But right now, we can't ever use them, since Tor will
   advertise them whether or not it has a version of OpenSSL that
   supports them.

   (OpenSSL before 1.0.0 did not support ECDHE ciphersuites; OpenSSL
   before 1.0.0e or so had some security issues with them.)

   We cannot have the rule be "Tors must only advertise ciphersuites
   that they can use", since current Tors will advertise such
   ciphersuites anyway.

   We cannot have the rule be "Tors must support every ECDHE ciphersuite
   on the following list", since current Tors don't do all that, and
   since one prominent Linux distribution builds OpenSSL without ECC
   support because of patent/freedom fears.

   Fortunately, nearly every ciphersuite that we would like to advertise
   to imitate FF8 (see bug 4744) is currently supported by OpenSSL 1.0.0
   and later.  This enables the following proposal to work:

Proposed spec changes:

   I propose that the rules for handling ciphersuites at the server side
   become the following:

   If the ciphersuites in the ClientHello contain no ciphers other than
   the following[*], they indicate that the Tor v1 link protocol is in use.

     TLS_DHE_RSA_WITH_AES_256_CBC_SHA
     TLS_DHE_RSA_WITH_AES_128_CBC_SHA
     SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA

   If the advertised ciphersuites in the ClientHello are _exactly_[*]
   the following, they indicate that the Tor v2+ link protocol is in
   use, AND that the ClientHello may have unsupported ciphers.  In this
   case, the server may choose DHE_RSA_WITH_AES_128_CBC_SHA  or
   DHE_RSA_WITH_AES_256_SHA, but may not choose any other cipher.

     TLS1_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
     TLS1_ECDHE_RSA_WITH_AES_256_CBC_SHA
     TLS1_DHE_RSA_WITH_AES_256_SHA
     TLS1_DHE_DSS_WITH_AES_256_SHA
     TLS1_ECDH_RSA_WITH_AES_256_CBC_SHA
     TLS1_ECDH_ECDSA_WITH_AES_256_CBC_SHA
     TLS1_RSA_WITH_AES_256_SHA
     TLS1_ECDHE_ECDSA_WITH_RC4_128_SHA
     TLS1_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
     TLS1_ECDHE_RSA_WITH_RC4_128_SHA
     TLS1_ECDHE_RSA_WITH_AES_128_CBC_SHA
     TLS1_DHE_RSA_WITH_AES_128_SHA
     TLS1_DHE_DSS_WITH_AES_128_SHA
     TLS1_ECDH_RSA_WITH_RC4_128_SHA
     TLS1_ECDH_RSA_WITH_AES_128_CBC_SHA
     TLS1_ECDH_ECDSA_WITH_RC4_128_SHA
     TLS1_ECDH_ECDSA_WITH_AES_128_CBC_SHA
     SSL3_RSA_RC4_128_MD5
     SSL3_RSA_RC4_128_SHA
     TLS1_RSA_WITH_AES_128_SHA
     TLS1_ECDHE_ECDSA_WITH_DES_192_CBC3_SHA
     TLS1_ECDHE_RSA_WITH_DES_192_CBC3_SHA
     SSL3_EDH_RSA_DES_192_CBC3_SHA
     SSL3_EDH_DSS_DES_192_CBC3_SHA
     TLS1_ECDH_RSA_WITH_DES_192_CBC3_SHA
     TLS1_ECDH_ECDSA_WITH_DES_192_CBC3_SHA
     SSL3_RSA_FIPS_WITH_3DES_EDE_CBC_SHA
     SSL3_RSA_DES_192_CBC3_SHA

  [*] The "extended renegotiation is supported" ciphersuite, 0x00ff, is
      not counted when checking the list of ciphersuites.
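  To make the three cases concrete, here is a sketch of the server-side
  classification. It compares suite names as written in this proposal;
  a real implementation would compare the 2-byte ciphersuite values
  from the ClientHello instead. Reading "_exactly_" as "the same suites
  in the same order" is an assumption.

```python
# Suites whose exclusive use signals the v1 link protocol.
V1_CIPHERS = frozenset({
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
})

# The fixed list that signals v2+ with possibly-unsupported suites.
V2_CIPHERS = (
    "TLS1_ECDHE_ECDSA_WITH_AES_256_CBC_SHA",
    "TLS1_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS1_DHE_RSA_WITH_AES_256_SHA",
    "TLS1_DHE_DSS_WITH_AES_256_SHA",
    "TLS1_ECDH_RSA_WITH_AES_256_CBC_SHA",
    "TLS1_ECDH_ECDSA_WITH_AES_256_CBC_SHA",
    "TLS1_RSA_WITH_AES_256_SHA",
    "TLS1_ECDHE_ECDSA_WITH_RC4_128_SHA",
    "TLS1_ECDHE_ECDSA_WITH_AES_128_CBC_SHA",
    "TLS1_ECDHE_RSA_WITH_RC4_128_SHA",
    "TLS1_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS1_DHE_RSA_WITH_AES_128_SHA",
    "TLS1_DHE_DSS_WITH_AES_128_SHA",
    "TLS1_ECDH_RSA_WITH_RC4_128_SHA",
    "TLS1_ECDH_RSA_WITH_AES_128_CBC_SHA",
    "TLS1_ECDH_ECDSA_WITH_RC4_128_SHA",
    "TLS1_ECDH_ECDSA_WITH_AES_128_CBC_SHA",
    "SSL3_RSA_RC4_128_MD5",
    "SSL3_RSA_RC4_128_SHA",
    "TLS1_RSA_WITH_AES_128_SHA",
    "TLS1_ECDHE_ECDSA_WITH_DES_192_CBC3_SHA",
    "TLS1_ECDHE_RSA_WITH_DES_192_CBC3_SHA",
    "SSL3_EDH_RSA_DES_192_CBC3_SHA",
    "SSL3_EDH_DSS_DES_192_CBC3_SHA",
    "TLS1_ECDH_RSA_WITH_DES_192_CBC3_SHA",
    "TLS1_ECDH_ECDSA_WITH_DES_192_CBC3_SHA",
    "SSL3_RSA_FIPS_WITH_3DES_EDE_CBC_SHA",
    "SSL3_RSA_DES_192_CBC3_SHA",
)

# With the fixed v2 list, the server may choose only these two suites.
V2_SERVER_CHOICES = frozenset({
    "TLS1_DHE_RSA_WITH_AES_128_SHA",
    "TLS1_DHE_RSA_WITH_AES_256_SHA",
})

# 0x00ff, the renegotiation-info signaling value, is never counted.
RENEG_SCSV = "TLS_EMPTY_RENEGOTIATION_INFO_SCSV"


def classify_clienthello(ciphers):
    """Return which ClientHello semantics apply to an offered suite list."""
    offered = [c for c in ciphers if c != RENEG_SCSV]
    if set(offered) <= V1_CIPHERS:
        return "v1"
    if tuple(offered) == V2_CIPHERS:
        return "v2-fixed-list"
    # Otherwise every listed suite (bar the FIPS one) is really supported.
    return "v3-semantics"
```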

  Otherwise, the ClientHello has these semantics: The inclusion of any
  cipher supported by OpenSSL 1.0.0 means that the client supports it,
  with the exception of
      SSL_RSA_FIPS_WITH_3DES_EDE_CBC_SHA
  which is never supported. Clients MUST advertise support for at least one of
  TLS_DHE_RSA_WITH_AES_256_CBC_SHA or TLS_DHE_RSA_WITH_AES_128_CBC_SHA.

  The server MUST choose a ciphersuite with ephemeral keys for forward
  secrecy; MUST NOT choose a weak or null ciphersuite; and SHOULD NOT
  choose any cipher other than AES or 3DES.

Discussion and consequences:


  Currently, OpenSSL 1.0.0 (in its default configuration) supports every
  cipher that we would need in order to give the same list as Firefox
  versions 8 through 11 give in their default configuration, with the
  exception of the FIPS ciphersuite above.  Therefore, we will be able
  to fake the new ciphersuite list correctly in all of our bundles that
  include OpenSSL, and on every version of Unix that keeps up-to-date.

  However, versions of Tor compiled to use older versions of OpenSSL, or
  versions of OpenSSL with some ciphersuites disabled, will no
  longer give the same ciphersuite lists as other versions of Tor.  On
  these platforms, Tor clients will no longer impersonate Firefox.
  Users who need to do so will have to download one of our bundles, or
  use a non-system OpenSSL.


  The proposed spec change above tries to future-proof ourselves by not
  declaring that we support every declared cipher, in case we someday
  need to handle a new Firefox version.  If a new Firefox version
  comes out that uses ciphers not supported by OpenSSL 1.0.0, we will
  need to define whether clients may advertise its ciphers without
  supporting them; but existing servers will continue working whether
  we decide yes or no.


  The restriction to "servers SHOULD only pick AES or 3DES" is meant to
  reflect our current behavior, not to represent a permanent refusal to
  support other ciphers.  We can revisit it later as appropriate, if for
  some bizarre reason Camellia or Seed or Aria becomes a better bet than
  AES.

Open questions:

  Should the client drop connections if the server chooses a bad
  cipher, or a suite without forward secrecy?

  Can we get OpenSSL to support the dubious FIPS suite excluded above,
  in order to remove a distinguishing opportunity?  It is not so simple
  as just editing the SSL_CIPHER list in s3_lib.c, since the nonstandard
  SSL_RSA_FIPS_WITH_3DES_EDE_CBC_SHA cipher is (IIUC) defined to use the
  TLS1 KDF, while declaring itself to be an SSL cipher (!).

  Can we do anything to eventually allow the IE7+[**] cipher list as
  well?  IE does not support TLS_DHE_RSA_WITH_AES_{256,128}_SHA or
  SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, and so wouldn't work with current
  Tor servers, which _only_ support those.  It looks like the only
  forward-secure ciphersuites that IE7+ *does* support are ECDHE ones,
  and DHE+DSS ones.  So if we want this flexibility, we could mandate
  server-side ECDHE, or somehow get DHE+DSS support (which would play
  havoc with our current certificate generation code IIUC), or say that
  it is sometimes acceptable to have a non-forward-secure link
  protocol[***].  None of these answers seems like a great one.  Is one
  best?  Are there other options?

  [**] Actually, I think it's the Windows SChannel cipher list we
  should be looking at here.
  [***] If we did _that_, we'd want to specify that CREATE_FAST could
  never be used on a non-forward-secure link.  Even so, I don't like the
  implications of leaking cell types and circuit IDs to a future
  compromise.

Filename: 199-bridgefinder-integration.txt
Title: Integration of BridgeFinder and BridgeFinderHelper
Author: Mike Perry
Created: 18-03-2012
Status: OBSOLETE


Overview

  This proposal describes how the Tor client software can interact with
  an external program that performs bridge discovery based on user input
  or information extracted from a web page, QR Code, online game, or
  other transmission medium.


Scope and Audience

  This document describes how all of the components involved in bridge
  discovery communicate this information to the rest of the Tor
  software. The mechanisms of bridge discovery are not discussed, though
  the design aims to be generalized enough to allow arbitrary new
  discovery mechanisms to be added at any time.
  
  This document is also written with the hope that those who wish to
  implement BridgeFinder components and BridgeFinderHelpers can get
  started immediately after a read of this proposal, so that development
  of bridge discovery mechanisms can proceed in parallel to supporting
  functionality improvements in the Tor client software.


Components and Responsibilities

 0. Tor Client
 
    The Tor Client is the piece of software that connects to the Tor
    network (optionally using bridges) and provides a SOCKS proxy for
    use by the user.
 
    In initial implementations, the Tor Client will support only
    standard bridges. In later implementations, it is expected to
    support pluggable transports as defined by Proposal 180.

 1. Tor Control Port
 
    The Tor Control Port provides commands to perform operations and
    configuration and to obtain status information. It also optionally
    provides event-driven status updates.
    
    In initial implementations, it will be used directly by BridgeFinder
    to configure bridge information via GETINFO and SETCONF. It is covered
    by control-spec.txt in the tor-specs git repository.

    In later implementations, it will support the inter-controller
    POSTMESSAGE IPC protocol as defined by Proposal 197 for use
    in conveying bridge information to the Primary Controller.
 
 2. Primary Controller
 
    The Primary Controller is the program that launches and configures the
    Tor client, and monitors its status.
    
    On desktop platforms, this program is Vidalia, and it also launches
    the Tor Browser. On Android, this program is Orbot. Orbot does not
    launch a browser.
    
    On all platforms, this proposal requires that the Primary Controller
    launch one or more BridgeFinder child processes and provide
    them with authentication information through the environment variables
    TOR_CONTROL_PORT and TOR_CONTROL_PASSWD.

    In later implementations, the Primary Controller will be expected
    to receive Bridge configuration information via the free-form
    POSTMESSAGE protocol from Proposal 197, validate that information,
    and hold that information for user approval.
 
 3. BridgeFinder
 
    A BridgeFinder is a program that discovers bridges and configures
    Tor to use them.
    
    In initial implementations, it is likely to be very dumb, and its main
    purpose will be to serve as a layer of abstraction that should free
    the Primary Controller from having to directly implement numerous ways
    of retrieving bridges for various pluggable transports.
    
    In later implementations, it may perform arbitrary network operations
    to discover, authenticate to, and/or verify bridges, possibly using
    informational hints provided by one or more external
    BridgeFinderHelpers (see next component). It could even go so far as
    to download new pluggable transport plugins and/or transform
    definition files from arbitrary URLs.
    
    It will be launched by the Primary Controller and given access to the
    Tor Control Port via the environment variables TOR_CONTROL_PORT and
    TOR_CONTROL_PASSWD.
    
    Initial control port interactions can be command driven via GETINFO
    and SETCONF, and do not need to subscribe to or process control port
    events. Later implementations will use POSTMESSAGE as defined in
    Proposal 197 to pass command requests to Vidalia, which will parse
    them and ask for user confirmation before deploying them. Use of
    POSTMESSAGE may or may not require event-driven operation, depending
    on POSTMESSAGE implementation status (POSTMESSAGE is designed to
    support both command-driven and event-driven operation, but it is
    possible that event-driven operation will be implemented first).
 
 4. BridgeFinderHelper
 
    Each BridgeFinder implementation can optionally communicate with one
    or more BridgeFinderHelpers. BridgeFinderHelpers are plugins to
    external third-party applications that can inspect traffic, handle MIME
    types, or implement protocol handlers for accepting bridge discovery
    information to pass to BridgeFinder. Example third-party applications
    include Chrome, World of Warcraft, QR Code readers, or simple cut
    and paste.
    
    Due to the arbitrary nature of sandboxing that may be present in
    various BridgeFinderHelper host applications, we do not mandate the
    exact nature of the IPC between BridgeFinder instances and external
    BridgeFinderHelper addons. However, please see the "Security Concerns"
    section for common pitfalls to avoid. 
 
 5. Tor Browser
 
    This is the browser the user uses with Tor. It is not useful until Tor
    is properly configured to use bridges. It fails closed.
    
    It is not expected to run BridgeFinderHelper plugin instances, unless
    those plugin instances exist to ensure the user always has a pool of
    working bridges available after successfully configuring an
    initial bridge. Once all bridges fail, the Tor Browser is useless.
 
 6. Non-Tor Browser (aka BridgeFinderHelper host)
 
    This is the program the user uses for normal Internet activity to
    obtain bridges via a BridgeFinderHelper plugin. It need not be a
    browser: in advanced scenarios, this component may instead be a
    program such as World of Warcraft.


Incremental Deployability

  The system is designed to be incrementally deployable: Simple designs
  should be possible to develop and test immediately. The design is
  flexible enough to be easily upgraded as more advanced features become
  available from both Tor and new pluggable transports.

Initial Implementation

  In the simplest possible initial implementation, BridgeFinder will
  only discover Tor Bridges as they are deployed today. It will use the
  Tor Control Port to configure these bridges directly via the SETCONF
  command. It may or may not receive bridge information from a
  BridgeFinderHelper. In an even more degenerate case,
  BridgeFinderHelper may even be Vidalia or Orbot itself, acting upon
  user input from cut and paste.

 Initial Implementation: BridgeFinder Launch
 
   In the initial implementation, the Primary Controller will launch one
   or more BridgeFinders, providing control port authentication
   information to them through the environment variables TOR_CONTROL_PORT
   and TOR_CONTROL_PASSWD.
   
   BridgeFinder will then directly connect to the control port and
   authenticate. Initial implementations should be able to function
   without using SETEVENTS, and instead only using command-based
   status inquiries and configuration (GETINFO and SETCONF).
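   A sketch of the BridgeFinder side of the launch. It assumes
   TOR_CONTROL_PORT holds a bare TCP port number on 127.0.0.1, which
   this proposal does not actually specify, and quotes the password as a
   control-protocol quoted string with backslashes and double quotes
   escaped. Rejecting passwords that contain line breaks is a
   conservative choice of the sketch. It only builds the command string;
   a real BridgeFinder would write it to a socket opened with
   `socket.create_connection(("127.0.0.1", port))` and check for a
   "250 OK" reply.

```python
import os


def control_port_from_env(environ=os.environ):
    """Read the control port settings the Primary Controller exported.

    Assumes TOR_CONTROL_PORT is a bare port number (not host:port).
    """
    port = int(environ["TOR_CONTROL_PORT"])
    password = environ.get("TOR_CONTROL_PASSWD", "")
    return port, password


def quote_string(value: str) -> str:
    """Quote value for the control protocol, escaping backslashes and
    double quotes. Line breaks are rejected outright."""
    if "\r" in value or "\n" in value:
        raise ValueError("line breaks cannot appear in a quoted argument")
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def authenticate_command(password: str) -> str:
    return "AUTHENTICATE " + quote_string(password) + "\r\n"
```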
 
 Initial Implementation: Obtaining Bridge Hint Information
 
   In the initial implementation, to test functionality,
   BridgeFinderHelper can simply scrape bridges directly from
   https://bridges.torproject.org.
   
   In slightly more advanced implementations, a BridgeFinderHelper
   instance may be written for use in the user's Non-Tor Browser. This
   plugin could extract bridges from images, html comments, and other
   material present in ad banners and slack space on unrelated pages.
 
   BridgeFinderHelper would then communicate with the appropriate
   BridgeFinder instance over an acceptable IPC mechanism. This proposal
   does not seek to specify the nature of that IPC channel (because
   BridgeFinderHelper may be arbitrarily constrained due to host
   application sandboxing), but we do make several security
   recommendations under the section "Security Concerns: BridgeFinder and
   BridgeFinderHelper".
 
 Initial Implementation: Configuring New Bridges
 
   In the initial implementation, Bridge configuration will be done
   directly through the control port using the SETCONF command.
   
   Initial implementations will support only retrieval and configuration
   of standard Tor Bridges. These are configured using SETCONF on the Tor
   Control Port as follows:
     SETCONF Bridge="IP:ORPort [fingerprint]"
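   Building that command safely mostly means refusing anything other
   than an address, a port, and an optional 40-hex-digit fingerprint, so
   that bridge data obtained from a helper cannot smuggle extra
   control-port syntax into the quoted value. The sketch below is
   IPv4-only, and its helper name is hypothetical.

```python
import ipaddress
import re
from typing import Optional

FINGERPRINT = re.compile(r"[0-9A-Fa-f]{40}")  # hex SHA-1 identity digest


def bridge_setconf(ip: str, orport: int,
                   fingerprint: Optional[str] = None) -> str:
    """Build 'SETCONF Bridge="IP:ORPort [fingerprint]"'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    if not 0 < orport <= 65535:
        raise ValueError(f"bad ORPort {orport}")
    value = f"{addr}:{orport}"
    if fingerprint is not None:
        fp = fingerprint.replace(" ", "")  # allow space-grouped fingerprints
        if FINGERPRINT.fullmatch(fp) is None:
            raise ValueError("fingerprint must be 40 hex digits")
        value += " " + fp.upper()
    return f'SETCONF Bridge="{value}"\r\n'
```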


Future Implementations

  In future implementations, the system can incrementally evolve in a
  few different directions. As new pluggable transports are created, it
  is conceivable that BridgeFinder may want to download new plugin
  binaries (and/or new transport transform definition files) and
  provide them to Tor.
  
  Furthermore, it may prove simpler to deploy multiple concurrent
  BridgeFinder+BridgeFinderHelper pairs as opposed to adding new
  functionality to existing prototypes.
  
  Finally, it is desirable for BridgeFinder to obtain approval
  from the user before updating bridge configuration, especially for
  cases where BridgeFinderHelper is automatically discovering bridges
  in-band during Non-Tor activity.

  The exact mechanisms for accomplishing these improvements are
  described in the following subsections.

 Future Implementations: BridgeFinder Launch and POSTMESSAGE handshake
 
   The nature of the BridgeFinder launch and the environment variables
   provided is not expected to change. However, future Primary Controller
   implementations may decide to launch more than one BridgeFinder
   instance side by side.
 
   Additionally, to negotiate the IPC channel created by Proposal 197
   for purposes of providing user confirmation, it is recommended that
   BridgeFinder and the Primary Controller perform a handshake using
   POSTMESSAGE upon launch, to establish that all parties properly
   support the feature:
 
     Primary Controller: "POSTMESSAGE @all Controller wants POSTMESSAGE v1.1"
     BridgeFinder: "POSTMESSAGE @all BridgeFinder has POSTMESSAGE v1.0"
     Primary Controller: "POSTMESSAGE @all Controller expects POSTMESSAGE v1.0"
     BridgeFinder: "POSTMESSAGE @all BridgeFinder will POSTMESSAGE v1.0"
 
   If this four-step handshake proceeds with an acceptable version,
   BridgeFinder must use POSTMESSAGE to transmit SETCONF Bridge lines
   (see "Future Implementations: Configuring New Bridges" below). If
   POSTMESSAGE support is expected, but the handshake does not complete
   for any reason, BridgeFinder should either exit or go dormant.
 
   The exact nature of the version negotiation and exactly how much
   backwards compatibility must be tolerated is unspecified.
   "All-or-nothing" is a safe assumption to get started.
 
 Future Implementations: Obtaining Bridge Hint Information
 
   Future BridgeFinder implementations may download additional
   information based on what is provided by BridgeFinderHelper. They
   may fetch pluggable transport plugins, transformation parameters,
   and other material.
 
 Future Implementations: Configuring New Bridges
 
   Future implementations will be concerned with providing two new pieces
   of functionality with respect to configuring bridges: configuring
   pluggable transports, and properly prompting the user before altering
   Tor configuration.
 
   There are two ways to tell Tor clients about pluggable transports
   (as defined in Proposal 180).
 
   On the control port, an external Proposal 180 transport will be
   configured with
     SETCONF ClientTransportPlugin=<method> socks5 <addr:port> [auth=X]
   as in
     SETCONF ClientTransportPlugin="trebuchet socks5 127.0.0.1:9999".
 
   A managed proxy is configured with
     SETCONF ClientTransportPlugin=<methods> exec <path> [options]
   as in
     SETCONF ClientTransportPlugin="trebuchet exec /usr/libexec/trebuchet --managed".
 
   This example tells Tor to launch an external program to provide a
   SOCKS proxy for 'trebuchet' connections. The Tor client only
   launches one instance of each external program with a given set of
   options, even if the same executable and options are listed for
   more than one method.
 
   Pluggable transport bridges discovered for this transport by
   BridgeFinder would then be set with:
     SETCONF Bridge="trebuchet 3.2.4.1:8080 keyid=09F911029D74E35BD84156C5635688C009F909F9 rocks=20 height=5.6m".

   For more information on pluggable transports and supporting Tor
   configuration commands, see Proposal 180.
 
 Future Implementations: POSTMESSAGE and User Confirmation
 
   Because configuring even normal bridges alone can expose the user to
   attacks, it is strongly desired to provide some mechanism to allow
   the user to approve new bridges prior to their use, especially for
   situations where BridgeFinderHelper is extracting them transparently
   while the user performs unrelated activity.
 
   If BridgeFinderHelper grows to the point where it is downloading new
   transform definitions or plugins, user confirmation becomes
   absolutely required.
 
   To achieve user confirmation, we depend upon the POSTMESSAGE command
   defined in Proposal 197. 
 
   If the POSTMESSAGE handshake succeeds, instead of sending SETCONF
   commands directly to the control port, the commands will be wrapped
   inside a POSTMESSAGE:
     POSTMESSAGE @all SETCONF Bridge="www.example.com:8284"
 
   Upon receiving this POSTMESSAGE, the Primary Controller will
   validate it, evaluate it, store it to be later enabled by the
   user, and alert the user that new bridges are available for
   approval. It is only after the user has approved the new bridges
   that the Primary Controller should then re-issue the SETCONF commands
   to configure and deploy them in the tor client.
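   The Primary Controller's side of this flow can be sketched as a small
   store of disabled entries that only an explicit user action can
   enable. This sketch assumes the Primary Controller sees the message
   text in the same form as the example above (Proposal 197 defines the
   actual delivery format), and takes the validator as a parameter. The
   class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: received text looks like the example above.
WRAPPER_PREFIX = "POSTMESSAGE @all "


@dataclass
class PendingCommand:
    command: str            # the SETCONF line carried inside the POSTMESSAGE
    approved: bool = False  # new entries always start disabled


@dataclass
class PrimaryController:
    is_valid: Callable[[str], bool]
    pending: List[PendingCommand] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def on_postmessage(self, message: str) -> None:
        """Validate and store a proposed command; never forward it to Tor here."""
        if not message.startswith(WRAPPER_PREFIX):
            return                      # not in the expected form: ignore it
        command = message[len(WRAPPER_PREFIX):]
        if not self.is_valid(command):
            return                      # unrecognized input is dropped, not repaired
        self.pending.append(PendingCommand(command))
        self.alerts.append("New bridges are available for approval")

    def approve(self, index: int) -> str:
        """Call only from the user's explicit action; returns the SETCONF to re-issue."""
        entry = self.pending[index]
        entry.approved = True
        return entry.command
```

   Keeping "store" and "re-issue" as separate steps means no code path
   sends a POSTMESSAGE payload to the control port without first passing
   through approve().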
 
   Additionally, see "Security Concerns: Primary Controller" for more
   discussion on potential pitfalls with POSTMESSAGE.

Security Concerns

  While automatic bridge discovery and configuration is quite compelling
  and powerful, there are several serious security concerns that warrant
  extreme care. We've broken them down by component.
  
 Security Concerns: Primary Controller
 
   In the initial implementation, Orbot and Vidalia must take care to
   transmit the Tor Control password to BridgeFinder in such a way that
   it does not end up in system logs or the process list, and is not
   viewable by other system users. The best-known strategy for doing
   this is to pass the information through exported environment
   variables.
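   A minimal sketch of that strategy, assuming the Primary Controller
   launches BridgeFinder as a child process: the password goes into the
   child's environment and never into its argument vector. The
   environment variable name and launch_bridgefinder() are hypothetical.
   On Linux, a process's command line is visible to all users, while its
   environment is normally readable only by its owner and root.

```python
import os
import subprocess
import sys

# Hypothetical variable name; the Primary Controller and BridgeFinder
# would have to agree on it.
PASSWORD_ENV = "BRIDGEFINDER_CONTROL_PASSWORD"


def launch_bridgefinder(argv: list, control_password: str) -> subprocess.Popen:
    """Start BridgeFinder with the control password in its environment only.

    The password is never placed in argv, so it does not appear in the
    process list or in logs that record command lines.
    """
    env = dict(os.environ)
    env[PASSWORD_ENV] = control_password
    return subprocess.Popen(argv, env=env, stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    # Demo child: a tiny Python program that only reports whether it got the secret.
    child = [sys.executable, "-c",
             f"import os; print('password received:', '{PASSWORD_ENV}' in os.environ)"]
    out, _ = launch_bridgefinder(child, "example-password").communicate()
    print(out.strip())
```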
   
   Additionally, in future implementations, Orbot and Vidalia will need
   to validate Proposal 197 POSTMESSAGE input before prompting the user.
   POSTMESSAGE is a free-form message-passing mechanism. All sorts of
   unexpected input may be passed through it by any other authenticated
   Tor Controllers for their own unrelated communication purposes.

   Minimal validation includes verifying that the POSTMESSAGE data is a
   valid Bridge or ClientTransportPlugin line and is acceptable input for
   SETCONF. All unexpected characters should be removed using a
   whitelist, and the format and structure should be checked against a
   regular expression. Additionally, the POSTMESSAGE string should not be
   passed through any string processing engine that automatically decodes
   character escape sequences, to avoid injecting arbitrary control port
   commands.
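   The whitelist-then-regex check described above might look like the
   following for a Bridge value (the text between the quotes in
   SETCONF Bridge="..."). This is a sketch under stated assumptions: the
   character set and the pattern are deliberately narrow choices made
   for illustration, not Tor's own Bridge grammar, and IPv6 bridge
   addresses are not handled. sanitize_bridge_value() is a hypothetical
   name.

```python
import re
from typing import Optional

# Whitelist of characters we expect inside a Bridge value. There is no quote,
# backslash, CR or LF, so no escape sequence or extra command can survive.
ALLOWED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.:=_- "
)

# Narrow structure check (an assumption, not Tor's grammar):
# [transport ]host:port[ 40-hex fingerprint][ key=value ...]
BRIDGE_VALUE_RE = re.compile(
    r"(?:[A-Za-z][A-Za-z0-9_]* )?"            # optional transport name
    r"[A-Za-z0-9.-]+:[0-9]{1,5}"              # IPv4 literal or hostname, then port
    r"(?: [0-9A-Fa-f]{40})?"                  # optional identity fingerprint
    r"(?: [A-Za-z0-9_]+=[A-Za-z0-9._-]+)*"    # optional transport arguments
)


def sanitize_bridge_value(raw: str) -> Optional[str]:
    """Return the cleaned Bridge value, or None to ignore the message.

    `raw` is used exactly as received: no unescaping or decoding happens.
    """
    cleaned = "".join(ch for ch in raw if ch in ALLOWED)
    if BRIDGE_VALUE_RE.fullmatch(cleaned) is None:
        return None
    return cleaned
```

   Because removal runs before the structural check, an injection
   attempt such as a quote followed by CRLF and a second command loses
   its dangerous characters and then fails the pattern, so it is
   ignored rather than forwarded.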
   
   At the same time, POSTMESSAGE validation should be light. While fully
   untrusted input is not expected, due to the need for control port
   authentication and BridgeFinder sanitization, complicated manual
   string parsing techniques should be avoided during validation.
   Perform simple, easy-to-verify whitelist-based checks, and ignore
   unrecognized input.
   
   Beyond POSTMESSAGE validation, the manner in which the Primary
   Controller achieves consent from the user is absolutely crucial to
   security under this scheme. A simple "OK/Cancel" dialog is
   insufficient to protect the user from the dangers of switching
   bridges and running new plugins automatically.
   
   Newly discovered bridge lines from POSTMESSAGE should be added to a
   disabled set that the user must navigate to in an independent window,
   apart from any confirmation dialog. The user must then explicitly
   enable each recently added bridge or plugin by checking it off
   individually. We need the user's brain to be fully engaged and aware
   that it is interacting with Tor during this step. If they get an
   "OK/Cancel" popup that interrupts their online game play, they will
   almost certainly simply click "OK" just to get back to the game
   quickly.
 
   The Primary Controller should transmit the POSTMESSAGE content to the
   control port only after obtaining this out-of-band approval.

Security Concerns: BridgeFinder and BridgeFinderHelper

  The unspecified nature of the IPC channel between BridgeFinder and
  BridgeFinderHelper makes it difficult to make concrete security
  suggestions. However, from past experience, the following best
  practices must be employed to avoid security vulnerabilities:

  1. Define a non-webby handshake and/or perform authentication

     The biggest risk is that unexpected applications will be manipulated
     into posting malformed data to the BridgeFinder's IPC channel as if it
     were from BridgeFinderHelper. The best way to defend against this is
     to require a handshake to properly complete before accepting input. If
     the handshake fails at any point, the IPC channel must be abandoned
     and closed. Do not continue scanning for good input after any bad
     input has been encountered.
     
     Additionally, if possible, it is wise to establish a shared secret
     between BridgeFinder and BridgeFinderHelper through the filesystem or
     any other means available for use in authentication. For a good
     example of how to use such a shared secret properly for
     authentication, see Trac Ticket #5185 and/or the SafeCookie Tor
     Control Port authentication mechanism.

  2. Perform validation before parsing 

     Care must be taken before converting BridgeFinderHelper data into
     Bridge lines, especially for cases where the BridgeFinderHelper data
     is fed directly to the control port after passing through
     BridgeFinder.

     The input should be subjected to a character whitelist and possibly
     also validated against a regular expression to verify format, and if
     any unexpected or poorly-formed data is encountered, the IPC channel
     must be closed.

  3. Fail closed on unexpected input

     If the handshake fails, or if any other part of the BridgeFinderHelper
     input is invalid, the IPC channel must be abandoned and closed. Do
     *not* continue scanning for good input after any bad input has been
     encountered.
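   The three practices above can be combined in one sketch. It assumes a
   shared secret already exchanged through the filesystem, a random nonce
   that BridgeFinder sends first, and an HMAC-SHA256 proof laid out in
   the style of SafeCookie (HMAC keyed with a fixed context string over
   the secret and both nonces). The context string, helper_proof() and
   FailClosedChannel are hypothetical; carrying the nonces and lines over
   the IPC channel, and the "non-webby" framing itself, are left out.
   Discarding lines already accepted from a session that later turns bad
   is a stricter choice than the text requires.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Iterable, List, Optional

# Hypothetical context string, modeled on SafeCookie's fixed HMAC keys.
HELPER_TO_FINDER_KEY = b"BridgeFinderHelper-to-BridgeFinder hash"


def helper_proof(secret: bytes, finder_nonce: bytes, helper_nonce: bytes) -> bytes:
    """HMAC-SHA256 over secret || finder_nonce || helper_nonce."""
    return hmac.new(HELPER_TO_FINDER_KEY, secret + finder_nonce + helper_nonce,
                    hashlib.sha256).digest()


class FailClosedChannel:
    """Accept input only after a valid proof; abandon the channel on any bad input."""

    def __init__(self, secret: bytes, validate: Callable[[str], Optional[str]]):
        self.secret = secret
        self.validate = validate                      # whitelist/regex check
        self.finder_nonce = secrets.token_bytes(32)   # sent to the helper first
        self.closed = False

    def run(self, helper_nonce: bytes, proof: bytes, lines: Iterable[str]) -> List[str]:
        if self.closed:
            return []
        expected = helper_proof(self.secret, self.finder_nonce, helper_nonce)
        if not hmac.compare_digest(expected, proof):
            self.closed = True        # handshake failed: close, read nothing
            return []
        accepted = []
        for line in lines:
            clean = self.validate(line)
            if clean is None:
                self.closed = True    # stop here; do not scan for later good lines
                return []             # and distrust everything from this session
            accepted.append(clean)
        return accepted
```

   hmac.compare_digest() is used so that checking the proof does not
   leak timing information about how many leading bytes matched.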


Filename: 200-new-create-and-extend-cells.txt
Title: Adding new, extensible CREATE, EXTEND, and related cells
Author: Robert Ransom
Created: 2012-03-22
Status: Closed
Implemented-In: 0.2.4.8-alpha

History

  The original draft of this proposal was from 2010-12-27; nickm revised
  it slightly on 2012-03-22 and added it as proposal 200.

Overview and Motivation:

  In Tor's current circuit protocol, every field in the EXTEND relay
  cell, including the 'onion skin', has a fixed meaning and length.
  This prevents us from extending the current EXTEND cell to support
  IPv6 relays, efficient UDP-based link protocols, larger 'onion
  keys', new circuit-extension handshake protocols, or larger
  identity-key fingerprints.  We will need to support all of these
  extensions in the near future.  This proposal specifies a
  replacement EXTEND2 cell and related cells that provide more room
  for future extension.

Design:

  FIXME - allocate command ID numbers (non-RELAY commands for CREATE2 and
  CREATED2; RELAY commands for EXTEND2 and EXTENDED2)

  The CREATE2 cell contains the following payload:

        Handshake type                        [2 bytes]
        Handshake data length                 [2 bytes]
        Handshake data                        [variable]
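  For concreteness, the CREATE2 layout can be encoded and decoded as
  below. This sketch assumes the 512-byte cell format with a 2-byte
  circuit ID and 1-byte command ahead of the payload (leaving 509
  payload bytes) and assumes zero bytes for the padding; the function
  names are hypothetical.

```python
import struct
from typing import Tuple

CELL_PAYLOAD_LEN = 509  # 512-byte cell minus 2-byte circuit ID and 1-byte command


def pack_create2(htype: int, hdata: bytes) -> bytes:
    """Encode a CREATE2 payload: type, length, data, then zero padding."""
    body = struct.pack(">HH", htype, len(hdata)) + hdata   # big-endian 2-byte fields
    if len(body) > CELL_PAYLOAD_LEN:
        raise ValueError("handshake data does not fit in one cell")
    return body.ljust(CELL_PAYLOAD_LEN, b"\x00")


def unpack_create2(payload: bytes) -> Tuple[int, bytes]:
    """Decode a CREATE2 payload, rejecting a length field that overruns the cell."""
    if len(payload) < 4:
        raise ValueError("truncated CREATE2 payload")
    htype, hlen = struct.unpack_from(">HH", payload, 0)
    if 4 + hlen > len(payload):
        raise ValueError("handshake data length exceeds payload")
    return htype, payload[4:4 + hlen]
```

  The explicit length field is what lets a receiver separate handshake
  data from padding without knowing anything about the handshake type.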

  The relay payload of an EXTEND2 relay cell contains the following fields:

        Number of link specifiers             [1 byte]
           N times:
            Link specifier type               [1 byte]
            Link specifier length             [1 byte]
            Link specifier                    [variable]
        Handshake type                        [2 bytes]
        Handshake data length                 [2 bytes]
        Handshake data                        [variable]
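  The EXTEND2 layout, with its variable-length list of link specifiers,
  might be handled as follows. This is a sketch, assuming a relay cell
  payload of 498 bytes (a 509-byte cell payload minus the 11-byte relay
  header) and leaving padding to the relay-cell layer; pack_extend2()
  and unpack_extend2() are hypothetical names.

```python
import struct
from typing import List, Tuple

RELAY_PAYLOAD_LEN = 498  # 509-byte cell payload minus the 11-byte relay header

LinkSpec = Tuple[int, bytes]  # (link specifier type, link specifier bytes)


def pack_extend2(specs: List[LinkSpec], htype: int, hdata: bytes) -> bytes:
    """Encode an EXTEND2 relay payload (padding is left to the relay-cell layer)."""
    out = bytearray([len(specs)])                    # number of link specifiers
    for lstype, lspec in specs:
        out += bytes([lstype, len(lspec)]) + lspec   # type, length, specifier
    out += struct.pack(">HH", htype, len(hdata)) + hdata
    if len(out) > RELAY_PAYLOAD_LEN:
        raise ValueError("EXTEND2 payload does not fit in one relay cell")
    return bytes(out)


def unpack_extend2(body: bytes) -> Tuple[List[LinkSpec], int, bytes]:
    """Decode an EXTEND2 payload, rejecting any field that runs past the end."""
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise ValueError("truncated EXTEND2 payload")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    nspec = take(1)[0]
    specs = []
    for _ in range(nspec):
        lstype, lslen = take(2)
        specs.append((lstype, take(lslen)))
    htype, hlen = struct.unpack(">HH", take(4))
    return specs, htype, take(hlen)
```

  Because every link specifier carries its own length, a relay can skip
  over specifier types it does not understand instead of failing to
  parse the rest of the cell.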

  The CREATED2 cell and EXTENDED2 relay cell both contain the following
  payload:

        Handshake data length                 [2 bytes]
        Handshake data                        [variable]

  All four cell types are padded to 512-byte cells.

  When a relay X receives an EXTEND2 relay cell:

  * X finds or opens a link to the relay Y using the link target
    specifiers in the EXTEND2 relay cell; if X fails to open a link, it
    replies with a TRUNCATED relay cell. (FIXME: what do we do now?)

  * X copies the handshake type and data into a CREATE2 cell and sends
    it along the link to Y.

  * If the handshake data is valid, Y replies by sending a CREATED2
    cell along the link to X; otherwise, Y replies with a TRUNCATED
    relay cell. (XXX: we currently use a DESTROY cell?)

  * X copies the contents of the CREATED2 cell into an EXTENDED2 relay
    cell and sends it along the circuit to the OP.
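  Relay X's role in the steps above can be sketched as two copy
  operations. The sketch assumes the EXTEND2 cell has already been
  parsed, and models "finds or opens a link" as a hypothetical
  open_link callback that returns None on failure. X never interprets
  the handshake data, so new handshake types need no changes at the
  extending relay.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Extend2:
    specs: List[Tuple[int, bytes]]  # link specifiers, used only to find/open the link
    htype: int
    hdata: bytes                    # opaque to X: never parsed here


def handle_extend2(ext: Extend2,
                   open_link: Callable[[List[Tuple[int, bytes]]], Optional[object]]
                   ) -> Tuple[str, Optional[bytes]]:
    """Relay X: open a link to Y, then forward type and data unchanged in a CREATE2."""
    if open_link(ext.specs) is None:
        return ("TRUNCATED", None)   # as the text says (see its FIXME)
    create2 = struct.pack(">HH", ext.htype, len(ext.hdata)) + ext.hdata
    return ("CREATE2", create2)


def extended2_from_created2(created2_payload: bytes) -> bytes:
    """Relay X: copy Y's CREATED2 body (length + data) into an EXTENDED2 body."""
    if len(created2_payload) < 2:
        raise ValueError("truncated CREATED2 payload")
    (hlen,) = struct.unpack_from(">H", created2_payload, 0)
    if 2 + hlen > len(created2_payload):
        raise ValueError("CREATED2 length exceeds payload")
    return created2_payload[:2 + hlen]   # strip the cell padding only
```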


Link target specifiers:

  The list of link target specifiers must include at least one address and
  at least one identity fingerprint, in a format that the extending node is
  known to recognize.

  The extending node MUST NOT accept the connection unless at least one
  identity matches, and should follow the current rules for making sure that
  addresses match.

  [00] TLS-over-TCP, IPv4 address
       A four-byte IPv4 address plus two-byte ORPort
  [01] TLS-over-TCP, IPv6 address
       A sixteen-byte IPv6 address plus two-byte ORPort
  [02] Legacy identity
       A 20-byte SHA1 identity fingerprint. At most one may be listed.

  As always, values are sent in network (big-endian) order.
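  The three specifier types and the listing rules above can be encoded
  as follows. This is a sketch: the helper names are hypothetical,
  "identity" is taken to mean the legacy identity because it is the
  only identity type defined here, and unknown specifier types are
  simply ignored by the check (an assumption the text does not state).

```python
import ipaddress
import struct
from typing import List, Tuple

LS_IPV4, LS_IPV6, LS_LEGACY_ID = 0x00, 0x01, 0x02
EXPECTED_LEN = {LS_IPV4: 4 + 2, LS_IPV6: 16 + 2, LS_LEGACY_ID: 20}


def link_specifier(kind: int, value: bytes) -> bytes:
    return bytes([kind, len(value)]) + value          # type, length, value


def ipv4_spec(addr: str, orport: int) -> bytes:
    return link_specifier(LS_IPV4, ipaddress.IPv4Address(addr).packed + struct.pack(">H", orport))


def ipv6_spec(addr: str, orport: int) -> bytes:
    return link_specifier(LS_IPV6, ipaddress.IPv6Address(addr).packed + struct.pack(">H", orport))


def legacy_id_spec(fingerprint_hex: str) -> bytes:
    digest = bytes.fromhex(fingerprint_hex)
    if len(digest) != 20:
        raise ValueError("legacy identity must be a 20-byte SHA1 digest")
    return link_specifier(LS_LEGACY_ID, digest)


def specifiers_ok(specs: List[Tuple[int, bytes]]) -> bool:
    """At least one address, exactly one legacy identity, and correct lengths."""
    for kind, value in specs:
        if kind in EXPECTED_LEN and len(value) != EXPECTED_LEN[kind]:
            return False
    kinds = [kind for kind, _ in specs]
    has_address = LS_IPV4 in kinds or LS_IPV6 in kinds
    return has_address and kinds.count(LS_LEGACY_ID) == 1
```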

Legacy handshake type:

  The current "onionskin" handshake type is defined to be handshake type
  [00 00], or "legacy".

  The first (client->relay) message in a handshake of type “legacy”
  contains the following data:

        ‘Onion skin’ (as in CREATE cell)      [DH_LEN+KEY_LEN+PK_PAD_LEN bytes]

  This value is generated and processed as sections 5.1 and 5.2 of
  tor-spec.txt specify for the current CREATE cell.

  The second (relay->client) message in a handshake of type “legacy”
  contains the following data:

        Relay DH public key                   [DH_LEN bytes]
        KH (see section 5.2 of tor-spec.txt)  [HASH_LEN bytes]

  These values are generated and processed as sections 5.1 and 5.2 of
  tor-spec.txt specify for the current CREATED cell.

  After successfully completing a handshake of type “legacy”, the
  client and relay use the current relay cryptography protocol.
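  As a worked check that the legacy handshake fits the new cells, the
  sizes can be computed from the tor-spec.txt constants (DH_LEN = 128,
  KEY_LEN = 16, PK_PAD_LEN = 42, HASH_LEN = 20). The 509-byte cell
  payload and 498-byte relay payload are the same assumptions as in the
  sketches above; extend2_len() is a hypothetical helper.

```python
# Constants as defined in tor-spec.txt.
DH_LEN, KEY_LEN, PK_PAD_LEN, HASH_LEN = 128, 16, 42, 20

LEGACY_REQUEST_LEN = DH_LEN + KEY_LEN + PK_PAD_LEN   # client -> relay onion skin
LEGACY_REPLY_LEN = DH_LEN + HASH_LEN                 # relay -> client: DH key, KH

CELL_PAYLOAD_LEN = 509    # fixed-length cell payload
RELAY_PAYLOAD_LEN = 498   # relay cell payload after the 11-byte relay header


def extend2_len(spec_value_lens, hdata_len):
    """Count + (type, length, value) per specifier + handshake type, length, data."""
    return 1 + sum(2 + n for n in spec_value_lens) + 4 + hdata_len


# Legacy onion skin is 186 bytes and the reply 148 bytes, so CREATE2 needs
# 4 + 186 = 190 bytes and CREATED2 needs 2 + 148 = 150. An EXTEND2 with an
# IPv4 specifier (6 bytes) and a legacy identity (20 bytes) needs
# 1 + 8 + 22 + 4 + 186 = 221 bytes, well under the relay payload limit.
```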

Bugs:

  This specification does not accommodate:

  * circuit-extension handshakes requiring more than one round

    No circuit-extension handshake should ever require more than one
    round (i.e. more than one message from the client and one reply
    from the relay).  We can easily extend the protocol to handle
    this, but we will never need to.

  * circuit-extension handshakes in which either message cannot fit in
    a single 512-byte cell along with the other required fields

    This can be handled by specifying a dummy handshake type whose
    data (sent from the client) consists of another handshake type and
    the beginning of the data required by that handshake type, and
    then u