The text below can be seen as a personal summary of the parts of
these threads that pertain to Request-IDs and privacy.
------snip----
The Request-ID: header field.
Adapted from the proposal in
<URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>.
An HTTP request may include a header field of the form:
Request-ID: $session $request++
e.g.
Request-ID: 342%33a4d443 12
The HTTP client chooses a random string as a "session identifier",
and each request in that session is identified by a number that
increases monotonically with time.
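A minimal Python sketch of how a client might produce these header lines, assuming 8 random bytes rendered as hex for $session and a counter that starts at 1. The class name RequestIdSession is hypothetical; the proposal does not say how the random string is generated or encoded.

```python
import secrets


class RequestIdSession:
    """One Request-ID session: a random $session string plus a counter.

    Hypothetical helper; the encoding of $session is an assumption.
    """

    def __init__(self) -> None:
        # Assumption: 8 random bytes rendered as hex serve as $session.
        self.session = secrets.token_hex(8)
        self.request = 0

    def next_header(self) -> str:
        # $request++ : each new request gets the next, strictly larger number.
        self.request += 1
        return f"Request-ID: {self.session} {self.request}"
```

Two calls on the same object yield the same $session with request numbers 1 and 2.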
It is suggested that clients use a different random $session string
for each server they talk to. This makes it more difficult for
cooperating web service providers to match clicktrails across their
logfiles and thereby obtain user profiling information far more
accurate than the user would willingly give them without some form
of compensation. Note that matching logfiles is illegal under the
privacy laws of some countries. The suggestion to use different
$session strings can be seen as supporting these laws by making the
crime of matching logfiles less rewarding.
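Keeping one independent random $session per server could look like the sketch below. Keying by host name is an assumption (a client might equally key by host and port), and PerServerSessions is a hypothetical name.

```python
import secrets
from typing import Dict


class PerServerSessions:
    """Keeps a separate random $session string for every server (hypothetical)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def session_for(self, host: str) -> str:
        # First contact with a host draws a fresh, independent random string,
        # so two servers do not see the same $session value for this user
        # (barring a random collision).
        if host not in self._sessions:
            self._sessions[host] = secrets.token_hex(8)
        return self._sessions[host]
```

Because the strings are drawn independently, two servers comparing logfiles find no shared $session value to join on.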
A "session" is not formally defined (other than as "a set of requests
with the same $session id"), though I suggest that browsers begin a
new session when they are started and whenever they have been idle
for 30 minutes or more, and that they offer some user interface
command to "start a new session" (i.e. "choose a new random session
ID").
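The suggested session lifecycle (new session at startup, after 30 or more idle minutes, or on user request) might be sketched as follows. BrowserSession, touch and start_new_session are hypothetical names, and the injectable clock exists only so the idle rule can be exercised without waiting.

```python
import secrets
import time
from typing import Callable

IDLE_LIMIT_SECONDS = 30 * 60  # the 30-minute idle threshold suggested above


class BrowserSession:
    """Session lifecycle sketch; all names are hypothetical."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.session_id = ""
        self._last_activity = 0.0
        self.start_new_session()  # browser invoked: begin a session

    def start_new_session(self) -> None:
        # Could back a "start a new session" UI command: choose a new random ID.
        self.session_id = secrets.token_hex(8)
        self._last_activity = self._clock()

    def touch(self) -> str:
        """Call before each request; returns the $session to use."""
        now = self._clock()
        if now - self._last_activity >= IDLE_LIMIT_SECONDS:
            self.start_new_session()  # idle for 30 minutes or more
        self._last_activity = now
        return self.session_id
```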
Each user agent must provide a mechanism to turn off the generation
of Request-IDs, especially for the benefit of site security
administrators who prohibit its use.
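One way a user agent could honour such a switch is to consult it each time it assembles a request, as in this sketch. RequestIdPolicy and build_request_headers are hypothetical; how the switch is exposed to users or locked by administrators is left open.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass
class RequestIdPolicy:
    # Hypothetical switch: set False by the user, or forced False by a
    # site administrator who prohibits Request-IDs.
    enabled: bool = True


def build_request_headers(policy: RequestIdPolicy, session: str,
                          request: int,
                          other: Mapping[str, str]) -> Dict[str, str]:
    """Return the request's header fields, adding Request-ID only if allowed."""
    headers = dict(other)
    if policy.enabled:
        headers["Request-ID"] = f"{session} {request}"
    return headers
```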
If no Request-ID header is present, web service providers should
interpret this as a statement that the user does not wish to reveal
his or her exact clicktrail for privacy reasons. An attempt by
service providers to silently obtain the clicktrail by some other
means (for example by using a session-id, cookie, or anonymous
authentication mechanism that could be part of future versions of
HTTP) should be considered a violation of the user's privacy wishes.
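On the server side, a parser could return None for an absent header, which a provider following the proposal would read as "no clicktrail wanted". Treating a malformed value like an absent one is an assumption of this sketch, and the names RequestId and parse_request_id are hypothetical. Field names are matched case-insensitively, as for other HTTP header fields.

```python
from typing import Mapping, NamedTuple, Optional


class RequestId(NamedTuple):
    session: str
    request: int


def parse_request_id(headers: Mapping[str, str]) -> Optional[RequestId]:
    """Return the parsed Request-ID, or None when the client sent none.

    None means the user chose not to reveal the clicktrail; a server
    honouring the proposal would not try to rebuild it by other means.
    """
    for name, value in headers.items():
        if name.lower() == "request-id":
            fields = value.split()
            if len(fields) != 2 or not fields[1].isdigit():
                return None  # malformed: treated like an absent header (assumption)
            return RequestId(fields[0], int(fields[1]))
    return None
```

For the example value above, parse_request_id({"Request-ID": "342%33a4d443 12"}) gives RequestId(session='342%33a4d443', request=12).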
Whether an HTTP client uses a global $request counter or one counter
per server is up to the client. HTTP clients that are not
traditional user agents (e.g. multi-threaded robots) may use several
sessions in parallel.
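The two counter policies can be contrasted in a short sketch; GlobalCounter and PerServerCounter are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict


class GlobalCounter:
    """One $request counter shared by every server the client talks to."""

    def __init__(self) -> None:
        self._n = 0

    def next(self, host: str) -> int:
        self._n += 1  # host is ignored: a single sequence for all servers
        return self._n


class PerServerCounter:
    """An independent $request counter for each server."""

    def __init__(self) -> None:
        self._n: DefaultDict[str, int] = defaultdict(int)

    def next(self, host: str) -> int:
        self._n[host] += 1
        return self._n[host]
```

Requests to hosts a, b, a get numbers 1, 2, 3 from the global counter and 1, 1, 2 from the per-server one. One trade-off worth weighing: with a global counter, server a sees a gap (1 then 3) showing that one request went elsewhere, while per-server counters hide that.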
A proxy must pass the Request-ID: header through unmodified. One might
consider some sort of Proxy-Request-ID, though I doubt it would be
valuable.
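A sketch of the pass-through rule, assuming a proxy that may add some fields of its own (the proxy_fields parameter is hypothetical) but leaves the client's Request-ID exactly as received.

```python
from typing import Dict, Mapping


def upstream_headers(client_headers: Mapping[str, str],
                     proxy_fields: Mapping[str, str]) -> Dict[str, str]:
    """Header fields a proxy sends on to the origin server.

    `proxy_fields` stands for whatever a particular proxy adds of its own
    (hypothetical). The client's Request-ID is copied verbatim, and the
    proxy never adds, replaces, or renumbers one itself.
    """
    out = dict(client_headers)  # Request-ID, if present, is kept as received
    for name, value in proxy_fields.items():
        if name.lower() == "request-id":
            continue  # never let proxy-supplied data touch Request-ID
        out[name] = value
    return out
```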
An HTTP cache can assume that the response to an HTTP request does
_not_ vary as a function of the Request-ID. That is, an HTTP proxy
need not include the Request-ID in its "cache key." If the
response to a request can vary, an Expires header should be used in
the response to reflect this dynamism.
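A toy proxy cache illustrating both points: the key is (method, URL) only, so Request-ID never enters it, and an entry is served only until its Expires time. Representing Expires as a Unix timestamp is a simplification (real Expires values are HTTP dates), and ProxyCache is a hypothetical name.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (method, URL); Request-ID deliberately absent


class ProxyCache:
    """Toy cache; expiry times are Unix timestamps (a simplification)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._store: Dict[Key, Tuple[bytes, float]] = {}

    def put(self, method: str, url: str, body: bytes, expires: float) -> None:
        self._store[(method, url)] = (body, expires)

    def get(self, method: str, url: str) -> Optional[bytes]:
        entry = self._store.get((method, url))
        if entry is None:
            return None
        body, expires = entry
        if self._clock() >= expires:
            del self._store[(method, url)]  # stale: the server said it may change
            return None
        return body
```

Requests from different sessions for the same URL thus share one cache entry; only the Expires time limits reuse.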
It is preferred that the Request-ID header _not_ be used to
implement stateful dialogs, in which the content of pages differs
from session to session. For stateful dialog support, other
mechanisms (for example a session-id, cookie, or anonymous
authentication mechanism that could be part of future versions of
HTTP) should be used.
Alternative proposal:
Instead of introducing a new Request-ID: header, include the
$session $request++
information in the From: header. Examples:
From: (#342%33a4d443 12)
From: "Roy T. Fielding" <[email protected]> (#342%33a4d443 12)
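For the alternative, a sketch that builds the value of a From: field with the "(#$session $request)" comment appended, and recovers it again. The function names are hypothetical, and the parser only looks for that trailing comment rather than handling full mailbox syntax.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "(#<session> <request>)" comment in a From: value.
_TRAILER = re.compile(r"\(#(\S+) (\d+)\)\s*$")


def from_value(session: str, request: int, mailbox: str = "") -> str:
    """Build a From: field value carrying the Request-ID data as a comment."""
    comment = f"(#{session} {request})"
    return f"{mailbox} {comment}" if mailbox else comment


def parse_from_trailer(value: str) -> Optional[Tuple[str, int]]:
    """Extract ($session, $request) from a From: value, or None if absent."""
    m = _TRAILER.search(value)
    if m is None:
        return None
    return m.group(1), int(m.group(2))
```

from_value("342%33a4d443", 12) reproduces the first example value above, (#342%33a4d443 12).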
---snip---
Koen.