Session tracking

Brian Behlendorf ([email protected])
Mon, 17 Apr 1995 22:42:17 +0500


There are a couple of systems starting to be deployed now that attempt to gain
information about "clickstreams". "Clickstreams" are the paths people take
when they traverse your site - many content providers would find it useful to
be able to detect common patterns or the effectiveness of various user
interfaces. The problem is, of course, that HTTP is stateless, and beyond
the hostname offers very little in the way of identification of unique
"trips" through the content site. Given that more than one person can use a
hostname (proxy servers, etc), there's no reliable way to exactly identify a
unique person without implementing access control (as I did at HotWired, and
believe me, it's not a general solution). Compound this with the fact that
people can begin and end their "trips" at any page on a site, and you'll see
this is a big problem for sites interested in this kind of statistical
information.

The systems being implemented by companies like IPRO (http://www.ipro.com/)
and content providers like PathFinder (http://www.pathfinder.com/) are
fatally flawed. They create a unique session ID when a user touches their
home page that gets encoded between the hostname and the path/file in the URL
(in the case of PathFinder), and that session ID stays with you throughout
your journey through the site. Of course, this session ID also stays with me
if I save a hotlist reference to a page beyond the home page, or if I cut and
paste the URL and mail it around to my friends. In the latter case, if I'm
given the session ID "KJHFJHDSF" and all my friends then go visit that page,
the site will see all of their accesses under session ID "KJHFJHDSF". This
system also destroys caching of documents, in both local disk and proxy
caches, since every session gets a different URL for the same page. I told
this much to a reporter at MediaWeek last week.
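
To make the two failure modes above concrete, here is a rough sketch in
Python. The URL layout (session ID as the first path segment), the
example.com hostname, the second session ID, and the helper name
split_session_url are all assumptions made for illustration; the real
PathFinder layout may differ.

```python
from urllib.parse import urlsplit

def split_session_url(url):
    """Split a URL of the assumed form http://host/<session-id>/<path>
    into (session_id, document_path). The layout is hypothetical."""
    path = urlsplit(url).path.lstrip("/")
    session_id, _, rest = path.partition("/")
    return session_id, "/" + rest

# Two visitors fetch the same document under different session IDs.
url_a = "http://www.example.com/KJHFJHDSF/news/today.html"
url_b = "http://www.example.com/QWERTYUIO/news/today.html"

# A cache keyed on the full URL treats these as unrelated documents,
# so the second visitor's fetch is a miss even though the page is cached.
cache = {url_a: "<html>...</html>"}
cache_hit = url_b in cache

# Both URLs nevertheless name the same underlying document.
same_document = split_session_url(url_a)[1] == split_session_url(url_b)[1]

# Anyone who receives url_a by mail or hotlist presents the same session
# ID, so their accesses are indistinguishable from the original visitor's.
forwarded_session = split_session_url(url_a)[0]
```

Here cache_hit comes out False while same_document is True, which is the
caching problem in miniature, and forwarded_session is the sender's own ID.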

There is definitely a demand for this kind of information, and it would
help make professional web sites more responsive to what really works and
what doesn't - and this is also information that current web logs
and the HTTP protocol really can't provide. However, any proposed
solution *must* protect the anonymity of the user, since there's no real
need to give that up when all anyone cares about is unique sessions.

So, I'd like to propose for discussion a new HTTP header (hi Roy!) called
"Session-ID". This would be optional, of course, and it would change any
time the browser is restarted (or whenever the user wishes). It would
consist of a string of 32 random base64 characters (or whatever encoding
is allowed in headers). It would allow the content provider to see the
"path" one takes through his system, even when two separate requests are
interlacing through a proxy server (HotWired would often get 5
individuals hitting it from antares.prodigy.com at the same time),
without requiring user authentication or divulging of any personal
information. The "From:" header would also work, but it would give away
information that most would probably prefer not to give.
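
As a rough sketch of how the proposed header might be used, the Python
below generates a 32-character ID on the browser side and groups logged
requests by it on the server side. Nothing here is a spec: the function
names, the log record shape (hostname, session ID, path), and the choice
of the standard base64 alphabet are assumptions for illustration only.

```python
import secrets
import string
from collections import defaultdict

# The 64 characters of the standard base64 alphabet (an assumed choice).
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

def new_session_id(length=32):
    """Browser side: pick a fresh random ID, e.g. at each restart."""
    return "".join(secrets.choice(BASE64_ALPHABET) for _ in range(length))

def group_by_session(requests):
    """Server side: rebuild each session's path through the site.

    requests is an iterable of (hostname, session_id, path) tuples in
    arrival order. session_id is None when the browser sent no header,
    since the header is optional; those requests are skipped.
    """
    paths = defaultdict(list)
    for _hostname, session_id, path in requests:
        if session_id is not None:
            paths[session_id].append(path)
    return dict(paths)

# Two users behind the same proxy hostname, with interleaved requests.
alice, bob = new_session_id(), new_session_id()
log = [
    ("antares.prodigy.com", alice, "/"),
    ("antares.prodigy.com", bob, "/"),
    ("antares.prodigy.com", alice, "/features/"),
    ("antares.prodigy.com", None, "/"),
    ("antares.prodigy.com", bob, "/archive/"),
]
paths = group_by_session(log)
```

Even though every request arrives from one hostname, paths[alice] and
paths[bob] come out as two separate trips, and nothing in the ID says who
either user is.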

The only flaw is that the session-ID is temporary and can't be used to
determine if 50 sessions are 5 people visiting you 10 times, or 50 people
visiting you once. An analysis of domains can help with that though.
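
One possible form of that domain analysis, sketched in Python: count the
distinct session IDs seen per hostname. The record shape and the function
name are assumptions, and the result is only a hint, since a proxy
hostname can stand for many people.

```python
from collections import defaultdict

def sessions_per_host(records):
    """Count distinct session IDs per hostname.

    records is an iterable of (hostname, session_id) pairs, one per
    logged request (a hypothetical log format).
    """
    seen = defaultdict(set)
    for hostname, session_id in records:
        seen[hostname].add(session_id)
    return {host: len(ids) for host, ids in seen.items()}

# 50 sessions spread over 5 hostnames, 10 sessions each (made-up data).
records = [
    (f"host{h}.example.com", f"session-{h}-{s}")
    for h in range(5)
    for s in range(10)
]
counts = sessions_per_host(records)
```

Fifty sessions concentrated on five hostnames looks more like a few
repeat visitors (or a few proxies) than fifty unrelated people, though the
counts alone can't settle which.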

Comments? I'd obviously like to try implementing this, maybe it's time
to learn elisp..... :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
[email protected] [email protected] http://www.[hyperreal,organic].com/