A proposal for a stateful web without stateful connections

Brian Behlendorf ([email protected])
Fri, 3 Feb 1995 06:19:53 +0100


For the last couple of weeks I have been running around like a chicken
with my head cut off trying to figure out the best way to implement a
stream-based publishing scheme on top of a protocol, HTTP, that is really
designed for static archives of documents. I think I've found a
solution. I think some of you may shoot me, but I'm willing to take that
risk.

I propose a change to the HTTP spec that will require no changes to
servers and only a small change to clients: an additional header in both
the request and the response, "State:". When a client fetches a URL, if the
server returns a State: field in its header, the client stores the value
of that field with the URL. When the client revisits the URL, it sends
along that State: value, and the server can respond with a new value to be
associated with the URL.
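
To make that concrete, here's a rough sketch (in Python, purely for
illustration - the names are made up) of the bookkeeping a client would
need: one opaque State value remembered per URL, echoed back on the next
visit.

    # One opaque State value remembered per URL (hypothetical names).
    state_table = {}   # URL -> last State value the server returned

    def request_headers(url):
        headers = {}
        if url in state_table:
            headers["State"] = state_table[url]   # echo it back to the server
        return headers

    def remember_state(url, response_headers):
        if "State" in response_headers:
            state_table[url] = response_headers["State"]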

Here's an example of its utility in a stream-based context. Assume that
Flux is a stream of content where new bits are posted every couple of
hours, and that the URL /Flux is actually a script that assembles Flux
bits into a page.

John goes and visits HotWired for the first time. He goes to the section
called Flux.

C: GET /Flux HTTP/1.0

The server sees no State value in the request, so it returns the default
page for Flux, which is made up of bits from the last two weeks. In this
case, that's items 1-29.

S: Content-type: text/html
State: 1-29
<file with 29 elements>

Now, John comes back a couple of days later, during which time 5 new items
were posted to Flux.

C: GET /Flux HTTP/1.0
State: 1-29
S: Content-type: text/html
State: 1-34
<file with just 5 elements>
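
Just to sketch what the Flux script might do for that simple incremental
case (hypothetical code, assuming a State value of the plain "1-N" form
and some item store behind the scenes):

    # Given the previous State "1-29" and 34 items now existing,
    # send only items 30-34 and hand back the new State "1-34".
    def incremental_update(previous_state, newest_item):
        seen_up_to = int(previous_state.split("-")[1])       # "1-29" -> 29
        new_items = list(range(seen_up_to + 1, newest_item + 1))
        new_state = "1-%d" % newest_item
        return new_items, new_state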

Okay, now it's been two weeks, during which time 15 new pieces were posted
to Flux. John first goes to a full-text-search page and looks for ISDN. He
gets back URLs pointing to bits 39 and 50 in Flux, among others (the URLs
being constructed by the search engine). So he visits Flux:

C: GET /Flux?39,50 HTTP/1.0
State: 1-34
S: Content-type: text/html
State: 1-34,39,50
<file>

(By the way, /Flux is a script, and since the URLs are generated by
scripts, there's no magic happening on the client side for this! The CGI
spec mandates that extra request headers be passed through in the CGI
environment, so scripts would get this data by looking in the environment
for HTTP_STATE.)
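
For instance, the top of such a script could look something like this
(Python just to sketch the idea; only the HTTP_STATE lookup comes from the
CGI convention, the rest is made up):

    import os, sys

    # Extra request headers arrive in the CGI environment with an
    # HTTP_ prefix, so the State header shows up as HTTP_STATE.
    previous_state = os.environ.get("HTTP_STATE", "")

    # ... decide which Flux items to send based on previous_state ...

    sys.stdout.write("Content-type: text/html\r\n")
    sys.stdout.write("State: 1-34\r\n")    # new value for the client to keep
    sys.stdout.write("\r\n")
    # ... then the assembled page ...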

He goes and does another search, this time for McDonalds, and gets
references to items 22, 45, and 50. So he goes to that page:

C: GET /Flux?22,45,50 HTTP/1.0
State: 1-34,39,50
S: Content-type: text/html
State: 1-34,39,45,50

It can be totally up to the script to determine whether it should also
send along items 22 and 50, given that the person has already seen them.
Think of this as exactly comparable to when a news reader tells an NNTP
server "hey, I have articles 1-34,39,45,50, what else is there?".

There's no reason why other types of data can't be encoded in this State
value: time/date info, customization info, etc. Browsers should be able to clear
state info, and state expiration should coincide with URL expiration. The
onus is on the servers to keep the amount of state info to a minimum - the
spec could declare something like 5K as the maximum amount of state info
allowed, unless HTTP headers themselves already impose a limit. The
information must be encoded using only the characters allowed in HTTP
headers.
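
On the client side that could amount to a sanity check before a value is
stored, something like this sketch (the 5K figure is only the strawman
above, not anything official):

    MAX_STATE_BYTES = 5 * 1024

    def acceptable_state(value):
        if len(value) > MAX_STATE_BYTES:
            return False
        # same repertoire as any other HTTP header value:
        # printable ASCII, no CR or LF
        return all(32 <= ord(c) <= 126 for c in value)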

I can't see any security problems beyond those already encountered by
people using news readers. The only insecurity I see is the situation
where people do not use their browser from the same account all the time
(like in a Macintosh classroom or something), since those stored values
could contain highly personalized and private data - but we are hopefully
moving away from those setups anyway (we'll need to for real public
annotations). This will allow *any* publisher to do the kinds of mass
customization that are currently only possible with password-protected
authentication and state data stored on the server side, like we do.

To sum up, this requires no changes to the server (since it could be
managed by scripts - though obviously a stream-based publishing engine
built into the server could use this intelligently) and minimal changes to
the client (clients already keep a history of visited URLs; this simply
associates a value with that key).

So, what do you think? I can almost hear Tim BL and others grumbling now
about how GET requests are supposed to be idempotent... I understand the
need for that, but I have a hard time seeing how else something like this
could be implemented.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
[email protected] [email protected] http://www.hotwired.com/Staff/brian/