I have some problems regarding caching that I hope someone here will
have some good suggestions for solving. I've already read the
discussion at http://www.ics.uci.edu/WebSoft/caching.html, and it
didn't help very much.
The problem is that the current round of browsers and the
current protocol do not seem to interact very smoothly when it comes
to caching pages. I'll explain what I mean:
The "expires" feature should cover the issue of when pages should be
flushed, but the world is apparently not ready for it, because:
- If you set documents to expire immediately, some major browsers
display "Data Missing" or equivalently scary messages when you use
browser commands to "back up" to that page. Since many users are not
going to understand what is going on and will be confused by such
messages, and may not know to "reload" the page at that point, it
would be better for them never to see messages like that. (I've
already had problems with some naive beta testers tripping over that.
They tend to think something must have broken. You can't argue that
we need more sophisticated users, because we don't have a choice!)
- Some browsers (such as Prodigy's) appear to ignore the "expires" header
and cache pages anyway. (and that's just their *browser*...)
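For concreteness, here is a minimal sketch of the kind of document I
mean, written as a CGI-style Python script; the page contents are
illustrative only, not anything from a real server:

    #!/usr/bin/env python
    # Sketch: a CGI script that marks its own output as already
    # expired, so a well-behaved cache should refetch it next time.
    # The page contents are illustrative only.

    import time
    from email.utils import formatdate

    # HTTP dates are in RFC 1123 format, always GMT.
    now = formatdate(time.time(), usegmt=True)

    print("Content-Type: text/html")
    print("Expires: " + now)   # expire immediately
    print()                    # blank line ends the headers
    print("<html><body>Page generated at %s</body></html>" % now)

A document served this way is exactly the kind that triggers the
"Data Missing" behavior described above.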
So, I have a question and I have suggestions.
First, the question:
Is there any good workaround for the current problem, one that has
both of these properties:
- forcing browsers to reload expired pages when someone explicitly requests
one, and
- either:
- allowing pages on the browser's history stack (for instance) to remain
in the local cache even if they are expired, or,
- *somehow* causing the browsers to gracefully and silently reload
expired pages when re-visited through history mechanisms.
No? I suspected as much...
The suggestions:
To make the web work more smoothly, it would be nice if browsers
handled this situation more gracefully: for instance, instead of
displaying errors like "Data Missing", they could just automatically
reload the page.
I also think browser writers should consider that history stacks
(pages that can be re-viewed with browser navigation controls) are in
a class of their own when it comes to caching. While it might make
sense to back up and see an expired document (history mechanisms are,
after all, for "history"), it does not make sense to follow a link
and see a cached copy of an expired document. It is REALLY BAD for
browsers to display cached copies of
expired documents when they are meant to be freshly displayed in
response to a direct user command, because a URL may be a request to a
program that is displaying dynamic information related to the user's
extended "session" with the server. (This is the core of the issue).
I realize these considerations may have no role in the HTTP spec
itself; however, I feel there are serious problems in this area that
can only be resolved by coordinating the behavior of browsers and
servers.
Another thing that might help: perhaps there should be a way for
servers to override the URL (the *name*) that clients associate with
a response, replacing the requested URL. This would allow, for
example, the requested URL to encode information relating to a query,
while still resulting in a single cache entry in the client.
To explain this a little more: suppose two GET requests, one for
/cgi-bin/food/hamburgers and one for /cgi-bin/food/french-fries, both
produce a single page that ought to be cached as one page. The server
ought to be able to say, "you asked for /cgi-bin/food/french-fries,
but the page is called /cgi-bin/food/generic-junk-food", and the
browser should use that name to uniquely identify a cache entry and
update it with the newly fetched data. This might not avoid any extra
fetches, but it would help with cache coherence when the intent is to
display a dynamically generated document.
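As a sketch of what the server side might look like, here is a
CGI-style Python script; "Document-URI" is a hypothetical header name
invented for this illustration, not anything defined in the current
protocol:

    #!/usr/bin/env python
    # Sketch of the canonical-name idea. "Document-URI" is a
    # hypothetical header invented for this illustration; nothing in
    # the current protocol defines it.

    import os

    path = os.environ.get("PATH_INFO", "")   # e.g. /food/french-fries

    print("Content-Type: text/html")
    # Tell the client the real name of this page, so that both
    # /cgi-bin/food/hamburgers and /cgi-bin/food/french-fries update
    # one cache entry instead of two.
    print("Document-URI: /cgi-bin/food/generic-junk-food")
    print()
    print("<html><body>Junk food (requested as %s)</body></html>" % path)

A client receiving this would file the response under the canonical
name rather than under whichever query URL happened to be requested.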
Anyway, just some thoughts. If you have any ideas, pointers or references
for me, I would really appreciate it.
--Shel Kaphan
[email protected]