I've been thinking about what it would take to create a 'mirror' program
for HTTP, similar to the program of the same name for anonymous ftp sites.
I want to give the program a single URL and a local directory, and have it
retrieve that document and determine which of its links are "internal"
links to more pieces of the same 'exhibit' and which are "external" links.
It would keep all the "external" links as they are, change all the
"internal" links to refer to the local http server, and then retrieve the
documents the internal links point to.
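
To make that concrete, here is a rough sketch of the kind of loop I have in
mind, leaning on the LWP::Simple and URI perl modules; the module names, the
crude href regex, and the command-line arguments are just my own choices for
illustration, not anything settled.

    #!/usr/bin/perl
    # Rough sketch only: assumes the LWP::Simple and URI modules and a
    # naive regex for href attributes; a real script would want a proper
    # HTML parser and would also write each document into the local
    # directory tree, which is left out here.
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use URI;

    die "usage: httpmirror <start-url> <local-url-prefix>\n" unless @ARGV == 2;
    my ($start, $local_base) = @ARGV;

    # In this crude version, everything at or below the starting
    # document's directory counts as part of the 'exhibit'.
    (my $prefix = $start) =~ s{[^/]*$}{};

    my %seen;
    my @queue = ($start);
    while (my $url = shift @queue) {
        next if $seen{$url}++;
        defined(my $doc = get($url)) or next;

        $doc =~ s{href="([^"]+)"}{
            my $abs = URI->new_abs($1, $url)->as_string;
            if (index($abs, $prefix) == 0) {            # internal link
                push @queue, $abs;                      # fetch it too
                my $rel = substr($abs, length($prefix));
                qq{href="$local_base$rel"};             # point at local server
            } else {
                qq{href="$1"};                          # external: keep as is
            }
        }ge;

        print "retrieved $url (", length($doc), " bytes)\n";
    }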
The only approach I can think of that would work well for deciding whether
a link is "internal" or "external" would be to violate the opacity of the
URL: look inside the supposedly opaque descriptor and determine whether the
two documents live in the same directory. But we see many closely related
'exhibits' that are spread among several directories. If those directories
are all subdirectories of a single directory, though, that parent directory
could be used in the comparison to decide whether a URL is "internal"
or "external".
With such a program (and it really doesn't look too difficult to do as a
perl script), we could have major 'cache' sites in different areas of the
world that maintain large 'archives' of great HTML material.
Of course, the URN concept makes this all work even better.
Curt