Reliable links [Was: Stab in the dark]

Daniel W. Connolly ([email protected])
Fri, 18 Mar 1994 20:50:04 --100


In message <[email protected]>, Larry Masinter writes:
>
>If URNs are allowed to refer to multiple formats of documents, or
>multiple versions of updating documents, or online streams of
>information that you might telnet to, then ...

.. then you can't do anything reliably!!!! Ha Ha Ha!!! I've been
trying to make this point for TWO YEARS!

There is some value in having names for things that are not defined
as octet streams (fulltext indexes, newsgroups, FTP directories, etc.)
but as it is, we are missing out on the tremendous value of taking
advantage of the multitude of things that _are_ defined as octet
streams: software distributions, documents (once represented in some
format), news articles, email messages, ... ... ...

I suggest that the basic http query:

GET url

is not reliable. In HTTP 0.9 (and gopher, incidentally), there
isn't necessarily ANY relationship between the url and the returned
data. At least in HTTP 1.0, there's a status code so the server can
tell you whether it _thinks_ it has answered your query in a sensible
way.

But in either case, you can give the same url twice and there's no
mechanism to guarantee that you'll get the same thing back, and no way
to test to see if you did! Isn't this the basic feature of a
reference, link, or citation? I write "See page 123 for info on
Widget Co." with the understanding that when my reader turns to page
123, he'll see the same information I'm talking about.
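
To make that concrete, here is a rough sketch (Python; the URL is just
the example used below) of the only test a reader can even attempt:
fetch twice and compare fingerprints. Nothing in the protocol promises
the comparison succeeds, and nothing in the url lets you compare
against the octet stream the author of the link actually saw.

# Sketch: fetch the same url twice and compare digests of the bodies.
import hashlib
import urllib.request

URL = "http://info.cern.ch/default.html"

def body_digest(url):
    # Reduce the returned octet stream to a fixed-size fingerprint.
    with urllib.request.urlopen(url) as response:
        return hashlib.sha256(response.read()).hexdigest()

first = body_digest(URL)
second = body_digest(URL)
print(first == second)   # nothing guarantees this prints True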

With paper book publishing, the reference is bound with the target
information at publishing time. But in a distributed system, different
parts of the information base are changing at different times.

I perceive that there is a REQUIREMENT to be able to write reliable
links. The first step is to acknowledge this as a requirement and
define what reliable means. Then we can look into various methods for
various levels of Quality Of Service.

My working definition is that we define a namespace of keys and a
mapping:

resolve: key -> octet-string

such that it is a function; i.e. if resolve(x) = y and resolve(x) = z,
then y = z. (we can define a superset of this mapping to include the
things that aren't octet-strings...)
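
Stated as something a client could check (a sketch only; "fetch" below
is a stand-in for whatever per-scheme resolution really does, not an
existing API):

# Sketch: the working definition as a checkable property. Handing back
# two different octet strings for the same key is flagged as an error.
import hashlib

def make_checked_resolver(fetch):
    seen = {}   # key -> digest of the octet string previously returned

    def resolve(key):
        data = fetch(key)                         # bytes
        fp = hashlib.sha256(data).hexdigest()
        if seen.setdefault(key, fp) != fp:
            raise ValueError("not a function: %r resolved two ways" % (key,))
        return data

    return resolve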

If we look at the original definition of the set of keys, i.e.

key = scheme x string

then we see that it wasn't designed to satisfy the definition of
reliability. For example,

resolve((http, "//info.cern.ch/default.html"))

has different values at different times. One solution is to extend the
http scheme to include a time. We can be sure, for example, that
resolve((http, "//info.cern.ch/default.html", March 18 1pm CST))
has only one value.
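
Sketching that as data (the triple and the ";at=" spelling below are
invented for illustration, not a proposal for the actual syntax):

# Sketch: a key extended with a time, so that it denotes at most one
# octet stream.
from collections import namedtuple

Key = namedtuple("Key", ["scheme", "address", "timestamp"])

key = Key("http", "//info.cern.ch/default.html", "1994-03-18T13:00-0600")

def serialize(key):
    return "%s:%s;at=%s" % (key.scheme, key.address, key.timestamp)

print(serialize(key))
# http://info.cern.ch/default.html;at=1994-03-18T13:00-0600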

As for variations in format, language, etc., we can develop a syntax
for "the set of format variations of default.html", for example:

http://info.cern.ch/default.*

but resolve((http, "//info.cern.ch/default.*")) is not well-defined.

The HTTP Accept mechanism does provide reliability in this situation.
The query "get the format-variant of /foo/bar that minimizes the
penalty function with these parameters..." is well-defined.
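
For instance, a request along these lines (hand-written HTTP; the q
values play the role of the penalty parameters) asks a well-defined
question even when /foo/bar has several format variants:

GET /foo/bar HTTP/1.0
Accept: text/html; q=1.0, application/postscript; q=0.5, text/plain; q=0.1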

But I think all URI resolution schemes should address reliability. And
I think perhaps it should be evident from the syntax of a URI --
independent of scheme -- whether or not it identifies a unique octet
stream. And on
a per-scheme basis, it should be specified how a URI can be resolved
reliably.

Dan