What URIs are and are not.

Tim Berners-Lee ([email protected])
Wed, 3 Nov 93 18:04:28 +0100


Let me put down the *original* functional spec for URIs.
I fear that some people have gotten away from the original
requirement, and wanted to start designing things.

Listen good if you are a newcomer to the list or on the
IESG ;-)

There are many protocols on the net which imply a data model
which can be mapped onto some concept of "objects" and
addresses/names/identifiers/locators for those objects.

Examples:       Protocol        Objects

                FTP             Directories
                                Files
                SMTP            mail addresses
                                mail messages
                NNTP            newsgroups
                                articles
                HTTP            objects
                Gopher          menus
                                documents
                DNS             hosts
                                Mail eXchanges
                ...

There will be many more future examples.

The characteristics of the objects and the properties of the
names/addresses/identifiers/locators vary and are defined by:-

a. The protocol specification
b. The way the protocol is actually used
c. The conventions which are used by people

(Example:
a. The FTP RFC implies that a directory object may contain
   files, in defining that NLST on a directory returns a list
   of files.
b. The protocol is in fact often used with only the A and I
   modes, and with the user/pass pair being "anonymous" and
   a mail address.
c. A convention is that ftp.x.x.x host names are not changed
   very often, but can change.

Hence the properties of

        ftp://info.cern.ch/pub/www

are that it contains files, maybe listed by anonymous
ftp to info.cern.ch; the files may change, but lifetimes
will be of the order of a year for directories.)
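
As a rough sketch only (not part of the original spec), the example address above can be pulled apart with Python's standard urllib.parse module. The variable names are illustrative; what each part *means* still comes from the FTP data model, not from the syntax.

```python
from urllib.parse import urlsplit

# The example address from the text above.
parts = urlsplit("ftp://info.cern.ch/pub/www")

scheme = parts.scheme    # names the protocol whose data model applies
host = parts.hostname    # the server to contact
path = parts.path        # the directory object on that server

print(scheme, host, path)
```

Nothing in the split tells you that the object is a directory or how long it will live; those properties come from points a, b and c.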

There is for each protocol an implicit name/address/identifier
space for the n/a/i s in the implicit data model.

I am trying to get across the great variety of schemes.

What you can do with an address/name/identifier depends also
on who and where you are and what facilities you have. So
it is difficult to define. (This is why I don't feel that the
URL/URN taxonomy debate has given us much).

HOWEVER, it is still extremely useful to have the concept of
the universal set of all identifier/name/addresses in all
schemes.
It is also useful to have a syntax for writing down the value
of any of them.

One cannot deny that it is useful, because WWW *uses* it. This
is *not* to say that the WWW installed base prevents any bugs
in the URL spec from being fixed, but it is an existence proof
of the need.

The syntax for the universal set was called, in WWW, the URI
syntax, for Universal Resource Identifier. The WG changed
"Universal" to "Uniform", but in doing so lost the important
significance of the Universality: the fact that, if you create
a name space, whatever its properties, I can give it a name
and map its syntax into acceptable URI syntax.
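
To make the universality point concrete, here is a minimal sketch in Python. The helpers to_uri and split_uri and the scheme name "x-myspace" are hypothetical, invented for illustration. The only thing the generic layer assumes is a scheme name followed by a colon; everything after the colon belongs to the name space.

```python
def to_uri(scheme, name):
    """Prefix a name from any name space with the scheme name given to that space."""
    return scheme + ":" + name


def split_uri(uri):
    """Split a URI into (scheme, rest) without interpreting the rest."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme prefix in %r" % uri)
    return scheme, rest


# A made-up name space whose names look like "shelf-3/box-7".
uri = to_uri("x-myspace", "shelf-3/box-7")
print(uri)
print(split_uri(uri))
```

The generic code never needs to know the properties of the new name space, which is the sense in which the set is universal.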

Note that attempts to make URIs a subset of another
name space are of course possible but pointless by
definition.

The URI working group pointed out very sensibly that a
system of more persistent names was necessary.

Unfortunately, and this was the *big mistake*, we then
set about a taxonomy of all name spaces, to divide them into
URLs (of which they had several) and URNs (of which they used
none as no lookup method existed), and worse, to extend the
taxonomy to new schemes not yet invented.

I had hoped that a distributed persistent name lookup
service would arise, but it didn't. What did happen was
that great world-designing started and never finished.

Anyway, all existing schemes have been called URLs, and
URN is a reserved name.

Since then, there have been long discussions about, for example, whether
a news article id is a URL or a URN. The IIIR community is trying
to retrofit a top-down design onto all existing systems. This
is foolish because:

1. If you retrofit a design onto existing practice
to make it clean you have to lie about existing
practice.

2. To do a top-down design in this area won't work.
We have to progress by a sequence of brilliant
independent ideas.

3. If you manage to categorize all the existing schemes
into a taxonomy you will only end up restricting the
future schemes into your current mindset.

What SHOULD we be doing? Valid things to define and, therefore,
argue over are:

1. Interpretation of the implicit data model.
For example, my interpretation of the FTP model
was that you browse directories, and the filenames
are the names, and the files the addresses.
The data type is guessed from the filename.
This was my laying of a formal model onto the FTP
protocol which didn't define one.
Others take the view that one doesn't browse a
directory, one gets an address from a mail message,
and there is information in the filename (etc)
to tell you which transfer type to use.

Obviously both are valid mappings; we need to choose,
and maybe use both.

2. Design of new data models. This is valid for HTTP
and for URNs.

3. The mapping of names in the model onto a concrete
string syntax. Mainly a question of character sets,
and settled, thank you.
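
For point 3, a small sketch of what the mapping onto a concrete string can look like, assuming the percent-escape convention for characters that are not safe to write down literally. The sample name is made up, and Python's urllib.parse is used only as a convenient illustration.

```python
from urllib.parse import quote, unquote

# A made-up name containing a space and a '#', neither of which
# can be written literally inside the string form.
name = "pub/my file#1"

encoded = quote(name, safe="/")   # keep '/' as the hierarchy separator
decoded = unquote(encoded)        # the mapping is reversible

print(encoded)
print(decoded == name)
```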

The URL document talked about "requirements" on names
and addresses in different schemes. That was a mistake. It should
have talked about "characteristics" of names in different models.
We can only document these characteristics for current protocols,
we can't define them. What we can do though is invent new schemes,
and in particular the fabled URN scheme.
Discussion of the relative merits of characteristics is
outside the bounds of the URL document.

In summary, the URL document

- defines a Universal syntax for ANY past or future
names/addresses/identifiers

- defines a specific mapping of name spaces
implicit in existing protocols into URI space.

The URIs defined for existing protocols are known as URLs
and they have the property that they map directly onto a
single protocol in each case.
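
The "single protocol in each case" property can be pictured as a plain lookup from scheme to protocol. The table below is a sketch under assumptions: the scheme names are the commonly used ones and are not taken from this message, and protocol_for is a hypothetical helper.

```python
# Illustrative table only; the scheme names are assumptions,
# not something defined in the text above.
PROTOCOL_FOR_SCHEME = {
    "ftp": "FTP",
    "http": "HTTP",
    "gopher": "Gopher",
    "news": "NNTP",
}


def protocol_for(url):
    """Return the one protocol a URL maps onto, or None if the scheme is not in the table."""
    scheme = url.partition(":")[0].lower()
    return PROTOCOL_FOR_SCHEME.get(scheme)


print(protocol_for("ftp://info.cern.ch/pub/www"))
```

A URI whose scheme is not tied to one protocol in this way (the reserved URN, say) simply has no entry.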

If the URI WG wants to define something other than URIs
as defined above (and I hope in the document) then they
should first decide what to do with URIs.

Tim Berners-Lee
CERN