Re: URN single or multiple variants (was: four-part harmony?)

Terry Winograd
Sat, 2 Oct 1993 10:35:45 -0800


At 12:46 AM 9/23/93 -0700, John A. Kunze wrote:
>
>Foundation premises. See if you can sing these verses, or maybe hum along:
>

There's starting to be a nice melody there...

>(1) Documents, images, and other objects may be closely related enough that
> many people would agree they have the "same intellectual content". An
> absolute or universal concept of "sameness", however, does not concern
> us since it's a Very Hard Problem and a Very Big Meeting Time Sink.

Definitely!

>(2) Instead we're interested in "Who says What is the same or different".
> That way users benefit from different points of view, but they know
> whose point of view it is. Users will need more than one point of
> view because Entity A may have an opinion about objects X and Y, but
> not about Z, which Entity B may have an opinion about.

Point of view is not the same as an entity. A single person takes different
points of view for different purposes. From one of my own points of view
two versions of the same document have the "same intellectual content"
while from another point of view they don't.
>
>(3) An entity that has an opinion on this subject (I'll call it an
> IdAuthority for now -- I can't find the URN paper) can be anybody
> in principle; familiar examples will be publishers and libraries.

Even more familiar examples are people (like a department administrator or
working group secretary) who make materials available on the net by
sticking them in an FTP directory, Gopher server, etc. That is, I think we
shouldn't focus on the heavyweight institutional players and lose sight of
the fact that their current roles are going to be chopped up and
distributed around the net.

>(4) Two entities may have different opinions about any set of objects.

One entity may have different opinions from different points of view, as well.

>(5) The creator or owner (whatever that means) of an object may be one of
> several IdAuthorities for that object, or it may not be an IdAuthority.

Yes. There will be a spectrum of IdAuthorities, with the low end being "I
am my own ID authority for what I put on the net" and the high end being
"The Official International Government Poobah Everything Registration
Authority". People will choose (and at times pay for) a level of authority
that fits their needs. Homebrew is fine for sharing documents in a group
(but still needs to follow the semantics of identifiers, locators, etc. in
a consistent way). Higher levels bring more guarantees that the URNs will
be translatable to URLs via standard services, that the URLs won't point
into empty space, that the items will still be accessible a few centuries
from now, etc. (This means that the same authority organizations will
likely be providing services for access, storage, archiving, etc.)

I think it is useful to make some distinctions between the ORIGINATOR of
an information object, a PROVIDER of the object, the AUTHORITY of a URN and
the DECLARER of that URN:

ORIGINATOR: actually produces the object. May or may not have further
rights. (This is close to what we think of as "author".)

PROVIDER: makes an object accessible electronically (this could be
generalized to handle other modes, but let's stop at that for now).
Any one object may be provided by any number of providers. The
provider may be the originator, the originator's institution,
a library, publisher, archive, etc. A URL specifies an object with
respect to a particular provider.

AUTHORITY: assigns URNs, as in JAK's message. An authority may
do so without ever seeing the document, providing it, knowing what
is in it, etc. Typically the use of a particular authority will go
along with the presence of lookup catalogs and the like.

DECLARER: an entity which requests that an object (under a particular
point of view) be given a name by an authority. The declarer is the
arbiter of what constitutes "same content" for that object.

In simple cases these collapse into one. If I put a file on a server, send
you a locator and a unique name that I made up, I am doing all of the
above. In a fully disaggregated case, I write something (ORIGINATOR), my
boss (DECLARER) decides to make my version available to the world by
sending off for an official identifier from some trade-association
(AUTHORITY), and by shipping the bits to FindItHere.com (PROVIDER), which
sends back a URL and a monthly invoice for making the bits available on the
net.
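
As a rough illustration of how the four roles collapse or separate, here is a
minimal Python sketch. The `Naming` record and its `distinct_parties` helper
are hypothetical names made up for this example, not part of any proposal in
this thread; the party names are the ones from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Naming:
    """Who played which role for one named object (illustrative only)."""
    originator: str  # produced the object
    declarer: str    # asked for the name and decides what "same content" means
    authority: str   # assigned the URN
    provider: str    # makes the bits available

    def distinct_parties(self) -> int:
        # Count how many different parties fill the four roles.
        return len({self.originator, self.declarer, self.authority, self.provider})


# Simple case: I put a file on a server and make up the name myself.
simple = Naming("me", "me", "me", "me")

# Fully disaggregated case from the text above.
disaggregated = Naming("me", "my boss", "trade-association", "FindItHere.com")

print(simple.distinct_parties())         # 1
print(disaggregated.distinct_parties())  # 4
```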

>(6) An IdAuthority assigns a unique string (call it an IdDesignator for now)
> to each object that is "intellectually different" in its view. The
> IdAuthority name plus the IdDesignator make up a URN.

When it is an "authority for hire" the view will be that of the declarer,
not the authority. The authority is just there to serve.
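
One way to picture an "authority for hire" is sketched below, under loud
assumptions: the `URN` pair, the `AuthorityForHire` class, and the idea that a
declarer hands over a "content label" are all invented for illustration. The
point it tries to show is only that the authority issues unique designators
while the declarer's labels decide which objects share one.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class URN:
    authority: str   # IdAuthority name
    designator: str  # IdDesignator, treated as opaque by everyone else


class AuthorityForHire:
    """Issues names on request; it never inspects the objects themselves."""

    def __init__(self, name):
        self.name = name
        self._serial = itertools.count(1)
        self._issued = {}  # (declarer, declarer's content label) -> URN

    def assign(self, declarer, content_label):
        # The declarer's label decides what counts as "the same"; the
        # authority only guarantees one designator per (declarer, label).
        key = (declarer, content_label)
        if key not in self._issued:
            self._issued[key] = URN(self.name, str(next(self._serial)))
        return self._issued[key]


registry = AuthorityForHire("trade-association")
draft1 = registry.assign("boss", "annual-report")
draft2 = registry.assign("boss", "annual-report")  # same content, says the boss
archived = registry.assign("archivist", "annual-report-draft-2")

print(draft1 == draft2)    # True
print(draft1 == archived)  # False
```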

>(7) The IdDesignator string is "officially" opaque, in the sense that it
> has no shared semantics across all URNs, *even* if it is widely known
> how to crack some of them (e.g., a Library of Congress catalog number).

It isn't clear whether "unofficial nonopacity" should be encouraged. It
allows for neat shortcuts in the short run, but leaves lots of problems
later, when it is desirable to unify different designator schemes, and
previous software built on the "unofficial" stuff stops working.

>Now for the baby step premises (building on the foundation premises).
>
>(8) One URN may designate a set of "intellectually equivalent" objects
> (to the IdAuthority) or may designate just one object.

Yes, but. This assumes that "one object" is well defined. I think it is
better to always think in terms of equivalence classes. I'm not even sure
whether you intended the equivalence class for "one object" to mean bitwise
equivalence, or would allow for encodings, different file system
conventions, and the like. If we give up the idea that there is an
inherent notion of "real" sameness, things get easier.
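
To make the equivalence-class view concrete, here is a small sketch. The file
names and the two "sameness" functions are assumptions chosen for the example:
one treats objects as equal only when they are bitwise identical, the other
ignores line-ending conventions. The same three objects fall into different
classes depending on which notion is applied.

```python
from collections import defaultdict


def equivalence_classes(objects, key):
    """Group object names by the value `key` assigns to their content."""
    classes = defaultdict(list)
    for name, data in objects.items():
        classes[key(data)].append(name)
    return sorted(sorted(members) for members in classes.values())


objects = {
    "unix.txt": b"hello\nworld\n",
    "dos.txt": b"hello\r\nworld\r\n",
    "copy.txt": b"hello\nworld\n",
}


def bitwise(data):
    return data


def ignoring_line_endings(data):
    return data.replace(b"\r\n", b"\n")


print(equivalence_classes(objects, bitwise))
# [['copy.txt', 'unix.txt'], ['dos.txt']]
print(equivalence_classes(objects, ignoring_line_endings))
# [['copy.txt', 'dos.txt', 'unix.txt']]
```

Neither grouping is the "real" one; each is just the class structure induced
by one point of view.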

>(9) If a URN designates a set of objects, individual objects in the set are
> called "variants" with respect to that URN.

Reword: When distinct subsets can be identified within the class
designated by a URN, the equivalence classes associated with those subsets
are "variants" of the one designated by the URN.
>
>(10)Variants may be derived from all sorts of transformations of one object
> into another, either by machine or by hand, but we don't deal with that
> Very Hard Problem.

Yes.

>(11)Instead we only need to know how to tell one variant from another in
> the set of variants (intellectually equivalent objects). A string
> (which I'll call a "variant specifier") is used to identify a variant.

This is a syntactic device, which can be used in various ways. I interpret
it as "The base URN will uniquely determine the equivalence class, and the
string is anything that can be read by any program which will allow it to
determine a subclass." The good news is that you can do anything with it.
The bad news is that everyone will want to do something different.

>(12)A variant specifier is *not* carried below the URN level. So a variant
> specifier never meets a URL. Instead a lookup of the <URN plus variant
> specifier> produces a URL for that particular variant. This Very
> Common Mistake wastes vast quantities of discussion time.

Yes, although I would also allow URN + specifier to produce a new URN (for
the relevant subset), as either Tim or John said in the following (it gets
hard to keep track of the nesting level):

>> **NO**. This is more like contraphrasing me. I don't believe in
>> variant specifiers, so I wouldn't have said that. I can imagine
>> saying "Give me a URN for a postscript version of document <urn>".
>> I can imagine saying "Give me a URN for a 600x300 pixel 5-colour GIF
>> of document <urn>". I can imagine there being an infinite
>> number of possible variants on a document, so I wouldn't ask for a list.
>> But what I would get would be another, more specific, URN.
>>

Note that in cases like this, you will start with a fairly "heavyweight"
URN (e.g., the one registered with the Library of Congress for the document)
and get in turn a lightweight one (the authority is the same server that
will be the provider, and the URN will be "virtual" in the sense that it is
generated from the description according to some convention used by that
authority, and not saved).
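
A "virtual" URN of this kind could be sketched as a pure function of the base
URN and the description, so that nothing needs to be stored. Everything
concrete below is an assumption for illustration: the `virtual_urn` function,
the server name, the made-up base URN, and the `authority/base/attributes`
string layout are not a proposed syntax.

```python
def virtual_urn(authority, base_urn, description):
    """Derive a variant URN from a base URN and a description, storing nothing."""
    # Sorting the keys makes the result independent of the order in which
    # the requester happened to list the attributes.
    spec = ";".join(f"{key}={description[key]}" for key in sorted(description))
    return f"{authority}/{base_urn}/{spec}"


base = "loc:93-12345"  # made-up heavyweight URN
a = virtual_urn("server.example.org", base, {"format": "gif", "size": "600x300"})
b = virtual_urn("server.example.org", base, {"size": "600x300", "format": "gif"})

print(a)       # server.example.org/loc:93-12345/format=gif;size=600x300
print(a == b)  # True
```

Because the name is recomputed from the request each time, the same
description always yields the same lightweight URN without any registry.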

>(13)To paraphrase Tim, given a URN you should be able to ask some server
> to return you one or more variant specifiers, one for each variant.
> You select the variant you want, and pass it off together with the
> URN when you need to lookup the corresponding URL.

Sounds right, with the proviso that "one for each variant" means "one for
each of the variants in the set you ask for" rather than "one for each
possible variant". I might ask for all the different formats of a
document, but what it would send back is a variant URN for each of the ones
it is willing to provide, not each of the ones that might in principle be
generated.
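
A sketch of that proviso, with assumed names throughout (the `OFFERED` table,
the `list_variants` function, and the placeholder URN strings are all
hypothetical): the server answers only from the variants it is willing to
provide, narrowed by whatever the requester asked for.

```python
# What this (imaginary) server is willing to provide for each base URN.
OFFERED = {
    "urn-A": [
        ({"format": "postscript"}, "urn-A-ps"),
        ({"format": "ascii"}, "urn-A-txt"),
    ],
}


def list_variants(base_urn, **wanted):
    """Return URNs of offered variants whose attributes match `wanted`."""
    return [
        variant_urn
        for attrs, variant_urn in OFFERED.get(base_urn, [])
        if all(attrs.get(name) == value for name, value in wanted.items())
    ]


print(list_variants("urn-A"))                  # ['urn-A-ps', 'urn-A-txt']
print(list_variants("urn-A", format="ascii"))  # ['urn-A-txt']
print(list_variants("urn-A", format="gif"))    # []
```

A GIF rendering might be possible in principle, but since it is not on offer
the request for it returns nothing.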

>(14)For various reasons (e.g., optimization), you may get a variant
> specifier at the same time as the URN and packaged together with it.

Yes, or you may get a URN for the subclass designated by the variant, or
both. That is, an embedded reference may include any number of URNs for
different variants (properly identified), so that the client that runs
across it can simply go get the right one. Or it may have a single URN for
the larger class, and the client needs to do dynamic lookup of the kind
described in the previous section to find out what variants (e.g., formats)
are available. The language for embedding pointers should be flexible
enough to allow this.
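
The two styles of embedded reference might look like this in a client. This
is only a sketch: `EmbeddedReference`, `pick`, and the stand-in
`remote_lookup` are hypothetical names, and the choice of keying variants by
format is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddedReference:
    base_urn: str
    variants: dict = field(default_factory=dict)  # format -> variant URN


def pick(ref, fmt, lookup):
    """Use an embedded variant URN if present, else fall back to a lookup."""
    if fmt in ref.variants:
        return ref.variants[fmt]
    return lookup(ref.base_urn, fmt)


lookups = []


def remote_lookup(base_urn, fmt):
    # Stand-in for a query to some server; it just records the request.
    lookups.append((base_urn, fmt))
    return f"{base_urn}-{fmt}"


rich = EmbeddedReference("urn-A", {"postscript": "urn-A-ps"})
bare = EmbeddedReference("urn-A")

print(pick(rich, "postscript", remote_lookup))  # urn-A-ps (no lookup needed)
print(pick(bare, "postscript", remote_lookup))  # urn-A-postscript (via lookup)
print(lookups)                                  # [('urn-A', 'postscript')]
```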

>(15)If you like the variant specifier, you may use it without needing
> to look up other variants.
>
>(16)If you don't like the variant specifier, you may want to go ahead
> and look up the other variants to see what else is available.
>
>(17)The variant specifier is thus a thing that optionally accompanies
> a URN, at the same level in our UR* scheme of things.
>
>(18)We need a new member of the UR* family for variants. How about URV?

This is movement towards making the "string" have more semantics. This is
the hard part that deals with all of the different ways in which things
vary. The URV has the function of "Given a URN plus the description
contained in the URV, produce either a new URN or a URL that points to an
object meeting the description." This requires a full-fledged descriptive
language to do it right. A small vocabulary of "variant types" or "link
types" may be a useful taxonomy for starting, but isn't the real thing. It
is more like "attributes," which can be used as a general representation
for relationships.
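
For what it is worth, the URV function quoted above can be sketched with
plain attribute/value pairs standing in for the full descriptive language.
The `CATALOG` contents, the `resolve` function, the tagged `("URL", ...)` and
`("URN", ...)` results, and the example.org address are all assumptions for
illustration only.

```python
# Each entry pairs a description with what the authority can hand back:
# either a locator or another, more specific name.
CATALOG = {
    "urn-A": [
        ({"format": "postscript", "language": "en"},
         ("URL", "ftp://ftp.example.org/pub/a.ps")),
        ({"format": "ascii", "language": "en"},
         ("URN", "urn-A-txt")),
    ],
}


def resolve(urn, urv):
    """Given a URN plus the description in a URV, return a URN or a URL."""
    for attrs, target in CATALOG.get(urn, []):
        # An entry matches when it agrees with every attribute in the URV.
        if all(attrs.get(name) == value for name, value in urv.items()):
            return target
    return None


print(resolve("urn-A", {"format": "postscript"}))  # ('URL', 'ftp://ftp.example.org/pub/a.ps')
print(resolve("urn-A", {"format": "ascii"}))       # ('URN', 'urn-A-txt')
print(resolve("urn-A", {"format": "gif"}))         # None
```

Exact-match attributes are the weakest possible description language; ranges,
preferences, and relationships between variants are where the hard part
starts.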

--t