Re: Suggestion: URL string-search syntax

Rob Raisch, The Internet Company
Mon, 30 May 1994 12:17:28 -0700 (PDT)


Stephen,

Indeed, the URN work answers only some of the important questions. The
higher purpose here, I believe, is to answer this question:

"What information do we need to make an 'appropriate' retrieval decision?"

I've been thinking about this for some time and have a short list of
questions.

<SOAPBOX ON>

Before I present them, I'd like to go on record once again and state my
belief that this is the single most important issue facing the global
Internet short of the exhaustion of the address space.

As on-line publishing enablers, we have seen what happens when an
Internet repository achieves even a modest level of notoriety.

In some cases, this has had a positive effect and has been used to limit
socially irresponsible behavior. See "Canter and Siegel."

But consider the current barriers which traditional publishers face when
providing useful and popular information to their customers.

I have had to turn away business -- or have lost potential business on the
basis of cost -- because we could not support the infrastructure required
without unfairly requiring the publisher to shoulder the entire cost of
delivery. (And the business I lost did not go elsewhere. Currently no
one is capable of supporting it.)

I say "unfairly" because we already have support for cost-effective
information distribution in the real world. There are trucking companies,
bookstores, and fulfillment houses that exist and compete with each
other to keep costs down, and the publisher can leverage these existing
services.

On-line, we have no such extant infrastructure, and without it -- or a
technical infrastructure that supports the on-line analog -- publishing
on the global Internet will remain what it is now: an unsupportable and
ineffective hack.

It's a "chicken and egg" problem. Without an established technical
infrastructure, the publisher cannot participate in anything other than a
cursory fashion, and without the publisher -- and its content -- there is
little incentive to provide this infrastructure.

This is not simply a commercial issue. Interesting content is interesting
irrespective of its pricing model.

And I fear that, should we look, we would find a full 40% of resource
object retrieval across the Internet to be ill-considered and wasteful.

<SOAPBOX OFF>

OK, what do we need to know to make an "appropriate" retrieval decision?

First, let's assume the following:

- a URN is -- AT THE VERY LEAST -- a reference to a collection of
zero or more URLs.

- a URL uniquely identifies a single instance of a resource object

- a resource object is something that can be retrieved from a
repository

- a repository is a collection of resource objects which supports
one or more methods of external retrieval
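
The four assumptions above amount to a small data model. A minimal sketch in Python follows; the class and field names (`Repository`, `URL`, `URN`, `methods`, `path`) and the example hostnames are hypothetical illustrations, not taken from any URN or URL specification.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this only mirrors the assumptions in the text.

@dataclass(frozen=True)
class Repository:
    """A collection of resource objects supporting one or more retrieval methods."""
    host: str
    methods: frozenset  # e.g. frozenset({"ftp", "gopher"})

    def __post_init__(self):
        # "one or more methods of external retrieval"
        if not self.methods:
            raise ValueError("a repository needs at least one retrieval method")

@dataclass(frozen=True)
class URL:
    """Uniquely identifies a single instance of a resource object."""
    repository: Repository
    path: str  # locates the resource object within its repository

@dataclass
class URN:
    """AT THE VERY LEAST, a reference to a collection of zero or more URLs."""
    name: str
    urls: list = field(default_factory=list)

# Usage: one name, two instances of the same resource in different repositories.
east = Repository("east.example.com", frozenset({"ftp"}))
west = Repository("west.example.com", frozenset({"ftp", "gopher"}))
nifty = URN("urn:example:nifty")
nifty.urls += [URL(east, "/pub/nifty.ps"), URL(west, "/pub/nifty.ps")]
```

Keeping the URN as a plain collection leaves the choice among its URLs to the retrieval decision itself.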

Our goal:

- an appropriate retrieval decision must provide an optimal solution
in terms of the provider's resources, the consumer's use, and the
use of the network infrastructure between provider and consumer.

The questions --

(Consumer Use Questions)

- if we retrieve this URL, can we use what we get?
- ... do we have enough local cache to hold a copy?
- ... is it in a form we can use (render/manipulate)?
- ... ... is it in a language we can understand?
- ... ... if it is non-text, can we use it without conversion?
- ... ... ... can we convert it to a form we can use?
- ... ... is it compressed?
- ... ... ... can we uncompress it?

- ... is there a fee to retrieve and use it?
- ... ... can we afford it?
- ... ... can we pay for it?
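
The Consumer Use Questions can be read as a checklist an agent runs against whatever it can learn about a URL before retrieving it. The sketch below assumes, hypothetically, that such metadata (size, media type, language, compression, fee) is available; the text does not say where it would come from, and the field names and media-type strings are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResourceInfo:
    """Hypothetical metadata describing what a URL would deliver."""
    size: int                     # bytes
    media_type: str               # e.g. "text/plain", "image/gif"
    language: Optional[str]       # None for non-text objects
    compression: Optional[str]    # e.g. "gzip", or None if uncompressed
    fee: float = 0.0              # price to retrieve and use
    payment_methods: frozenset = frozenset()

@dataclass
class ConsumerProfile:
    """Hypothetical description of what the consumer's agent can handle."""
    free_cache: int               # bytes of local cache available
    renderable: set               # media types the renderer can display
    converters: dict              # source media type -> a type we can render
    languages: set
    decompressors: set
    budget: float = 0.0
    payment_methods: set = field(default_factory=set)

def consumer_use_problems(r: ResourceInfo, c: ConsumerProfile) -> list:
    """Return the Consumer Use Questions that fail; an empty list means usable."""
    problems = []
    if r.size > c.free_cache:
        problems.append("not enough local cache")
    if r.language is not None and r.language not in c.languages:
        problems.append("language not understood")
    # Usable directly, or via a converter whose output we can render?
    if r.media_type not in c.renderable and c.converters.get(r.media_type) not in c.renderable:
        problems.append("cannot render or convert")
    if r.compression is not None and r.compression not in c.decompressors:
        problems.append("cannot uncompress")
    if r.fee > 0:
        if r.fee > c.budget:
            problems.append("cannot afford fee")
        if not (r.payment_methods & c.payment_methods):
            problems.append("no usable payment method")
    return problems
```

Returning the list of failures, rather than a single yes/no, lets the agent tell the user why a URL was skipped.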

(Repository Use Questions)

- is the repository active?
- ... do we have permission to use it?
- ... does it support a retrieval service we can use?
- ... is it lightly loaded enough to fulfill the request acceptably?
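
Likewise, the Repository Use Questions could be applied to a status report that only the repository itself can supply. `RepositoryStatus`, its fields, and the 0.8 load threshold below are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class RepositoryStatus:
    """Hypothetical self-report that only the repository can provide."""
    active: bool
    permits_us: bool
    methods: frozenset      # retrieval services offered, e.g. {"ftp"}
    load: float             # fraction of capacity in use, 0.0 to 1.0

def repository_usable(status: RepositoryStatus, our_methods: set, max_load: float = 0.8):
    """Ask the Repository Use Questions in order; stop at the first failure."""
    if not status.active:
        return False, "repository not active"
    if not status.permits_us:
        return False, "no permission"
    if not (status.methods & our_methods):
        return False, "no retrieval service in common"
    if status.load > max_load:
        return False, "too busy to fulfill the request acceptably"
    return True, "ok"
```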

(Network Use Questions)

- which repository is the closest?
  (where "close" is measured in terms of network distance (hops),
  cost of bandwidth, and timeliness of response)
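
The text names three measures of closeness but not how to combine them. One simple, hypothetical approach is a weighted sum; the weights, field names, and hostnames below are assumptions, and choosing the weights well is exactly the hard part.

```python
from dataclasses import dataclass

@dataclass
class PathEstimate:
    """Hypothetical measurements from the consumer to one repository."""
    repository: str
    hops: int               # network distance
    cost_per_mb: float      # cost of bandwidth
    response_secs: float    # timeliness of response

def closest(candidates, w_hops=1.0, w_cost=10.0, w_time=0.5):
    """Return the candidate with the lowest weighted 'distance'.

    The weights are arbitrary placeholders that turn hops, money,
    and seconds into one comparable number.
    """
    def distance(p: PathEstimate) -> float:
        return w_hops * p.hops + w_cost * p.cost_per_mb + w_time * p.response_secs
    return min(candidates, key=distance)

# Usage with made-up numbers for two repositories.
free_but_far = PathEstimate("far.example.com", hops=12, cost_per_mb=0.0, response_secs=2.0)
near_but_paid = PathEstimate("near.example.com", hops=6, cost_per_mb=1.0, response_secs=1.0)
```

With the default weights the free but distant server wins (13 versus 16.5); ignoring bandwidth cost (`w_cost=0.0`) flips the choice to the nearer one, which shows how much the answer depends on the weights.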

Who can answer these questions --

(Easiest to Hardest)

The consumer or her agent is best able to answer the Consumer Use
Questions. The renderer "knows" what it can render, what it can
convert, and what it can interpret.

Only the repository can answer the Repository Use Questions, since
it is the only comprehensive source of the answers.

(It is possible to query the "main" repositories from some central
service to monitor their load and accessibility. This implies a far
larger entrenched infrastructure than we currently support.)

Now, the hardest...

I believe it is only truly appropriate to answer the Network Use
Questions at the consumer's site. To retrieve effectively, I or my
agent need to know a hell of a lot more about the intervening network
infrastructure than you might expect.

Here's an example: Assume I live in Los Angeles and know that O'Reilly
and Associates has something really nifty to retrieve. (Easy assumption,
that. ;)

Now, I find that ORA has two servers, one in Cambridge, MA and one in
Sebastopol, CA. Which do I choose?

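Well (and this is obviously a stacked deck), Sebastopol would be the
worst choice. Although it is geographically closer to Los Angeles,
ORA's Sebastopol office is connected to the network through Cambridge,
so a request to it must travel through Cambridge anyway, plus one more leg.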

The user should never have to know any of this.
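
A consumer-side agent could work this out on the user's behalf. The sketch below measures only hop count, using a breadth-first search over a hypothetical topology; the one fact taken from the example is that Sebastopol reaches the network through Cambridge, and the node names are invented.

```python
from collections import deque

# Hypothetical topology. Only the Sebastopol -> Cambridge link reflects the text.
LINKS = {
    "los-angeles": ["backbone-west"],
    "backbone-west": ["los-angeles", "backbone-east"],
    "backbone-east": ["backbone-west", "cambridge"],
    "cambridge": ["backbone-east", "sebastopol"],
    "sebastopol": ["cambridge"],
}

def hops(graph, src, dst):
    """Breadth-first search: links on the shortest path, or infinity if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")

def nearest(graph, src, servers):
    """Pick the server with the fewest hops from src."""
    return min(servers, key=lambda s: hops(graph, src, s))
```

Geography never enters the calculation: in this toy topology the Sebastopol server is four hops from Los Angeles and Cambridge is three, so `nearest(LINKS, "los-angeles", ["sebastopol", "cambridge"])` picks Cambridge without the user knowing why.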

As someone who provides mechanisms for publishers to provide content to
consumers, I am EXTREMELY interested in exploring this problem and
helping to provide a workable solution.

-- </rr> Rob Raisch, The Internet Company