Re: The future of meta-indices/libraries?

Peter Deutsch ([email protected])
Wed, 16 Mar 1994 02:59:18 --100


[ You wrote: ]
. . .
> I also don't think that the manual creation of entries in a document
> is prohibitive. After all, these maintainers are putting a lot of
> effort in publishing material such as HTML pages, gateways and the
> like. Why should they shy away from a small text file? There are very
> real advantages to them: their information is not just "in the Web",
> but is findable in the web. They don't need to maintain their own
> "lists of interesting places".

This does create a potential maintenance problem in the
long run, since in my experience the enthusiasm for
creating or maintaining meta-info is inversely
proportional to the time since the document was created.

As one data point, there is a document I wrote several
years ago called "What is archie?". It is everywhere.
It's been indexed in WAIS, menuized in gopher, included in
bibliographies, etc. I wrote this when still at McGill and
it mentioned "archie.mcgill.ca" as "the" archie server.
Even though this machine hasn't been with us for several
years now, we _still_ get mail from people mentioning this
doc and asking about the status of archie.mcgill.ca.

I'd hope we end up with a system where it's a little
easier to correct or recall such documents, and the info
about them. WWW promises this, but I suspect that if we
require people to do such things by hand, it will suffer
as a consequence. (I know this analogy is a bit weak, since
my problem above is with the content, not the meta-info,
but please cut me some slack... :-)

> Actually it turns out that the fact that most (all?) people write
> the index files by hand is more a feature than a problem. Because
> people don't want to manually create massive index files, they only
> describe the most important services, which results in a database
> with little irrelevant material.

But one person's irrelevant is another person's useful
data. We've had people mine the current archie collection
just so they could study such things as the proportion of
file types and other information about the data. We
certainly didn't foresee such applications when we started,
so I'd rather not hard-wire too many assumptions about
what people might want or need at this point. We
definitely want to stay flexible.

> > > I'm not sure it is going to be sensible
> > > to index all titles on a server and search those, even though it sounds
> > > attractive. You do need to retain the context of the titles.

In theory I agree, although in practice we may find that
titles alone (machine-generated and thus accurate) are
more useful than full templates, which are hand-generated
and thus often inaccurate. Filenames alone have proved
useful in archie, even though in theory descriptive
information would be more useful.

. . .
> > The bottom-line choice is between an index of 50 servers with
> > carefully hand-crafted templates and an index of 5000 servers with
> > machine-generated templates which are less well constructed but up to
> > date. I would certainly opt for the latter.
>
> Well, maybe. If these 5000 servers each index only the titles of all
> their 1000 documents, the resulting database will not be that
> useful. Try a Veronica search for Perl: it comes up with > 4000
> matches, so how am I supposed to find the servers dedicated to Perl?

Can we aim for both? I see us wanting both a cheap,
relatively useful set of accurate, machine-generated
titles and more descriptive info where available. That's
why we added the WAIS indexing capability and worked on
the IAFA template stuff. All the components are now in
place to handle more detailed descriptive info once we
figure out where it is. Meanwhile, and until enough sites
agree to do this, filenames/menu items/titles are not a
bad first approximation and are a lot easier to automate.
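The "aim for both" idea can be sketched as one index in which every server has cheap, machine-generated titles. Where a site has supplied one, it also has a hand-written description. This is a minimal illustration under stated assumptions. It is not how archie, Veronica or the IAFA templates actually work, and ServerEntry, search and the example.org hosts are hypothetical. Servers whose description matches are ranked first. The rest are ranked by how many of their titles match. That is one way to pick out servers "dedicated to Perl" from thousands of raw title hits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerEntry:
    """One server in a hypothetical combined index (illustration only)."""
    host: str
    # Machine-generated: harvested automatically (filenames, menu items, titles).
    titles: list[str] = field(default_factory=list)
    # Hand-written descriptive info, e.g. from a template; often missing.
    description: Optional[str] = None


def search(index: list[ServerEntry], term: str) -> list[tuple[str, int, bool]]:
    """Return (host, title_hits, described) for every server matching term.

    Servers whose hand-written description mentions the term sort first;
    the rest fall back to the count of matching machine-generated titles.
    """
    term = term.lower()
    results = []
    for entry in index:
        title_hits = sum(term in t.lower() for t in entry.titles)
        described = (entry.description is not None
                     and term in entry.description.lower())
        if title_hits or described:
            results.append((entry.host, title_hits, described))
    # Described servers first, then by number of title matches, descending.
    results.sort(key=lambda r: (not r[2], -r[1]))
    return results


if __name__ == "__main__":
    index = [
        ServerEntry("a.example.org", ["perl.tar.Z", "Perl FAQ", "perl-5.0 docs"]),
        ServerEntry("b.example.org", ["misc/perl-script.pl", "gnu tools"]),
        ServerEntry("c.example.org", ["scripts.tar"],
                    "Archive of Perl scripts and modules"),
    ]
    # c.example.org ranks first on its description; a beats b on title hits.
    for host, hits, described in search(index, "perl"):
        print(host, hits, described)
```

The design choice mirrors the argument above. Titles cost nothing to collect and are always present, so every server stays findable. A description, when a site bothers to write one, only improves the ranking.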

- peterd

-- 
------------------------------------------------------------------------------
  My proposal for funding the Internet is pretty simple. I vote we institute
  an "Information Superhighway" tax, the proceeds of which will be used to
  fund network infrastructure. The way this would work is simple - every time
  someone uses the words "Information Superhighway" or any of its derivatives
  we strike them with a sharp object and make them pay a $10 fee (of course,
  the sharp object is not actually needed to make this scheme work, it's just
  in there because it seems an appropriate thing to do...)
------------------------------------------------------------------------------