Re: The future of meta-indices/libraries?

Kevin Altis ([email protected])
Tue, 15 Mar 1994 20:50:01 --100


At 7:54 PM 3/15/94 +0000, Peter Deutsch wrote:
>Actually, we do plan to add this capability to the archie
>server system in the very near future. For WWW there's the
>obvious problem of what to index, since there is no real
>useful meta-info in the URL itself (how many copies of
>"default.html" are there, anyways? :-) so at this point
>we'd be happy to be told what to collect and serve.

We already have some tools that build a list of document titles and
associated URLs, so you get something that looks like this (<tab> is the
ASCII tab character):

What's New With NCSA Mosaic<tab>http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html
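
A minimal sketch of such a tool, in Python, might look like the following.
It is not the tool referred to above; the names TitleParser and index_line
are hypothetical, and it assumes each document has at most one <TITLE> and
that the caller already knows the document's URL.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <TITLE> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so the title fits on one index line.
        return " ".join("".join(self._parts).split())


def index_line(html_text, url):
    """Return one 'title<TAB>url' record for a single document."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return f"{parser.title}\t{url}"
```

Calling index_line once per document and writing the results to a file,
one record per line, gives a list in the format shown above.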

We can search a list like this (just the titles, or titles and URLs) on
our local server, and doing so has turned out to be quite simple and fast.
Simply extending the list to include <A HREF> items makes it extremely easy
to find items on your *local* server, or local links to documents on other
servers, as long as the link name isn't something like "here." For local
servers this can be extended to include <A NAME> tags. Document authors can
add <A NAME> tags alongside their H1-H6 headers to increase hits.
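
As a rough illustration of the two ideas above, the Python sketch below
collects <A HREF> and <A NAME> entries from a document and searches a list
of title<tab>URL records. The names AnchorParser, search and UNHELPFUL are
hypothetical, the set of link names to skip is only an assumption, and the
matching is a plain case-insensitive substring test.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Link names that say nothing about the target (an assumed, partial list).
UNHELPFUL = {"here", "click here", "this"}


class AnchorParser(HTMLParser):
    """Collects (label, url) pairs from <A HREF> and <A NAME> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.entries = []
        self._target = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            # Resolve relative links against the document's own URL.
            self._target = urljoin(self.base_url, attrs["href"])
        elif attrs.get("name"):
            # A named anchor points into the current document.
            self._target = f"{self.base_url}#{attrs['name']}"
        else:
            self._target = None
        self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            label = " ".join("".join(self._text).split())
            if label and label.lower().rstrip(".") not in UNHELPFUL:
                self.entries.append((label, self._target))
            self._target = None


def search(lines, query, include_urls=False):
    """Return the records whose title (or title and URL) contains query."""
    needle = query.lower()
    hits = []
    for line in lines:
        title, _, _url = line.partition("\t")
        haystack = line if include_urls else title
        if needle in haystack.lower():
            hits.append(line)
    return hits
```

Each (label, url) pair can be written out as label<tab>url and appended to
the title list, so one search covers titles, link names and named anchors.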

Following existing practices with FTP, Gopher, etc., these lists can be made
available at the root of a Web server under a special file name. A Web robot
then needs to do nothing besides pick up that one file. It would probably be
useful to separate local links to local files and local titles from local
links to other servers, but a filter program could do that, along with
breaking out ftp, telnet and gopher URLs. Meta-indexes would of course have
to merge these lists to reduce duplication.
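
A filter and merge step along these lines could be sketched in Python as
follows. The function names are hypothetical; the sketch assumes records
are title<tab>URL lines, treats any http URL on another host as "remote",
and considers two records duplicates only when the whole line is identical.

```python
from urllib.parse import urlsplit


def classify(url, local_host):
    """Return 'local', 'remote', or the scheme name for non-http URLs."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme != "http":
        return scheme  # e.g. "ftp", "telnet", "gopher"
    host = (parts.hostname or "").lower()
    return "local" if host == local_host.lower() else "remote"


def split_index(lines, local_host):
    """Group title<TAB>url records by the class of their URL."""
    buckets = {}
    for line in lines:
        record = line.rstrip("\n")
        _title, _, url = record.partition("\t")
        buckets.setdefault(classify(url, local_host), []).append(record)
    return buckets


def merge_indexes(*indexes):
    """Merge several lists, keeping the first copy of each identical record."""
    seen = set()
    merged = []
    for index in indexes:
        for line in index:
            record = line.rstrip("\n")
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged
```

A meta-index that wants stricter de-duplication could key on the URL alone
instead of the whole record; that choice is left open here.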

I would like to be able to see other meta-information included in these
lists such as document author <LINK REV="made"
HREF="mailto:[email protected]">, language, whether the document
requires authentication, etc. This additional meta-information would
require the file to include the same information returned by the HTTP
server itself.
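
For the author case, a small Python sketch of pulling the <LINK REV="made">
address out of a document is shown below. The names MadeLinkParser and
author_of are hypothetical, and how the value would be carried in the list
(for instance as an extra tab-separated field) is an open assumption.
Language and authentication requirements are not in the document text, so
they would have to come from the server and are not covered here.

```python
from html.parser import HTMLParser


class MadeLinkParser(HTMLParser):
    """Finds the HREF of the first <LINK REV="made"> element."""

    def __init__(self):
        super().__init__()
        self.author = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.author is not None:
            return
        attrs = dict(attrs)
        # Attribute names are lowercased by HTMLParser; values are not.
        if (attrs.get("rev") or "").lower() == "made" and attrs.get("href"):
            self.author = attrs["href"]


def author_of(html_text):
    """Return the author URL (often a mailto: URL), or None if absent."""
    parser = MadeLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.author
```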

Much of the above text may be obvious, but I haven't seen it said on this
list, so comment away.

ka