Re: The future of meta-indices/libraries?

Martijn Koster ([email protected])
Wed, 16 Mar 1994 00:07:48 --100


John Franks writes:

> There are many good things about ALIWEB. However, my impression from
> reading the documents referenced above is that the templates must be
> human generated. I am firmly convinced that any scheme which is not
> almost completely automated is doomed to fail. Many maintainers will
> simply not create the templates and the ones who do will not keep them
> up to date. I have no doubt that a human writing an ALIWEB form will
> do a better job than software, but the unfortunate fact is that most
> maintainers will simply not make the effort (often they cannot).

There is no reason index files couldn't be produced by automatic
means. As far as ALIWEB is concerned it is a text file; who or what
creates it is immaterial.
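
To illustrate the point about automatic generation, here is a minimal
sketch in Python of a script that builds such a text file from the
titles of a server's HTML pages. The field names (Template-Type, Title,
URI) and the blank-line record separator are assumptions made for the
example, not the definitive ALIWEB format, and the function names are
hypothetical.

```python
import re

# Assumed field names and layout; the exact format ALIWEB expects
# is not stated in this message.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace so the title fits on one line.
    return " ".join(match.group(1).split())


def make_entry(uri, html):
    """Build one index record for a document, or None if it has no title."""
    title = extract_title(html)
    if not title:
        return None
    return "Template-Type: DOCUMENT\nTitle: {}\nURI: {}\n".format(title, uri)


def build_index(pages):
    """pages maps URI -> HTML text. Returns the index file as one string."""
    entries = []
    for uri in sorted(pages):
        entry = make_entry(uri, pages[uri])
        if entry is not None:
            entries.append(entry)
    # Records are separated by a blank line (an assumption, see above).
    return "\n".join(entries)
```

A title alone carries no description or keywords, so a maintainer would
probably still want to edit the generated file by hand.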

I also don't think that the manual creation of entries in a document
is prohibitive. After all, these maintainers are putting a lot of
effort into publishing material such as HTML pages, gateways and the
like. Why should they shy away from a small text file? There are very
real advantages to them: their information is not just "in the Web",
but is findable in the web. They don't need to maintain their own
"lists of interesting places".

Actually it turns out that the fact that most (all?) people write
the index files by hand is more a feature than a problem. Because
people don't want to manually create massive index files, they only
describe the most important services, which results in a database
with little irrelevant material.

> > I'm not sure it is going to be sensible
> > to index all titles on a server and search those, even though it sounds
> > attractive. You do need to retain the context of the titles.
>
> I think this should be the default. Of course, the maintainer should
> be given as much flexibility as possible in eliminating titles from
> the index.

I personally would prefer to create a sensible index file by hand
myself, but there is nothing in ALIWEB that dictates that this should
be the only/best way, and if you have other methods of making sure
only sensible documents are indexed, that is great.

> Of course retaining the context is desirable, but the time for doing
> this is when the document is created, not when it is indexed.

What I meant was that sometimes it makes more sense to only index
one main document for a specific service instead of all associated
pages. For example, I wouldn't want an entire PC Software directory
indexed if my service provides a nice specialised search interface
for this. As I said, this responsibility lies with the maintainer.
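
One way a maintainer could express this when generating an index
automatically is sketched below in Python: every document under a
service's directory is dropped except the service's main page. The
function name, the prefix-to-main-page mapping and the example paths
are all hypothetical, chosen only for illustration.

```python
def select_for_index(uris, services):
    """Filter URIs so each service contributes only its main document.

    services maps a directory prefix (hypothetical, e.g. "/pc-software/")
    to the URI of the one page that should represent that service.
    URIs outside every service prefix are kept unchanged.
    """
    kept = []
    for uri in uris:
        # Main pages of all services whose prefix covers this URI.
        main_docs = [main for prefix, main in services.items()
                     if uri.startswith(prefix)]
        if not main_docs or uri in main_docs:
            kept.append(uri)
    return kept
```

With a mapping of "/pc-software/" to "/pc-software/search.html", the
search page stays in the index while the individual files in that
directory are left out.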

> The bottom line choice is between an index of 50 servers with
> carefully hand-crafted templates and an index of 5000 servers with
> machine generated templates which are less well constructed but up to
> date. I would certainly opt for the latter.

Well, maybe. If these 5000 servers each index only the titles of all
their 1000 documents, the resulting database will not be that
useful. Try a Veronica search for Perl: it comes up with > 4000
matches. How am I supposed to find the servers dedicated to Perl?

> I would also do everything possible to encourage maintainers to
> massage their templates to improve them.

Quite. We both agree that what is indexed should be up to the
maintainer. How the index is created is also up to the maintainer.
I hope the maintainers will take their responsibility seriously, and
that these indices will be in a standard format so that they can be
integrated automatically.

-- Martijn
__________
Internet: [email protected]
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html