Re: What if we offered a local spider?

Martijn Koster ([email protected])
Fri, 14 Oct 1994 08:14:09 +0100


> The robots discussion that I prompted with my indexing offer gave me an idea.
>
> If we built a free spider that operated only via the file system, which
> would build an index mapped to URL-space,

I suggested this to at least one robot author a while ago in the
context of URL checking (Hi Roy :-), but there are a number of
problems: pages generated by CGI scripts are excluded, access
authorisation is ignored, and you need to parse the server config
files to recover the URL-to-file mappings.
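For concreteness, a minimal sketch of such a filesystem-walking
indexer is below (Python; the docroot and hostname are hypothetical,
not anything Verity proposed). It maps files to URL-space by plain
string substitution, which is exactly where the problems above bite:

    # Minimal sketch of a filesystem-walking "spider".
    # DOCROOT and BASE_URL are assumptions for illustration.
    # Known shortcomings, as noted above:
    #   - CGI scripts show up as files, not as the pages they generate;
    #   - access restrictions (e.g. .htaccess) are silently ignored;
    #   - Alias/ScriptAlias rules in the server config are not applied,
    #     so some emitted URLs may simply be bogus.
    import os

    DOCROOT = "/usr/local/etc/httpd/htdocs"   # assumption
    BASE_URL = "http://web.nexor.co.uk"       # assumption

    def filesystem_urls(docroot, base_url):
        for dirpath, _dirnames, filenames in os.walk(docroot):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Naive path -> URL mapping by prefix substitution.
                yield base_url + path[len(docroot):]

    for url in filesystem_urls(DOCROOT, BASE_URL):
        print(url)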

> then offered to serve those indexes from here, would people use it?

Well, by just making the file available at a well-known place,
anybody can use the locally-generated map. Ehr, /ls-R.txt?
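Generating such a file is nearly a one-liner; a sketch (reusing the
same hypothetical docroot as above) that writes a recursive listing
to ls-R.txt at the top of the docroot:

    # Sketch: write a recursive file listing to a well-known location.
    # DOCROOT is an assumption for illustration.
    import os

    DOCROOT = "/usr/local/etc/httpd/htdocs"   # assumption

    with open(os.path.join(DOCROOT, "ls-R.txt"), "w") as out:
        for dirpath, _dirnames, filenames in os.walk(DOCROOT):
            for name in filenames:
                if name == "ls-R.txt":
                    continue  # don't list the listing itself
                # One docroot-relative path per line.
                out.write(os.path.join(dirpath, name)[len(DOCROOT):] + "\n")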

> In other words, as was suggested here, you'd maintain your index locally,
> then ship it to Verity to be served by our Web server.

Or rather, you pull it whenever needed.

> Thoughts?

I think the problems identified above are rather non-trivial, and that
a trivial solution may give a significant number of bogus URLs. Even
with a local HTTP robot you have access-permission issues, but at
least you know that the URLs which do get out are correct.

-- Martijn
__________
Internet: [email protected]
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html