I expect this would require installing Verity's Topic at the various
information providers' sites. Not practical, I'm afraid.
> At a minimum, the spider should be forced to
>delay between consecutive requests (about 15-30 seconds, depending on the
>network throughput and speed of the server).
When we at HaL built our CD-ROM of abstracts of 10,000 web documents
(with links to the documents themselves, viewable with our OLIAS
browser on the CD-ROM... ask [email protected] for details), we implemented
a "spider" that visited the various sites in an order such that no
site was visited more than once per minute.
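If it helps, here's a minimal sketch of that scheduling idea in Python
(not Vince's actual code, obviously; the function name, the 60-second
MIN_DELAY, and the URL list are all just illustrative):

    import time
    import urllib.request
    from collections import deque
    from urllib.parse import urlparse

    MIN_DELAY = 60.0  # seconds; "no site visited more than once per minute"

    def polite_fetch_all(urls):
        """Fetch URLs, never hitting the same host more often than MIN_DELAY."""
        pending = deque(urls)
        last_visit = {}   # host -> time of our last request to it
        results = {}
        while pending:
            # Scan the queue for the first URL whose host is eligible again.
            for _ in range(len(pending)):
                url = pending.popleft()
                host = urlparse(url).netloc
                now = time.monotonic()
                last = last_visit.get(host)
                if last is None or now - last >= MIN_DELAY:
                    last_visit[host] = now
                    with urllib.request.urlopen(url) as resp:
                        results[url] = resp.read()
                    break
                pending.append(url)   # this host was hit too recently; defer it
            else:
                time.sleep(1.0)       # every queued host is cooling down; wait
        return results

The point is that the spider reorders its queue around per-host
cooldowns instead of just sleeping a fixed interval between requests,
so it stays busy without hammering any one server.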
It took only a few hours of head-scratching and testing to get it
working. Vince Taluskie <[email protected]> did the
implementation. I'm sure he wouldn't mind helping you out a little.
We already paid him to do it once -- I don't think he'd make you
pay him again for the same info ;-)
Vince consulted the published guidelines[1], I believe. You will not
please the net.folk if you blatantly disregard them.
Dan
[1] "Guidelines for Robot Writers"
Martijn Koster
http://web.nexor.co.uk/mak/doc/robots/guidelines.html