> The spider will hit your server fairly hard.  We have a real-time indexing
> engine and a T-1...
This is just plain irresponsible.  You are not only affecting their server,
you are also affecting every network connection between your site and theirs.
People pay good money for that bandwidth -- you should not attempt to hog it.
Your spider should be running on their local net -- running at your site
provides no added value.  At a minimum, the spider should be forced to
delay between consecutive requests (about 15-30 seconds, depending on the
network throughput and speed of the server).
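
As a rough sketch of what I mean by a forced delay (not taken from any
existing spider), something like the following would do.  The PoliteFetcher
name, the injected fetch function, and the 15-second default are all
assumptions made for illustration; the delay is tracked per host, so a slow
pace against one server does not hold up requests to another.

```python
import time
from urllib.parse import urlparse


class PoliteFetcher:
    """Hypothetical wrapper enforcing a minimum gap between requests per host."""

    def __init__(self, fetch, delay_seconds=15.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                  # caller-supplied function: url -> response
        self.delay_seconds = delay_seconds  # assumed default, low end of 15-30 seconds
        self.clock = clock
        self.sleep = sleep
        self._last_request = {}             # host -> clock reading after last request

    def get(self, url):
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        if last is not None:
            # Sleep only for whatever part of the delay has not yet elapsed.
            wait = self.delay_seconds - (self.clock() - last)
            if wait > 0:
                self.sleep(wait)
        try:
            return self.fetch(url)
        finally:
            # Record the time even if the fetch failed, so retries are delayed too.
            self._last_request[host] = self.clock()
```

The spider would pass in whatever function it already uses to retrieve a URL
and call get() in place of it; a slower server or a thinner link would call
for a delay_seconds nearer 30.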
......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <[email protected]>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>