> One suggestion was to use the security idea of realms as named document
> groups. I'm wondering if this will effectively map, or perhaps there
> should be a separate scheme.
>
> Immediately popping to mind is the "robots.txt" file, whose purpose in life
> is to instruct indexing agents and their ilk.
I'm glad you like the "named formatted file" approach :-), but I would
strongly recommend not overloading the /robots.txt file.
The two requirements (providing indexing meta information vs. limiting
robots) are quite distinct, and IMHO better handled separately. If other
meta indexing systems get deployed in the future you can simply switch
to those for indexing purposes, while robots.txt will still have a role
in controlling robots.
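To make the distinction concrete (a purely illustrative example, the
names and values are made up): a /robots.txt entry says what robots may
not retrieve, e.g.

  User-agent: *
  Disallow: /cgi-bin/

whereas indexing meta information describes what is there and worth
indexing, e.g. an IAFA-style record along these lines:

  Template-Type: DOCUMENT
  Title:         Example report
  URI:           /reports/example.html
  Keywords:      example, report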
Overloading the robots.txt file would also cause confusion among
system admins (few enough people know what it is in the first place
:). It would also require more effort on the part of robot writers,
so some may not bother, and then we're worse off.
> We're intending to use robots.txt as the configuration file even
> for Web indexers.
Well, there is no reason a robot couldn't (see the sketch below):
- check the /robots.txt file to find out what's restricted
- retrieve and parse some explicit indexing info such as
  a /site.idx IAFA template
- use the URLs mentioned in the explicit info as starting points
  for limited-depth gathering
- fall back on existing traversal strategies.
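Just to make that concrete, here is a very rough Python sketch of the
strategy, not real robot code: the /robots.txt and /site.idx names are
as above, but the robot name, the "URI:" field parsing and the depth
limit are invented for illustration.

  # Rough sketch only: robot name, field parsing and depth limit are
  # illustrative, not taken from any existing robot.
  import urllib.request
  import urllib.robotparser
  from urllib.parse import urljoin

  def crawl_site(base, max_depth=1):
      # 1. Check /robots.txt to find out what is restricted.
      rp = urllib.robotparser.RobotFileParser()
      rp.set_url(urljoin(base, "/robots.txt"))
      rp.read()

      # 2. Retrieve and parse explicit indexing info, e.g. a /site.idx
      #    file of IAFA templates; here we just collect its URI: fields.
      try:
          with urllib.request.urlopen(urljoin(base, "/site.idx")) as f:
              idx = f.read().decode("latin-1")
          starts = [line.split(":", 1)[1].strip()
                    for line in idx.splitlines()
                    if line.lower().startswith("uri:")]
      except OSError:
          starts = []

      # 3. Use those URLs as starting points for limited-depth
      #    gathering; 4. otherwise fall back on normal traversal
      #    from the server root.
      if not starts:
          starts = ["/"]
      for url in starts:
          url = urljoin(base, url)
          if rp.can_fetch("ExampleRobot/0.1", url):
              print("would gather to depth %d from %s" % (max_depth, url))

  crawl_site("http://web.nexor.co.uk/")

The point being that /robots.txt is only consulted to see what is
off-limits; the indexing information itself comes from elsewhere.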
-- Martijn
__________
Internet: [email protected]
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html