Re: Resource discovery, replication (WWW Announcements archives?)

Daniel W. Connolly ([email protected])
Tue, 03 May 1994 23:50:31 -0500


In message <[email protected]>, Markus
Stumpf writes:
>
>There is an archive of comp.infosystems.announce at
>ftp://ftp.informatik.tu-muenchen.de/pub/comp/usenet/comp.infosystems.announce/
>Look at :INDEX for a list of "filename -> subject"

Great! Thanks. Hmmm... I wonder how many hops from hal.com to .de?

>|>But beyond that, it allows a distributed solution to the resource
>|>discovery problem: Any site could build an index of available internet
>|>resources just by archiving news.resources, indexing the contents, and
>|>expiring old articles.
>
>Hmmm ... this is exactly the same idea that's behind ALIWEB and IAFA,
>isn't it? So why have a new one?

I dunno... Maybe I just need to play with/read about those systems
more. For some reason, they strike me as too centralized: there's the
per-site data, and there's the list of all sites. What's in between?
How is the global list maintained? Suppose I want an index of
resources related to biochemistry: can I build one? (with my strategy,
I can filter the articles however I want and build custom indexes)
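
To make that concrete, here's roughly the kind of filtering I have in
mind (just a sketch: it assumes a local spool of archived announcement
articles, one per file with RFC 822-ish headers, and the path and the
keyword list are made-up placeholders):

    import os, re

    ARCHIVE_DIR = "/var/spool/news-archive/news.resources"  # placeholder path
    KEYWORDS = ("biochemistry", "protein", "genbank")        # placeholder topic filter
    URL_PAT = re.compile(r'\b(?:http|ftp|gopher)://\S+', re.IGNORECASE)

    def build_index(archive_dir, keywords):
        """Map each matching article's Subject: to the URLs it mentions."""
        index = {}
        for name in os.listdir(archive_dir):
            with open(os.path.join(archive_dir, name), errors="replace") as f:
                text = f.read()
            if not any(k.lower() in text.lower() for k in keywords):
                continue                      # not about our topic; skip it
            m = re.search(r'^Subject:\s*(.*)$', text, re.MULTILINE)
            subject = m.group(1) if m else "(no subject)"
            index[subject] = URL_PAT.findall(text)
        return index

    for subject, urls in build_index(ARCHIVE_DIR, KEYWORDS).items():
        print(subject)
        for url in urls:
            print("   ", url)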

>|>This could also be used as a way of distributing information about
>|>replicated data. A mirror archive site could post a summary of its
>|>contents, with (a) references to its own contents (A), and (b)
>|>references to the original materials that it mirrors (B), and (c) a
>|>machine-readable indication that A is a copy of B. Then any client
>|>looking for any part of B that also has access to c can try looking at
>|>A if it's closer or more available.
>
>I think we should really move away from "mirror" as we know it now.
>The solution used by caching HTTPds right now is IMHO more
>efficient and more transparent.

But it only works if you know where to point your client to find a
proxy server! I'm trying to work out a way where clients can discover
"proxy caches" automatically, and proxies can find proxies, etc.

> And you always have the information
>where you got it from and a kind of expiry mechanism.
>I really hate all those sites "mirroring" e.g. Xmosaic documentation
>or the like. It is NEVER accurate and up to date, and you NEVER know
>whether it is a self-made copy or a mirrored one.
>Using caches solves the problem of e.g. slow links, and a cache is
>really still a kind of pointer to the original.

Yes: we need pointers to the original (or enough information in the
original reference to determine whether the cached data is the right
data, like dates and/or MD5 checksums) and audit trails.
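
For instance (hashlib's md5 here stands in for whatever checksum scheme
the reference format ends up carrying; that part is still open):

    import hashlib

    def cached_copy_is_valid(cached_bytes, expected_md5_hex):
        """Check a cached copy against the MD5 checksum carried in the
        original reference; on a mismatch, go back to the original."""
        return hashlib.md5(cached_bytes).hexdigest() == expected_md5_hex

    # e.g. if the reference carried md5=900150983cd24fb0d6963f7d28e17f72 for "abc":
    assert cached_copy_is_valid(b"abc", "900150983cd24fb0d6963f7d28e17f72")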

>I am currently working on a caching-only server that would also
>allow for some hierarchy in caching. This would completely solve the
>problem without the client having to know about net topology and
>replication.

Yes: hierarchy in caching is the key feature. I just thought I had an
idea for how to realize it. I didn't expect I had the only idea ;-)
I have to code mine up and try it out to really understand the issues.
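
Something along these lines for the lookup chain, anyway (everything
here is a placeholder: the storage, the parent list, the origin fetch):

    import urllib.request

    class CacheNode:
        """One level in a cache hierarchy: a local store, optional parent
        caches, and the origin server as the last resort."""

        def __init__(self, parents=()):
            self.store = {}          # url -> body; a real server would use disk
            self.parents = parents   # caches "above" this one in the hierarchy

        def get(self, url):
            if url in self.store:                # 1. local hit
                return self.store[url]
            for parent in self.parents:          # 2. ask each parent cache
                body = parent.get(url)
                if body is not None:
                    self.store[url] = body       # keep a copy for next time
                    return body
            body = self._fetch_origin(url)       # 3. last resort: the origin
            if body is not None:
                self.store[url] = body
            return body

        def _fetch_origin(self, url):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except OSError:
                return None

    # e.g. a site cache whose parent is a regional cache:
    regional = CacheNode()
    site = CacheNode(parents=(regional,))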

Dan