Requirements:
0. Support MIME a la MHonArc.
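Something like this, say, with Python's email package (my choice;
MHonArc itself is Perl) -- flatten the MIME tree so each leaf part
can be rendered or linked on its own:

    import email

    def mime_parts(raw):
        # Parse the raw RFC 822 text and flatten the MIME tree;
        # multipart containers are skipped, leaf parts are kept.
        msg = email.message_from_string(raw)
        return [(part.get_content_type(), part.get_payload(decode=True))
                for part in msg.walk() if not part.is_multipart()]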
1. Base the published URLs on the global message-ids, not on local
sequence numbers. So instead of:
http://www.foo.com/archive/mlist/00345.html
I want to see:
http://www.foo.com/archive/[email protected]
This allows folks to write down the URL of the message as soon
as they post it -- they don't have to wait for it to show
up in the archive.
Hmmm... I wonder if deployed web clients handle relative query
URLs correctly, e.g.:
References: <a href="?message-id=0923408.xxx.yyy">0923408.xxx.yyy</a>
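The naming scheme is tiny to implement -- a rough sketch in Python,
where the base URL and the escaping are my assumptions, not a spec:

    import email
    from urllib.parse import quote

    def archive_url(raw, base="http://www.foo.com/archive"):
        # Message-ID arrives as "<local-part@domain>"; drop the
        # brackets and escape it so it survives inside a URL path.
        msgid = email.message_from_string(raw)["Message-ID"].strip("<> \t")
        return "%s/%s" % (base, quote(msgid, safe=""))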
2. Support format negotiation. Make the original message/rfc822 data
available as well as the enhanced-with-links html format -- at the
same address. This _should_ allow clients to treat the message as a
message, i.e. reply to it, etc., by specifying:
Accept: message/rfc822
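The negotiation step itself is almost nothing -- a toy version,
assuming the server keeps both the raw text and the rendered HTML
(a real one would parse q-values in the Accept header):

    def negotiate(accept, raw, html):
        # Crude check: anyone asking for message/rfc822 gets the
        # original bytes back, fit to feed straight to a mail reader.
        if "message/rfc822" in (accept or ""):
            return "message/rfc822", raw
        return "text/html", html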
3. Keep the index pages to a reasonable size. Don't list 40000
messages by default. The cover page should show the last 50 or so
messages, plus a query form where folks can select articles...
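The cover page then reduces to something like (index layout assumed):

    def cover(index, n=50):
        # index: (date, message-id, subject) tuples, oldest first.
        # Newest n entries, newest first; the rest is query-only.
        return index[-n:][::-1]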
4. Allow relational queries: by date, author, subject, message-id,
keywords, or any combination. Essentially, treat the archive as a
relational database table with fields message-id, from, date, subject,
keywords, and body.
In fact, consider this table to consist of all the mail messages
and news articles ever posted (past, present, and future). Any
given archive has partial knowledge of the table. Let's call
this global service the message-archive service. So rather than:
http://www.foo.com/archive/[email protected]
I want to see:
http://www.foo.com/[email protected]
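In Python terms: each archive holds some rows of that one big table,
and a query is a conjunction of per-field matches. The row layout and
the query() helper here are mine, purely for illustration:

    def query(rows, **where):
        # rows: dicts keyed by message-id, from, date, subject,
        # keywords, body. Match is case-insensitive substring, ANDed.
        # (Fields like "from" would need a dict argument rather than
        # keywords; glossed over here.)
        def keep(row):
            return all(want.lower() in str(row.get(field, "")).lower()
                       for field, want in where.items())
        return [row for row in rows if keep(row)]

    # e.g. query(rows, subject="archive", date="1995")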
Goals:
5. Generate HTML on the fly, not in batch. Cache the most recent pages
of course (in memory?), but don't waste all that disk space. (Support
If-Modified-Since in the on-the-fly generator, by the way.)
Update the index in real-time, as messages arrive, not in batch.
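A sketch of that on-the-fly path, with an in-memory cache and the
If-Modified-Since short-circuit (epoch-seconds timestamps assumed):

    _cache = {}   # message-id -> (mtime, html)

    def get_page(msgid, mtime, if_modified_since, render):
        # 304 short-circuit: the client's copy is still current.
        if if_modified_since is not None and mtime <= if_modified_since:
            return 304, None
        hit = _cache.get(msgid)
        if hit is None or hit[0] != mtime:
            hit = (mtime, render(msgid))   # generate on demand
            _cache[msgid] = hit
        return 200, hit[1]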
6. Allow batch query results. Offer to return the raw message/rfc822
data (optionally compressed) for, e.g., "all messages from July 7 to
Dec 1 with fred in the From field".
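The batch path could just concatenate the matches mbox-style and gzip
on request -- a sketch, with "From " quoting glossed over:

    import gzip, io

    def batch(messages, compress=False):
        # messages: raw RFC 822 strings from a query like the above.
        body = b"".join(b"From -\n" + m.encode("utf-8", "replace") + b"\n"
                        for m in messages)
        if not compress:
            return body
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            gz.write(body)
        return buf.getvalue()   # plus Content-Encoding: gzip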
7. Export a Harvest gatherer interface, so that collections of mail
archives can be combined into Harvest broker search services where
folks can do similar relational and full-text queries.
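For the gatherer side, each message would be summarized as a SOIF
record; the attribute syntax below follows my reading of the Harvest
docs, so treat the details as assumptions to be double-checked:

    def soif_record(url, attrs):
        # SOIF: "@TYPE { url", then "Name{size}: value" lines, then "}".
        lines = ["@FILE { %s" % url]
        for name, value in attrs.items():
            lines.append("%s{%d}:\t%s" % (name, len(value), value))
        lines.append("}")
        return "\n".join(lines)

    # e.g. soif_record(archive_url(raw), {"Subject": subj, "From": sender})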
8. Allow annotations (using PICS ratings???) for "yeah, that
was a really good post!" or "hey: if you liked that, you
should take a look at ..."
9. Make it a long-running process exporting an ILU interface, rather
than a fork-per-invocation CGI script. Provide a CGI-to-ILU hack for
interoperability with pre-ILU web servers.
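The hack might be no more than a tiny CGI stub that relays to the
long-running process -- here over a Unix socket standing in for ILU,
whose actual API I won't guess at. The persistent server keeps the
indexes warm; the per-request fork does almost nothing:

    import os, socket, sys

    def relay(sock_path="/tmp/message-archive.sock"):
        # Forward the CGI request to the long-running archive process
        # and stream its (already-formatted) response back out.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(sock_path)
        s.sendall((os.environ.get("QUERY_STRING", "") + "\n").encode())
        while True:
            chunk = s.recv(8192)
            if not chunk:
                break
            sys.stdout.buffer.write(chunk)
        s.close()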
Major brownie points to anybody who builds something that supports at
least 1 thru 4 and makes it available to the rest of us. I'd really
like to use it for all the mailing lists around here.
I saw some activity along these lines in news:comp.lang.python. Any
takers around here?
Dan