Re: filetype extensions

Daniel W. Connolly ([email protected])
Mon, 09 May 1994 16:11:20 -0500


In message <[email protected]>, Rob Earhart writes:
>
> I wrote a www server from scratch at the same time that I began
>supporting Mosaic as a contributed application at Andrew, for the
>experience of writing the server, gaining a full knowledge of the
>protocol, and because I was bored :-)
>

Isn't it interesting that each time a new implementor comes along,
s/he has to trip over all the hacks, kludges, and general differences
between the specs and the existing implementations and practices...
Perhaps one day the specs will be caught up...

> Embracing the http/1.0 concept of multiple content types for the same
>document, the server takes the Accept: list from the client, turns it
>into a list of extensions, and attempts to access each path.extension in
>turn.

Good idea, but I'd suggest a slight twist: I don't think it's wise
to assume that there is a well-defined mapping:
ext : ContentType -> String
so I wouldn't encourage the approach of working from content types
to extensions. The technique I like (found it in WWWLibrary) is to
keep a table of:
ContentType, extension String, confidence Float

Then, start with the given path; find all (type, ext, conf) such that path.ext
exists in the filesystem, and maximize conf where type is in the
client's Accept: list. (Actually, you're supposed to take into account
cost-per-byte to transmit and translation quality in the metric function...
details are in the HTTP spec...)
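To make the idea concrete, here is a minimal sketch of that selection in Python; the table entries, the function name, and the set-of-files stand-in for the filesystem are all illustrative (the real WWWLibrary code, and the full HTTP metric with cost-per-byte and translation quality, are more involved):

```python
# Hypothetical (type, extension, confidence) table; entries are
# illustrative, not WWWLibrary's actual suffix bindings.
TABLE = [
    ("text/html",  "html", 1.0),
    ("text/plain", "txt",  0.5),
    ("image/gif",  "gif",  1.0),
]

def negotiate(path, accept, existing_files):
    """Return (conf, type, filename) for the best variant, or None.

    path           -- the extensionless path from the URL
    accept         -- content types from the client's Accept: header
    existing_files -- set of filenames present (stands in for the filesystem)
    """
    best = None
    for ctype, ext, conf in TABLE:
        candidate = path + "." + ext
        if ctype in accept and candidate in existing_files:
            if best is None or conf > best[0]:
                best = (conf, ctype, candidate)
    return best
```

So a request for "foo" with "Accept: image/gif" would pick foo.gif if it exists, and fall back to nothing (404) rather than guessing a type the client didn't ask for.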

> The problem: I've run into substantial resistance to the idea from
>the user community. They want to add hyperlinks to "foo.gif", not
>"foo".

So they don't get multi-format magic. Their loss. See below about symlinks...

> I'm getting two arguments for the use of extensions in the URL's:
>People want to be able to use 'file:' and relative links to view their
>files without going through the server (and maybe get the server to
>translate pages on the fly when requested from AFS sites into 'file:'
>links, reducing HTTP server load),

ACK! Thou Shalt Not Promote The Use of The Unclean 'file:' Scheme!
Surely it will lead you down The Path to Confusion and Despair!
The Holy Access Types Are, As Set Forth in RFC 1521 (MIME):
local-file: (obsoletes file:)
afs:
anon-ftp:
ftp:
mail-server: (obsoletes mailto:)
plus the Other Happily Well-Defined URL Schemes:
http:
gopher:
and the Somewhat Unstable But Forthcoming:
wais:
news:
and the Hopelessly Weird But Useful:
telnet:
tn3270:

if you want to write an HREF that means "get file X the same way
you got this one," you can just leave the access type implicit; e.g.:

HREF="/pub/stuff/file.txt"

> and they want to maintain
>compatibility with the other AFS www server on campus (run by our School
>of Computer Science), which handles extensions the "normal" way.

Is it too much to
ask them to use HREF="foo" and make a link/symlink from foo to foo.gif?
That covers both local-file access and compatibility with other servers.
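The symlink arrangement is a one-liner at the shell (ln -s foo.gif foo); sketched in Python for illustration, with a throwaway directory standing in for the document tree:

```python
# Sketch: keep foo.gif on disk but publish the extensionless name
# "foo" as a symlink, so both names resolve to the same bytes.
import os
import tempfile

docdir = tempfile.mkdtemp()                  # stand-in for the doc tree
gif = os.path.join(docdir, "foo.gif")
open(gif, "wb").close()                      # the real file
os.symlink("foo.gif", os.path.join(docdir, "foo"))  # extensionless alias
```

A server (or a 'file:'-happy user) asking for "foo" then gets exactly the bytes of foo.gif, without the URL committing to a format.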

> I've also had a request to try to resolve document types only when the
>client doesn't send an extension on the request; the problems here are
>that the extension in the URL is still significant (which seems a bit
>backwards), and that eventually I'd like to implement a mechanism to
>allow the user to use the extensionless file to specify versions of
>documents in different languages, character sets, and encodings.

Hmmm... yess... "the extension... is still significant." If a client
specifically wants the gif version of foo, I'd rather see it send:

GET foo HTTP/1.0
Accept: image/gif

than

GET foo.gif HTTP/1.0
Accept: */*

The latter form will probably work for now... but what about the future
when there may be caching proxy servers with built-in graphics conversion?
Such a proxy may hold a cached image/tiff copy, and it may be able to
generate image/gif from it faster than a round trip to the original
server. But extensions are
an out-of-band technique: a proxy server can't "peek" at the extensions
the way it can look at the Accept: header.

We must be very careful about time-to-live, conversion quality, etc.
to be sure that the proxy servers don't compromise the protocol.

There's some stuff in the HTTP protocol spec about a URI: and Vary:
header in the server's response to address this. Basically, a server
is supposed to tell the client how long it can cache the document,
and whether there are variations on the document.
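For illustration, such a response might carry headers along these lines; the exact syntax of URI: and Vary: was still being worked out in the draft spec, so treat the values and quoting here as a sketch, not gospel:

```
HTTP/1.0 200 OK
Content-Type: image/gif
Expires: Tue, 10 May 1994 16:11:20 GMT
URI: <foo>; vary="content-type"
```

The Expires: line bounds how long a cache may hold the document; the vary parameter warns it that other content-type variants of "foo" exist, so it mustn't hand the gif to a client that only accepts image/tiff.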

Oh... we also need a way to express

GET foo HTTP/1.0
Accept: image/gif

in an HREF (or in HTML somewhere)... because sometimes you want to
refer to a specific version/format of a document. I've suggested
<A HREF="foo" Content-Type="image/gif">...</a>
and
<A HREF="foo;content-type=image/gif">...</a>
in the past without much luck. The second form, which puts the data
in the URL, has some chance of being deployed... stay tuned...
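If that second form does get deployed, a server or proxy would need to split the parameter back off the path. A sketch (remember, the ";content-type=" syntax is my suggestion, not anything standardized):

```python
# Sketch of splitting the suggested ";content-type=" URL parameter;
# the syntax is a proposal, not a deployed standard.
def split_content_type(href):
    """Return (path, content_type) -- content_type is None if absent."""
    marker = ";content-type="
    if marker in href:
        path, ctype = href.split(marker, 1)
        return path, ctype
    return href, None
```

So "foo;content-type=image/gif" comes apart into the name to resolve and the one type to put on the Accept: list, while plain "foo" negotiates as usual.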

> So... what do people think? Pragmatism or Purism? Should I bow down
>to the pressure to stop having it try to add extensions?

Enough Purism to save yourself from having to re-engineer your solution
down the road, mixed with enough Pragmatism to make it useful to
your user community today.

Dan