binary file access via Mosaic

Marc Andreessen ([email protected])
Sat, 13 Mar 93 01:14:01 -0800


A frequent complaint of Mosaic users is that binary files of a type that
Mosaic doesn't recognize (i.e., with a filename extension that Mosaic
doesn't recognize) are uselessly displayed as text rather than simply
saved to disk. The following document (online as
http://hoohoo.ncsa.uiuc.edu:80/mosaic-docs/file-typing-issues.html)
details the issues involved and explains the solution developed for
the upcoming 0.10 release; I'd appreciate any comments or feedback.

Cheers,
Marc

--
Marc Andreessen
Software Development Group
National Center for Supercomputing Applications
[email protected]

NCSA Mosaic File Typing Issues
******************************

Motivation
==========

Quite independent from their sources, files have types. A given file can be plaintext, HTML, GIF, JPEG, AIFF, MPEG, PostScript, you name it. (MIME provides a way to type data elements within a file, but the file itself still has a type: MIME.)

In an ideal world, the type of each file would be well-defined metadata always accessible to an information browsing client prior to the act of accessing the file. This is true on the Macintosh, but not on most other systems, and certainly not on the Unix-dominated Internet. Bummer.

Therefore, in an imperfect world, Mosaic uses the common (but not mandated and not standardized) convention of examining a file's extension to attempt to determine its type. PostScript files are assumed to be suffixed '.ps', GIF files '.gif', etc. In this way, Mosaic can correctly determine file types for the majority of the data files available on the Internet.

When a file type cannot be thus derived, Mosaic makes a guess. Files coming from an HTTP server are assumed to be HTML; files coming from any other source (except Gopher; see below) are assumed to be plaintext.
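The suffix lookup and the fallback guess can be sketched roughly as follows. This is only an illustration in Python, not Mosaic's actual code: the function name guess_type, the type names, and the contents of the suffix table (beyond '.ps', '.gif' and '.jpeg') are assumptions made for the example.

```python
import posixpath

# Hypothetical suffix table; Mosaic's real list is not reproduced here.
SUFFIX_TYPES = {
    ".ps": "postscript",
    ".gif": "gif",
    ".jpeg": "jpeg",
    ".html": "html",
    ".txt": "plaintext",
}


def guess_type(path, scheme):
    """Return a type name from the suffix, else a guess based on the source."""
    _, ext = posixpath.splitext(path)
    known = SUFFIX_TYPES.get(ext.lower())
    if known is not None:
        return known
    if scheme == "gopher":
        # Gopher carries its own type information, so no guess is made here.
        return None
    # Untyped file: HTML if it came from an HTTP server, plaintext otherwise.
    return "html" if scheme == "http" else "plaintext"
```

Under these assumptions, guess_type("/pub/file.xyz", "ftp") falls through to "plaintext", which is exactly the case that goes wrong when 'file.xyz' is really binary data.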

This, of course, causes a problem. What happens when a file is assumed to be viewable text, but it really isn't? Well, Mosaic attempts to display it as text. Boom. Serious badness.

Solution
========

In Mosaic version 0.10 (and later), there is a solution. The user is allowed to select, on the fly, whether untyped files (i.e., files with no recognizable suffix) are to be assumed to be viewable (text) or not viewable (data). If the former, such files will be displayed; if the latter, such files will be dumped to a local disk and the user will be notified appropriately.

When files of unrecognized types are automatically dumped to disk as binary data in this manner, Mosaic is said to be in "binary transfer mode".

A toggle button in the Options menu allows you to turn binary transfer mode on and off, on the fly, at the per-window level. It is implicitly assumed that binary transfer mode will generally be off, since many common documents (for example, the results of a WAIS query, with a URL something like '.../my-database?query') need to be assumed to be text for the usual Mosaic interfaces to function. However, when you're in a situation, e.g. while browsing an anonymous FTP site, where you know you want to pull over 'file.xyz' and you know that file 'file.xyz' is really binary data and shouldn't be displayed as text, then you turn on binary transfer mode, access the file, grab the file from the local disk (where it is dumped automatically by Mosaic), and switch back out of binary transfer mode.
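As a rough model of the per-window toggle, consider the sketch below. The class and method names are hypothetical, chosen for illustration only; the point is that the flag lives on the window, defaults to off, and only affects files whose type could not be determined.

```python
class Window:
    """Hypothetical stand-in for one Mosaic window."""

    def __init__(self):
        # Binary transfer mode is assumed to be off by default.
        self.binary_transfer_mode = False

    def toggle_binary_transfer_mode(self):
        self.binary_transfer_mode = not self.binary_transfer_mode

    def action_for_untyped_file(self):
        """Decide what to do with a file that has no recognizable suffix."""
        if self.binary_transfer_mode:
            return "dump to local disk"
        return "display"


# Typical use while browsing an anonymous FTP site:
window = Window()
window.toggle_binary_transfer_mode()       # turn the mode on
action = window.action_for_untyped_file()  # 'file.xyz' is dumped to disk
window.toggle_binary_transfer_mode()       # switch back out again
```

Because the flag is per window, a second window opened at the same time keeps displaying untyped files as before.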

Note that binary transfer mode does not impact other Mosaic functionality; notably:

o Automatic and transparent uncompression of compressed (.Z) and gzip'd (.z) files still occurs with untyped files.

o Binary (data) files with recognized extensions (e.g., '.gif' and '.jpeg') are still passed off to the appropriate external viewers. (Note that a new feature in 0.10 is that if the name of a data type's external viewer is defined via the X resource mechanism to be "dump", then data files in that format will be dumped to disk as if they were untyped files being retrieved in binary transfer mode.)
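The two points above might be modeled like this. It is a sketch under stated assumptions: the function names, the viewer table, and the viewer name "some-image-viewer" are hypothetical, and the real viewer settings come from X resources rather than a Python dictionary.

```python
import posixpath

# Compression suffixes named above; the match is case-sensitive on purpose,
# since '.Z' (compress) and '.z' (gzip) mean different things.
COMPRESSION_SUFFIXES = {".Z": "compress", ".z": "gzip"}


def strip_compression(name):
    """Split off a compression suffix, returning (inner_name, method)."""
    root, ext = posixpath.splitext(name)
    if ext in COMPRESSION_SUFFIXES:
        return root, COMPRESSION_SUFFIXES[ext]
    return name, None


def viewer_action(file_type, viewers):
    """Decide how a file of a recognized type is handled."""
    viewer = viewers.get(file_type)
    if viewer == "dump":
        # Same treatment as an untyped file in binary transfer mode.
        return "dump to local disk"
    if viewer is not None:
        return "run " + viewer
    return "handle internally"


# Hypothetical viewer settings standing in for X resources.
viewers = {"gif": "some-image-viewer", "mpeg": "dump"}
```

Here strip_compression("paper.ps.Z") yields ("paper.ps", "compress"), so the remaining name can still be typed by its '.ps' suffix, and viewer_action("mpeg", viewers) dumps the file regardless of whether binary transfer mode is on.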

Notes
=====

Note that it is practically impossible to heuristically determine whether an untyped file is viewable or not viewable as text for the simple reason that 8-bit text is now commonplace.
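To see why, consider the most obvious heuristic: call a file text only if every byte is 7-bit ASCII. The sketch below (the function name is made up for the example) shows that this check cannot tell legitimate 8-bit text, here encoded as ISO Latin-1, apart from binary data.

```python
def looks_like_7bit_text(data):
    """Naive check: treat data as text only if every byte is below 0x80."""
    return all(byte < 0x80 for byte in data)


ascii_text = "plain text\n".encode("ascii")
latin1_text = "café résumé\n".encode("latin-1")  # valid 8-bit text
binary_data = bytes([0x89, 0x50, 0x4E, 0x47])    # arbitrary binary bytes

results = (
    looks_like_7bit_text(ascii_text),
    looks_like_7bit_text(latin1_text),
    looks_like_7bit_text(binary_data),
)
```

The Latin-1 text and the binary data both fail the check, so a rule of this kind would wrongly dump perfectly readable 8-bit text to disk.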

Gopher has its own typing system (and not a very good one, at that). As such, the rules don't apply. Mosaic tries to do the right thing with various types of Gopher files, but using a different scheme, based on the Gopher-defined file types.
