Re: LANG: VRML 1.x Binary Format Proposal
Tony Godshall ([email protected])
Tue, 30 May 1995 11:36:55 +0000
[[email protected] (Mark Waks)]
> Mark presents a proposal for a Binary format; it's sort of halfway
> between requirements and a design, I think. In general, it looks
> pretty sensible, but (inevitably) I have a few kibitzes and
> questions...
[clip clip]
> >1) Tokenization of keywords
> >
> >The VRML 1.0 keyword set is full of compound words, like PerspectiveCamera,
> >Translation, or IndexedFaceSet. These are human-readable keywords, but the
> >computer certainly doesn't care about them. We have well under 255 keywords
> >in VRML 1.x, so a single-byte token stream for keywords would save as much
> >as 90% of total file size.
>
> Check; this makes plenty of sense. However, a suggestion: let's make
> sure we leave room for an expanded keyword set. We're well under 255
> keywords now, but I'm not confident that we will remain so forever.
> I'd suggest a one-or-two byte format, with the high-order bit indicating
> whether it's a one or two byte token. Put the 128 most common keywords
> into 1-byte tokens, and we've got nigh-infinite expansion space for more
> later. We get most of the benefit, with *far* less risk later.
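A minimal sketch of that one-or-two-byte scheme (the split at 128 and the exact bit layout here are just one plausible way to do it, not anything from the proposal):

```python
def encode_token(token_id):
    """Encode a keyword id as 1 or 2 bytes.

    Ids 0-127 (the most common keywords) fit in one byte with the
    high bit clear; ids 128 and up use two bytes with the high bit
    of the first byte set, leaving ~32K ids of expansion space.
    """
    if token_id < 128:
        return bytes([token_id])
    rest = token_id - 128
    if rest < 32768:
        return bytes([0x80 | (rest >> 8), rest & 0xFF])
    raise ValueError("token id out of range")

def decode_token(data, pos):
    """Return (token_id, next_pos) for the token starting at data[pos]."""
    first = data[pos]
    if first < 128:
        return first, pos + 1
    return 128 + (((first & 0x7F) << 8) | data[pos + 1]), pos + 2
```

A reader never needs a length field: the high bit of the first byte tells it whether one more byte follows.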
[snip]
> (I will admit that there's a real temptation to do something like
> Huffman coding here -- a few keywords, like "}" and "Separator {",
> are going to be so common it almost might be worth keeping them down
> to four bits. Probably more hassle than it's worth, though...)
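To make the temptation concrete, here is a rough sketch that computes Huffman code lengths from keyword frequencies (the frequencies below are made up for illustration); with a skew like this, "}" lands a one-bit code:

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code_length} for a Huffman code over freqs.

    Classic bottom-up construction: repeatedly merge the two least
    frequent subtrees; every symbol in a merged subtree gets one bit
    deeper. The counter i breaks frequency ties so dicts are never
    compared.
    """
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]
```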
>
> And we should bear expansion in mind. Extension nodes will (by
> definition) not be part of the standard, so they won't have
> predefined tokens. We should chew on how to handle them. We
> could run them in full ASCII, or have some kind of prelude that
> defines (for the duration of the file) the token for that
> keyword, or maybe something else. There are numerous ways we
> could deal, but we will need to choose one.
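The prelude variant might look something like the sketch below. The wire layout (a count byte, then length-prefixed ASCII names, with ids assigned sequentially above the predefined range) is purely illustrative, as is the starting id of 256:

```python
def build_prelude(extension_keywords, first_id=256):
    """Build a per-file prelude mapping extension node names to tokens.

    Hypothetical format: one count byte, then each keyword as a
    length byte followed by its ASCII name. Returns the encoded
    prelude plus the token table the rest of the file would use.
    """
    out = bytearray([len(extension_keywords)])
    table = {}
    for i, kw in enumerate(extension_keywords):
        name = kw.encode("ascii")
        out.append(len(name))
        out += name
        table[kw] = first_id + i
    return bytes(out), table
```

Once the prelude is read, extension nodes cost the same one or two bytes per use as standard keywords, instead of full ASCII every time.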
I would like to see a dynamic LZW-style compression envelope. This would
exploit locality of reference (shorter codes for the keywords that are
common in a given file) much more effectively than a tokenizing or
Huffman scheme, and it is a simple solution because the problem has
already been solved. A standard file format like GZIP could even be
used, and code could be incorporated into browsers to decompress on
the fly.
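For what it's worth, GZIP actually uses DEFLATE (LZ77 plus Huffman coding) rather than LZW, but the adaptive-dictionary idea is the same. A minimal round-trip sketch using Python's gzip module and a made-up scene:

```python
import gzip

# A tiny hypothetical VRML 1.0 scene; real files repeat keywords like
# "Separator" heavily, which is exactly what an adaptive compressor exploits.
vrml = b"""#VRML V1.0 ascii
Separator {
    Translation { translation 0 1 0 }
    Cube { width 2 height 2 depth 2 }
}
"""

compressed = gzip.compress(vrml)        # what an exporter or server would write
restored = gzip.decompress(compressed)  # what a browser would do on the fly
assert restored == vrml
```

Because the dictionary is built as the stream is read, a browser can decompress incrementally rather than waiting for the whole file.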