Re: LANG: VRML 1.x Binary Format Proposal

Peter J. McCann ([email protected])
Thu, 1 Jun 1995 11:34:44 -0500 (BLT)


Justin writes:
>
> Brian points out:
> >What no one has talked about on the list yet is whether a binary file
> >format could provide a benefit other than compression - parsing speed.
> >Compiled C programs are very rarely smaller than their source, yet no one
> >disputes that an object code interpreter is faster than a source code
> >interpreter :) Does anyone think there might be a gain that could be
> >made in this area from a binary format?

Yes, I think this is very important, and I was hoping someone would
bring it up. String hashing and lookup can take a large amount of
time. Our system, which converts from an internal tokenized representation
to Inventor, used to perform the conversion early, which meant the Inventor
file had to be re-parsed after a transmission step. Now we delay the
conversion to the last possible stage and transmit only the tokenized
representation of graphic objects. The savings have been substantial.
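
To make the idea concrete, here is a minimal sketch in Python. It is not
the system described above. The node names are real Inventor/VRML names,
but the table, the one-byte codes, and the function names are assumptions
made up for illustration. The sender pays for the string lookup once; the
receiver only indexes into a list.

```python
# Hypothetical keyword table: the numeric codes are invented for this sketch.
KEYWORDS = ["Separator", "Material", "Cube", "Sphere"]
CODE_OF = {name: code for code, name in enumerate(KEYWORDS)}


def tokenize(names):
    """Sender side: hash each keyword string once, emit one byte per keyword."""
    return bytes(CODE_OF[name] for name in names)


def detokenize(data):
    """Receiver side: plain list indexing, no string hashing or comparison."""
    return [KEYWORDS[code] for code in data]


stream = tokenize(["Separator", "Material", "Cube"])
nodes = detokenize(stream)
```

Here the three keywords travel as three bytes, and the receiver never
touches a hash table to recover them.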

> Some, definitely. But probably not a lot. One of the advantages of
> Inventor ASCII format (and one of the highest reasons I was arguing
> strongly for it) is that it is *very* easy to parse. It's almost
> completely unambiguous syntactically (maybe completely unambiguous),
> and not even terribly ambiguous lexically. It's sufficiently

I'd like to point out that this could also support an argument in *favor*
of tokenization. If the grammar is easy to parse, lexical tokenization
takes a larger percentage of the overall parsing time, so pre-tokenizing
the input could save proportionally more.
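
The arithmetic behind that claim can be sketched as a simple bound. This
assumes, purely for illustration, that pre-tokenizing removes the lexing
cost entirely and leaves everything else unchanged; the function name and
the fractions below are made up, not measurements.

```python
def max_speedup(lex_fraction):
    """Upper bound on overall parse speedup if lexing cost dropped to zero.

    lex_fraction is the share of total parse time spent in lexical
    tokenization (string scanning, hashing, keyword lookup).
    """
    if not 0.0 <= lex_fraction < 1.0:
        raise ValueError("lex_fraction must be in [0, 1)")
    return 1.0 / (1.0 - lex_fraction)


# Illustrative fractions only: the larger the lexing share, the larger the bound.
half = max_speedup(0.5)             # lexing is half the work
three_quarters = max_speedup(0.75)  # lexing dominates
```

With half the time spent lexing the bound is 2x; with three quarters it
is 4x. A format whose grammar is cheap to parse sits toward the high end.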

Of course, there are still some very good arguments in favor of *not*
tokenizing. These include the difficulty of maintaining a centralized
database of keywords (our system does it right now with a global enum{},
which is probably not feasible on the net) and the question of how to deal
with unknown tokens. It is much easier for a human to track down a missing
keyword by name than by numeric token.
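
Both problems show up in even a tiny sketch. The enum below is a
hypothetical stand-in for a shared keyword table (the names and codes are
assumptions, not any real registry), and decode() shows what a reader can
say about a code it has never seen.

```python
from enum import IntEnum


class Keyword(IntEnum):
    # Hypothetical codes: every reader and writer must agree on this table.
    SEPARATOR = 0
    MATERIAL = 1
    CUBE = 2


def decode(codes):
    """Map token codes to keyword names, flagging codes this reader lacks."""
    names = []
    for code in codes:
        try:
            names.append(Keyword(code).name)
        except ValueError:
            # All we can report is the number; the name is not in the stream.
            names.append(f"<unknown token {code}>")
    return names


decoded = decode([0, 2, 7])
```

A writer with a newer table can emit code 7, and an older reader can only
report the number. An ASCII file would have carried the keyword's name
instead, which is something a person can search for.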

-- 
Pete McCann                                          [email protected]
Department of Computer Science           http://swarm.wustl.edu/~mccap/
Washington University in St. Louis