Yes, I think this is very important, and I was hoping someone would
bring it up. String hashing and lookup can take a large amount of
time. Our system, which converts from an internal tokenized representation
to Inventor, used to perform the conversion early, which meant the Inventor
file had to be re-parsed after a transmission step. Now we delay the
conversion to the last possible stage and transmit only the tokenized
representation of graphic objects. The savings have been substantial.

> Some, definitely. But probably not a lot. One of the advantages of
> Inventor ASCII format (and one of the highest reasons I was arguing
> strongly for it) is that it is *very* easy to parse. It's almost
> completely unambiguous syntactically (maybe completely unambiguous),
> and not even terribly ambiguous lexically. It's sufficiently [...]

I'd like to point out that this could also support an argument in *favor*
of tokenization. If the grammar is easy to parse, then lexical tokenization
accounts for a larger share of the overall parsing time, so pre-tokenizing
the input yields a proportionally greater savings.

Of course, there are still some very good arguments in favor of *not*
tokenizing. These include the difficulty of maintaining a centralized
database of keywords (our system does this right now with a global enum{},
which is probably not feasible on the net), and the question of how to
handle unknown tokens. It is much easier for a human to track down an
unrecognized keyword name than an unrecognized keyword token.
--
Pete McCann                          [email protected]
Department of Computer Science       http://swarm.wustl.edu/~mccap/
Washington University in St. Louis