BINARY: Tokenization/Point-Clouds/Numbers

Mark Pesce ([email protected])
Wed, 31 May 95 11:55:54 -0700


Justin sez -

>Over on the theory side -- Mark pointed out that Gavin's .iv model,
>with an enormously high percentage of points, may not be the same
>structure as VRML in the long run. I'll also point out that while
>Gavin's model argues that tokenization isn't so important, it *also*
>argues that Mark's fixed-point binary may well be *very*
>important. Even with restricted precision, an ASCII integer is going
>to take a lot more bytes than a 2-byte binary one. I'd be interested
>in seeing how they compare after zipping; I'd expect the ASCII to
>compress more, but possibly not enough to compensate for the enormous
>initial savings from the binary format.

I agree with this assessment. Even though GZIP will compress the
keywords, it's unlikely to compress an 11-byte keyword or enumerated
type as effectively as a single-byte token. Do we need to have
"MatrixTransform" or "PER_VERTEX_INDEXED" explicitly in a VRML file?
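To make the tokenization idea concrete, here is a rough sketch of what
swapping keywords for single-byte tokens could look like. The token table,
the byte values, and the sample scene text are all made up for
illustration; nothing here is a proposed VRML binary format. It relies on
the assumption that the source file is 7-bit ASCII, so bytes at 0x80 and
above are free to use as tokens.

```python
import re
import zlib

# Hypothetical token table: keyword -> single byte >= 0x80 (unused by ASCII).
TOKENS = {
    "MatrixTransform": 0x80,
    "PER_VERTEX_INDEXED": 0x81,
    "Separator": 0x82,
    "MaterialBinding": 0x83,
}

# Match whole words only, so a keyword inside a longer name is left alone.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TOKENS) + r")\b"
)


def tokenize(text):
    """Replace each known keyword with its one-byte token."""
    out = bytearray()
    pos = 0
    for match in _PATTERN.finditer(text):
        out += text[pos:match.start()].encode("ascii")
        out.append(TOKENS[match.group()])
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


def detokenize(data):
    """Inverse of tokenize(): expand token bytes back into keywords."""
    reverse = {value: key for key, value in TOKENS.items()}
    return "".join(reverse.get(byte, chr(byte)) for byte in data)


# Made-up scene fragment, repeated to imitate a structure-rich file.
sample = (
    "Separator { MatrixTransform { matrix 1 0 0 0 } "
    "MaterialBinding { value PER_VERTEX_INDEXED } }\n"
) * 50

plain = sample.encode("ascii")
packed = tokenize(sample)
print("ascii:    ", len(plain), "->", len(zlib.compress(plain)))
print("tokenized:", len(packed), "->", len(zlib.compress(packed)))
```

The two printed lines give raw and zlib-compressed sizes for each form;
whether the tokenized form still wins after compression is exactly the
thing that needs measuring on real models.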

Further, stripping precision from the numbers and then using a fixed-point
representation will be immediately more efficient and hopefully more
compressible. Evidently, it also speeds up rendering - this is a
lesson they've learned in the IMF VRML project.
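As a rough illustration of the fixed-point argument, the sketch below
quantizes coordinates to unsigned 16-bit integers and compares the size
against a six-decimal ASCII encoding, before and after zlib. The [-1, 1]
coordinate range, the 16-bit width, the six-decimal ASCII format, and the
random point cloud are all assumptions chosen for the example, not part of
any proposed format.

```python
import random
import struct
import zlib

LO, HI = -1.0, 1.0  # assumed coordinate range for this sketch


def to_fixed16(value):
    """Map a value in [LO, HI] onto an unsigned 16-bit integer."""
    clamped = min(max(value, LO), HI)
    return round((clamped - LO) / (HI - LO) * 65535)


def from_fixed16(quantized):
    """Recover an approximate float from its 16-bit fixed-point form."""
    return LO + quantized / 65535 * (HI - LO)


def encode_ascii(points):
    """Space-separated decimal text, six digits after the point."""
    return " ".join(f"{c:.6f}" for p in points for c in p).encode("ascii")


def encode_fixed16(points):
    """Little-endian 2-byte integers, one per coordinate."""
    flat = [to_fixed16(c) for p in points for c in p]
    return struct.pack(f"<{len(flat)}H", *flat)


# Synthetic point cloud: 1000 random (x, y, z) triples.
random.seed(0)
points = [
    tuple(random.uniform(LO, HI) for _ in range(3)) for _ in range(1000)
]

ascii_bytes = encode_ascii(points)
fixed_bytes = encode_fixed16(points)
print("ascii:  ", len(ascii_bytes), "->", len(zlib.compress(ascii_bytes)))
print("fixed16:", len(fixed_bytes), "->", len(zlib.compress(fixed_bytes)))
```

The fixed-point form costs exactly 2 bytes per coordinate, at the price of
a rounding error of at most half a quantization step. Random points are
close to a worst case for the compressor, so real models may behave
differently.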

>The main point, though, is: numbers, numbers, numbers. Until several
>different tests are run, using some kind of mockup of the binary
>format plus zipping and compared with the ASCII plus zipping, with
>models of different structures (including Mark's structure-rich one
>and Gavin's point cloud), we simply don't have enough information
>to make an intelligent decision. It's not especially difficult to
>create this mockup; I have to believe that a good programmer could
>do it with the QvLib in a couple of hours. No, I'm not volunteering;
>I simply don't have either the time or the passion right now. But it's
>the right way to make this argument a good deal more convincing...

I agree; I would do it myself, but I too have many other things on my plate.

The hypothesis remains that VRML point clouds benefit most from numeric
compression methods (fixed-point numbers, etc.), while VRML arrangements
using URN techniques benefit most from keyword tokenization. If we accept
that every reasonable compression technique should be used to minimize
data transfer time, then both of these techniques should be employed.

Mark