Re: LANG: VRML 1.x Binary Format Proposal

Mark Pesce
Tue, 30 May 95 17:51:42 -0700


Gavin -

>Inventor's binary format has neither the '{' nor the '}'. Inventor's binary
>parser always knows exactly how many bytes need to be read.

This implies that the undocumented OI (Open Inventor) binary format has an
"extent" field (presumably following the keyword token) which gives the
length of the data that follows. That is another approach, although it will
use as much space, if not more (token + 2-byte extent vs. token + end-of-node
marker).
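To make the space trade-off concrete, here is a minimal Python sketch of both framings. The one-byte token and end-of-node codes are hypothetical; this is not Open Inventor's actual binary layout, which is undocumented.

```python
import struct

# Hypothetical one-byte codes; NOT Open Inventor's real binary format.
NODE_TOKEN = 0x01
END_OF_NODE = 0xFF


def encode_with_extent(body: bytes) -> bytes:
    """Token, then a 2-byte big-endian extent, then the node body."""
    if len(body) > 0xFFFF:
        raise ValueError("body too large for a 2-byte extent")
    return bytes([NODE_TOKEN]) + struct.pack(">H", len(body)) + body


def decode_with_extent(data: bytes) -> bytes:
    (length,) = struct.unpack(">H", data[1:3])
    return data[3:3 + length]  # reader knows exactly how many bytes to read


def encode_with_terminator(body: bytes) -> bytes:
    """Token, then the node body, then an end-of-node marker."""
    if END_OF_NODE in body:
        raise ValueError("body must not contain the end-of-node byte")
    return bytes([NODE_TOKEN]) + body + bytes([END_OF_NODE])


def decode_with_terminator(data: bytes) -> bytes:
    end = data.index(END_OF_NODE, 1)  # reader must scan for the marker
    return data[1:end]
```

For a given body the extent form costs one byte more, but it lets a reader skip a node it does not understand without parsing it; the terminator form needs an escaping rule if a body can contain the end-of-node byte.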

>Extensions are probably rare enough that they should just be represented in
>ASCII. Now is probably a good time to start thinking about
>internationalization, though-- in the Inventor group, we're contemplating
>changing from a pure 7-bit ASCII format to a UTF8-based 8-bit format. And
>we'll probably allow node type names, DEF/USE names, string fields--
>EVERYTHING-- to be UTF8.

This seems reasonable to implement in VRML 1.x; what will change as a result?
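One part of the answer is that pure-ASCII files would not change at all. This short Python check illustrates the two UTF-8 properties that make the switch cheap: ASCII text encodes to identical bytes, and every byte of a multi-byte character is 0x80 or above, so it can never be mistaken for `{`, `}` or `"`. The sample names are hypothetical.

```python
# Hypothetical node and DEF names, some with non-ASCII characters.
names = ["Separator", "OBJECT_ONE", "Café_Müller", "東京タワー"]

for name in names:
    raw = name.encode("utf-8")
    # Pure-ASCII names encode to identical bytes, so existing 7-bit files stay valid.
    if name.isascii():
        assert raw == name.encode("ascii")
    # Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a byte-oriented
    # parser never confuses it with an ASCII delimiter.
    for ch in name:
        enc = ch.encode("utf-8")
        if len(enc) > 1:
            assert all(b >= 0x80 for b in enc)
    print(f"{name}: {len(name)} characters, {len(raw)} bytes")
```

What does change is that character counts and byte counts diverge for non-ASCII names, so any length fields must say which one they count.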

>I think that adding some more efficient primitives (e.g. ElevationGrid,
>IndexedTriangleStripSet and QuadMesh), when combined with already-existing
>compression, will give us more bang for our design buck than coming up with
>tokenization schemes or fixed-point number representations.

My argument is that *every* reasonable method should be used; every byte
saved means a faster download, a faster response time, and a more pleasant
VRML browsing experience.

>And I don't think any of these will be good enough. We need good tools to
>automatically create low levels of detail and smart browsers that know not to
>pull across huge files (= the higher levels of detail) unless the user says
>that they are willing to wait.

Absolutely true. It does not make any of my points obsolete, however. All
transfers should be optimized at all times; that's good networking.

>I was curious to see how much tokenizing could possibly improve one of my
>favorite models. To get an idea of how many characters were dedicated to
>Separator {, fieldName [ ... etc, I did:
>ivcat /usr/share/data/models/buildings/Barcelona.iv | grep '[a-zA-Z]' | wc -c
> ... and got 346 characters, out of a total of 542,302 characters. Or .06
>percent. Well-structured VRML files should be similar, with lots and lots of
>numbers and very little format overhead.

That is true for VRML-as-OI, which describes most files in the VRML universe
right now, and certainly most files any of us are familiar with. However, in
12 months, things will probably look a lot more like this:

#VRML 1.x ASCII
# imperfect example of highly tokenizable VRML

Separator {
    DEF OBJECT_ONE Separator {
        LOD {
            # range, center, etc.
            WWWInline {
                name "..."    # low LOD object, from CD-ROM cache
            }
            WWWInline {
                name "..."    # medium LOD object, from CD-ROM cache
            }
            WWWInline {
                name "..."    # high LOD object, from CD-ROM cache
            }
        }
    }

    DEF OBJECT_TWO Separator {
        LOD {
            # range, center, etc.
            WWWInline {
                name "..."    # low LOD object, from URN/CD-ROM cache
            }
            WWWInline {
                name "..."    # medium LOD object, from CD-ROM cache
            }
            WWWInline {
                name "..."    # high LOD object, from CD-ROM cache
            }
        }
    }

    DEF OBJECT_THREE Separator {
        LOD {
            # range, center, etc.
            WWWInline {
                name "..."    # low LOD object, from CD-ROM cache
            }
            WWWInline {
                name "..."    # medium LOD object, from CD-ROM cache
            }
            WWWInline {
                name "..."    # high LOD object, from CD-ROM cache
            }
        }
    }

    # and so forth, until the entire scene is defined...
}

There aren't many numbers in this file at all, but there are lots of keywords.

This file could be at least 10x more compact with a combined tokenization
and compression scheme (even leaving fixed-point math out of the equation),
and the result would still be smaller than with compression alone, probably
significantly so. To the user, the difference between a 60-second and a
90-second download matters a great deal.
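A minimal Python sketch of the kind of keyword tokenization argued for here. The token table, the one-byte codes (0x80 and up) and the sample scene with its file names are all hypothetical, not part of any VRML proposal; it assumes pure 7-bit ASCII input, so high bytes are free to serve as tokens.

```python
import re
import zlib

# Hypothetical token table: each keyword becomes one byte in 0x80..0xFF.
# Assumes pure 7-bit ASCII input, so no source byte is >= 0x80.
KEYWORDS = [b"Separator", b"DEF", b"USE", b"LOD", b"WWWInline", b"name", b"range", b"center"]
ENCODE = {kw: bytes([0x80 + i]) for i, kw in enumerate(KEYWORDS)}
DECODE = {0x80 + i: kw for i, kw in enumerate(KEYWORDS)}
KEYWORD_RE = re.compile(rb"\b(" + b"|".join(KEYWORDS) + rb")\b")


def tokenize(src: bytes) -> bytes:
    """Replace every whole-word keyword with its one-byte token."""
    return KEYWORD_RE.sub(lambda m: ENCODE[m.group(1)], src)


def detokenize(data: bytes) -> bytes:
    """Expand one-byte tokens back into keywords; copy other bytes unchanged."""
    out = bytearray()
    for b in data:
        out += DECODE[b] if b >= 0x80 else bytes([b])
    return bytes(out)


# Hypothetical arrangement-style scene: many objects, few numbers.
SAMPLE = b"".join(
    b"DEF OBJECT_%d Separator { LOD { WWWInline { name \"obj%d_low.wrl\" } "
    b"WWWInline { name \"obj%d_high.wrl\" } } }\n" % (i, i, i)
    for i in range(50)
)

packed = tokenize(SAMPLE)
assert detokenize(packed) == SAMPLE  # lossless round trip
print(f"raw {len(SAMPLE)} B, tokenized {len(packed)} B")
print(f"zlib: raw {len(zlib.compress(SAMPLE, 9))} B, "
      f"tokenized {len(zlib.compress(packed, 9))} B")
```

Running the same comparison on real scene files is one way to test the size claim; this toy sample does not by itself establish any particular ratio.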

As we move to extensive caching and URN schemes, the amount of geometry
that gets shipped around will decrease dramatically, while the number of
arrangements of that geometry will increase. Arrangements are primarily
textual; their "percentage text" rating under the script you wrote will be
much higher than the figure you quoted for a file that is primarily a point
cloud.
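The measurement can be repeated in Python to compare a point-cloud file with an arrangement-style file. This sketch mimics `grep '[a-zA-Z]' | wc -c` by counting the characters (newline included) of every line that contains a letter; the function name is illustrative, and for non-ASCII files `wc -c` counts bytes rather than characters.

```python
import re

LETTER = re.compile(r"[a-zA-Z]")


def format_overhead(text: str) -> tuple[int, int]:
    """Return (chars on lines containing a letter, total chars).
    Each matching line counts its newline, as grep's output would; a final
    line without a newline still gets +1 because grep appends one."""
    matched = sum(len(line) + 1 for line in text.split("\n") if LETTER.search(line))
    return matched, len(text)


# The Barcelona.iv figures quoted above: 346 of 542,302 characters.
print(f"{100 * 346 / 542302:.3f} percent")  # about 0.064 percent
```

Note that the grep pattern also matches lines holding exponents such as `1e-5`, so it slightly overstates keyword overhead in number-heavy files.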

In VRML, things will be the same far more often than they are different.
Most people will design spaces with "canned" objects and textures.

>And just limiting your ASCII numbers to a couple of digits after the decimal
>point and then compressing should make them just as small as fixed-point
>numbers...

If this is true, then I have no objection to limiting precision in binary
VRML as an alternative to fixed-point representation.
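Whether the claim holds can be checked by compressing both encodings of the same coordinates. This Python sketch uses made-up data (uniform random coordinates, which compress worse than real meshes) and a hypothetical 16-bit fixed-point layout with a scale of 100, matching two decimal places; neither comes from any VRML proposal.

```python
import random
import struct
import zlib


def ascii_coords(values, digits=2):
    """Render floats as ASCII text with a fixed number of decimals."""
    return " ".join(f"{v:.{digits}f}" for v in values).encode("ascii")


def fixed_point_coords(values, scale=100):
    """Pack floats as little-endian 16-bit fixed point (value * scale).
    Hypothetical layout; assumes every |value * scale| fits in an int16."""
    ints = [round(v * scale) for v in values]
    return struct.pack(f"<{len(ints)}h", *ints)


random.seed(1995)  # deterministic sample data
coords = [random.uniform(-100.0, 100.0) for _ in range(3000)]
text = ascii_coords(coords)
packed = fixed_point_coords(coords)
print("ASCII:", len(text), "B, compressed:", len(zlib.compress(text, 9)), "B")
print("fixed:", len(packed), "B, compressed:", len(zlib.compress(packed, 9)), "B")
```

Both encodings keep the same two-decimal precision, so the comparison isolates representation from accuracy; real geometry files would give a fairer answer than random data.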

Mark