[some really good philosophy on the importance of conserving bandwidth removed]
Well, my gut feeling is that gzip alone does the job,
and that implementing/standardizing yet-another-compression-scheme
is 1) not worth the effort and 2) delaying more important issues.
So, I hacked some code to test this. The shell script attached below
will find all .iv files in your current directory (it assumes they are
binary already), turn them into ascii, gzip both forms, record the
sizes, clean up (watch out, I do some rm's) then tabulate the results.
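The measuring step above can also be done without gzipping files in place or removing anything. What follows is only a minimal sketch in plain Bourne shell rather than csh; the helper names raw_size, gz_size and ratio are made up for illustration and are not part of the attached script. It relies on gzip -c, which writes the compressed stream to standard output and leaves the original file alone.

```shell
#!/bin/sh
# Hypothetical helpers (not from the attached script): measure sizes
# without modifying or deleting any file.

# raw_size FILE: size of FILE in bytes
raw_size() {
    wc -c < "$1" | tr -d ' '
}

# gz_size FILE: size in bytes of FILE after gzip; the compressed data
# goes down a pipe, so nothing is written to disk
gz_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

# ratio A B: print A/B to four decimals, as in the table columns
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a / b }'
}
```

For a file foo.ascii, something like ratio `raw_size foo.ascii` `gz_size foo.ascii` would give the a/gz figure for that file.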
For an approximation to Mark Pesce's compression scheme,
I changed each node/field name to one byte, each value (number) to
two bytes, stripped off the {} and the [] and the white space.
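To make that approximation concrete, here is a small sketch that applies the same token rules but only counts the bytes instead of writing them out. The function name mp_bytes is hypothetical, and the rules are just the crude ones described above (one byte per name, two per number, nothing for braces, brackets or white space), not Mark Pesce's actual proposal.

```shell
#!/bin/sh
# mp_bytes: read Inventor ascii on stdin and print the estimated size
# under the crude Pesce-like scheme. Hypothetical helper, for illustration.
mp_bytes() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /[a-zA-Z]/)      n += 1           # node/field name: one byte
            else if ($i ~ /[][{}]/)   n += 0           # braces and brackets: dropped
            else if ($i ~ /[0-9]/)    n += 2           # number: two bytes
            else                      n += length($i)  # anything else kept as-is
        }
    } END { print n + 0 }'
}

# Two names and six numbers: 2*1 + 6*2 = 14
echo 'Coordinate3 { point [ 0 0 0, 1.5 0 0 ] }' | mp_bytes   # prints 14
```

Note that a token such as "0," still counts as a single two-byte number, which is part of why this is only an approximation.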
Here are the results for one group of .iv files distributed with Inventor
(from /usr/share/data/models/*.iv). Every column after the first is a
ratio against the ascii size, so bigger is better:
(a) Inventor ascii
(b) Inventor binary
(gz) gzipped (a)
(bgz) gzipped (b)
(mp) a crude implementation of Mark Pesce's suggestions
Filename            a (bytes)      a/b     a/gz    a/bgz     a/mp
----------------------------------------------------------------------
Banana.iv               36788   1.9660   2.9641   3.3526   4.0276
Pear.iv                113459   1.8468   3.6660   4.2631   3.7223
SgiLogo.iv             102099   2.1633   7.1936   8.7971   4.3654
X29.iv                  79730   1.9049   4.2329   4.8013   3.8549
bird.iv                  4177   1.0085   5.4744   4.8911   3.5762
chair.iv               283542   2.2669   6.4909   7.7888   4.7757
diamond.iv              17947   2.5306   6.1294   6.8815   5.3446
engine.iv              374400   2.4906   4.5341   4.9211   5.0361
heart.iv                17057   2.3366   6.0208   6.6943   5.1438
moon.iv                  6791   2.1064   5.9885   6.4066   4.8438
shamrock.iv             37770   2.3719   5.8823   6.7471   4.9515
shell.iv               123721   2.8896   3.1561   3.6635   5.8175
slotMachine.iv          52795   1.8385   4.3715   4.9102   4.3542
spongetri4.iv          112957   2.0941   8.4854  10.7845   4.2252
star.iv                 11643   2.2829   6.0420   6.5484   5.1655
torus.iv                50964   2.4711   7.0237   8.3493   5.0315
wheel.iv                65187   1.6543   4.9053   5.4623   3.3476
----------------------------------------------------------------------
Average Values          87707   2.1308   5.4448   6.1919   4.5637
                                                  ^ best
                                         ^ nearly as good
Conclusions:
============
1) gzip is good, and good enough.
2) Defining a binary format is of marginal use at best.
(gzipped ascii is 82% smaller than plain ascii, while gzipped binary
is 84% smaller, YMMV)
3) No need to invent our own compression scheme.
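For anyone checking the percentages in 2): assuming they come from the Average Values row as 1 - 1/ratio (that derivation is my assumption, and the function name percent_saved is made up), the arithmetic can be sketched as:

```shell
#!/bin/sh
# percent_saved RATIO: size reduction implied by an original/compressed
# ratio, i.e. (1 - 1/RATIO) * 100, rounded to a whole percent.
# Hypothetical helper, for illustration only.
percent_saved() {
    awk -v r="$1" 'BEGIN { printf "%.0f%%\n", (1 - 1 / r) * 100 }'
}

percent_saved 5.4448   # average a/gz,  prints 82%
percent_saved 6.1919   # average a/bgz, prints 84%
```

Since the input is an average of per-file ratios, the result is a rough figure rather than the reduction over the total byte count.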
Let's just conclude that gzip is good enough and save some bandwidth
on this thread :-)
Tim.
(Real embarrassing shell script follows for all who are interested.
Sorry, only works on an SGI since it assumes you have ivcat to read
binary Inventor.
Try it yourself and let me know if I'm wrong.)
#!/bin/csh
#
#
# First change binary .iv to ascii using ivcat
# Use ascii to kludge up a Pesce-like compression
#
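foreach i (*.iv)
    ivcat $i > $i:r.ascii
    # One byte per name, two per number, drop brackets; everything else
    # is copied through (with an explicit format, so a stray % is safe)
    cat $i:r.ascii | \
    awk '{for (i = 1; i <= NF; i++) { \
        if ($i ~ /[a-zA-Z]/) printf "A" \
        else if ($i ~ /[{}\[\]]/) printf "" \
        else if ($i ~ /[0-9]/) printf "12" \
        else printf "%s", $i \
        } \
    }' > $i:r.mp
end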
#
# Create list of file names
#
ls -al *.iv | awk '{print $9}' > names
#
# Create lists of file sizes
#
ls -al *.iv > b.size
ls -al *.ascii > a.size
ls -al *.mp > mp.size
#
# gzip everything and create list of file sizes
#
foreach i (*.iv)
    gzip $i
    gzip $i:r.ascii
end
#
# Create more lists of file sizes
#
ls -al *.iv.gz > bgz.size
ls -al *.ascii.gz > agz.size
#
# Put things back how we found them
#
gunzip *.gz
rm *.ascii *.mp
#
# Get size from size files
#
foreach i (*.size)
    awk '{print $5}' < $i > $i:r.sizec
end
#
# Paste them all together so we can awk it
#
paste names a.sizec b.sizec agz.sizec bgz.sizec mp.sizec > summary.out
rm names *.size *.sizec
#
# Create the table
#
cat summary.out | \
awk 'BEGIN { \
    printf("%-20.20s %8s %8s %8s %8s %8s\n", "Filename", "a", "a/b", "a/gz", "a/bgz", "a/mp") \
    printf("----------------------------------------------------------------------\n") \
} \
\
{ \
    printf("%-20.20s %8d %8.4f %8.4f %8.4f %8.4f\n", $1, $2, $2/$3, $2/$4, $2/$5, $2/$6) \
    SUMa += $2 \
    SUMab += $2/$3 \
    SUMagz += $2/$4 \
    SUMagzb += $2/$5 \
    SUMamp += $2/$6 \
} \
\
END { \
    printf("----------------------------------------------------------------------\n") \
    printf("%-20.20s %8d %8.4f %8.4f %8.4f %8.4f\n", "Average Values", SUMa / NR, SUMab / NR, SUMagz / NR, SUMagzb / NR, SUMamp / NR) \
}'
rm summary.out