Re: MISC: Inlined Sound Support

[email protected]
Sat, 15 Apr 1995 20:23:22 -0700


Hi!

My name is David Jedynak and I'm an Electrical Eng. student at UCLA.
I'm new to the list and have been reading all this info as fast as I can to
catch up.

I had some thoughts on the sound issue based on my knowledge of film sound.
Dolby Labs uses the principle of "psycho-acoustics" in its matrixing of 4
analog tracks down to two and decoding back up to 4 for presentation. (This
is called Dolby Stereo in theatres and Dolby Pro Logic Surround at home.)
Since the original 4 tracks of sound were discrete when encoded but are no
longer discrete when decoded, psycho-acoustic masking is employed to fool
the mind into not hearing the imperfections. The 4 tracks (Left, Center,
Right, Mono Surround) are combined into Left Total (Lt) and Right Total (Rt)
as follows, where 0.707 is a -3 dB gain and S(+90) / S(-90) is the surround
signal phase-shifted by +90 or -90 degrees:

   Lt = L + 0.707 C + 0.707 S(+90)
   Rt = R + 0.707 C + 0.707 S(-90)

They are decoded (roughly) as follows:

   L = Lt - (center and surround components)
   R = Rt - (center and surround components)
   C = Lt + Rt
   S = Lt - Rt

Because of this, some cross-talk between channels occurs. The most
noticeable is the cross-talk between center and surround. Thus, the
surrounds have a delay setting
which is not there to "make the room sound bigger, etc." but to delay the
dialogue bleeding from the center into the surrounds long enough that it
arrives at the ear at roughly the same time as the echo of the dialogue from
the center channel. This prevents the perception that the person talking on
the screen is also talking from the side and rear walls of the theatre. This
is just one example of the psycho-acoustic masking techniques used in this
field.
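The matrix described above can be sketched in a few lines of Python. This is a minimal, idealized model under stated assumptions, not Dolby's actual encoder or decoder: each channel is a single complex phasor at one frequency (multiplying by 1j is the +90 degree shift), the center/surround gain `g` defaults to 1/sqrt(2) (-3 dB), and the decoder is a passive one that takes C and S as the scaled sum and difference of the totals and passes Lt and Rt straight through to L and R. The `rt_gain` parameter is a hypothetical stand-in for a small level mismatch between the two totals; at 1.0, center and surround cancel exactly, and any error lets dialogue leak into the surround. The hypothetical `surround_delay_ms` helper then estimates how much to delay the surround feed so that leak reaches a listener after the direct sound from the screen; the 343 m/s speed of sound, the seat distances, and the 10 ms margin are assumed values, not Dolby figures.

```python
import math

G = 1 / math.sqrt(2)            # assumed -3 dB gain for center and surround
SPEED_OF_SOUND_M_S = 343.0      # dry air at roughly 20 C


def encode(l, c, r, s, g=G):
    """4:2 matrix encode of single-frequency phasors.
    Multiplying by 1j shifts phase by +90 degrees, by -1j by -90 degrees."""
    lt = l + g * c + g * (1j * s)
    rt = r + g * c + g * (-1j * s)
    return lt, rt


def passive_decode(lt, rt, g=G, rt_gain=1.0):
    """2:4 passive decode: L and R pass straight through, C is the scaled sum
    and S the scaled difference. rt_gain < 1.0 models a small level mismatch."""
    rt = rt * rt_gain
    return {"L": lt, "R": rt, "C": g * (lt + rt), "S": g * (lt - rt)}


def levels(decoded):
    """Magnitude of each output channel, rounded for printing."""
    return {ch: round(abs(v), 3) for ch, v in decoded.items()}


def surround_delay_ms(dist_to_screen_m, dist_to_surround_m, margin_ms=10.0):
    """Hypothetical helper: surround delay (ms) so that center leakage from the
    nearest surround speaker reaches the listener margin_ms after the direct
    sound from the screen. Never negative."""
    direct_ms = dist_to_screen_m / SPEED_OF_SOUND_M_S * 1000.0
    leak_ms = dist_to_surround_m / SPEED_OF_SOUND_M_S * 1000.0
    return max(0.0, direct_ms - leak_ms + margin_ms)


if __name__ == "__main__":
    print("center only  ", levels(passive_decode(*encode(0, 1, 0, 0))))
    print("surround only", levels(passive_decode(*encode(0, 0, 0, 1))))
    print("left only    ", levels(passive_decode(*encode(1, 0, 0, 0))))
    # A 10% level error between Lt and Rt lets center dialogue into S.
    print("center, 10% Rt error",
          levels(passive_decode(*encode(0, 1, 0, 0), rt_gain=0.9)))
    # Back-row seat: 20 m from the screen, 2 m from the nearest surround.
    print(f"delay needed: {surround_delay_ms(20.0, 2.0):.1f} ms")
```

With the default `rt_gain=1.0`, a center-only input comes back at full level in C and at zero in S, yet still at about 0.707 in L and R, and a left-only input leaks into C and S at about 0.707 each. With `rt_gain=0.9`, about 0.05 of the center signal (roughly -26 dB) shows up in S, which is the kind of leak the surround delay is meant to hide.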

Speaking of Dolby Labs, they have a 5.1-channel discrete digital encoding
scheme called AC-3 (L, C, R, Left Surround, Right Surround, and a
limited-bandwidth subwoofer, the ".1"). It has been used in theatres since
June 1992 (Batman Returns), is now being placed on laser discs (True Lies,
Clear and Present Danger), and has also been chosen as the sound for Grand
Alliance HDTV by the FCC. The bit rate is 384 kbits/s total for all 5.1
channels. I know this is awfully high for realtime transmission (48
kbytes/sec) from the server to the client, but for use as inlined sound, it
would be really cool.
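The transmission numbers above can be checked with simple arithmetic. The sketch below is illustrative only: it converts the quoted 384 kbits/s into bytes per second and compares it with a 28.8 kbits/s modem, which is an assumed example link speed rather than anything stated in the post; `bytes_per_second` and `realtime_factor` are hypothetical helper names.

```python
AC3_BITS_PER_SECOND = 384_000   # total for all 5.1 channels, as quoted above
MODEM_BITS_PER_SECOND = 28_800  # assumed example link: a 28.8k modem


def bytes_per_second(bits_per_second):
    """Convert a bit rate to bytes per second (8 bits per byte)."""
    return bits_per_second / 8


def realtime_factor(stream_bps, link_bps):
    """How many times faster than the link the stream would need to be sent,
    i.e. seconds of download per second of audio."""
    return stream_bps / link_bps


if __name__ == "__main__":
    print(f"AC-3 stream: {bytes_per_second(AC3_BITS_PER_SECOND):.0f} bytes/s")
    factor = realtime_factor(AC3_BITS_PER_SECOND, MODEM_BITS_PER_SECOND)
    print(f"Over a 28.8k modem: {factor:.1f} s of download per second of sound")
```

On such a link, streaming would need about 13.3 seconds of download per second of sound, whereas a short clip that is inlined with the world and looped locally only pays that transfer cost once.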

I know this opens up a whole extra can of worms, but since convergence of
many forms of communication onto one platform is a real possibility,
shouldn't we consider that at some point we might be able to drop a VRML
browser CD into our HDTV set-top game/database/movie player (like CD-i) and
navigate through virtual worlds sitting in the middle of our living room?
Chances are, if we do, we won't be satisfied with 8-bit mono 11 kHz sounds
coming from "Cyberspace" right after we finish watching "Star Trek: Dental
Hygiene" or playing Doom 10 in glorious 5.1 channel sound.

When we program our own virtual worlds, we won't be using text to create our
images; we'll use a 3D modeling "paintbrush" sort of thing. Why not assign
basic sounds to our nodes and then fly around our world and tweak them as
we see fit? Each point in space could be described by a second of AC-3
encoded audio, or by a 96th of a second (on film, AC-3 is stored in chunks
of this size between the sprocket holes, which zip by at 96/sec). Depending
on how long one stays there, the ambient noises will continue to loop, and
triggered sounds will insert themselves into the proper channel depending on
their direction.
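One way to picture this node-based scheme is a small data structure. The sketch below is purely illustrative: `SoundNode`, `azimuth_to`, `channel_for_direction`, and the coordinate convention are hypothetical names chosen here, not part of any VRML draft, and the speaker angles are an assumed 5.1 layout loosely following ITU-R BS.775. Each node carries a looping ambient clip whose storage follows from the 384 kbits/s figure, and a triggered sound is routed to whichever speaker lies nearest its direction. That is the simplest possible rule; a real panner would usually blend between adjacent speakers.

```python
import math
from dataclasses import dataclass

AC3_BYTES_PER_SECOND = 384_000 // 8  # 48,000 bytes/s at the quoted bit rate

# Assumed speaker azimuths in degrees (0 = straight ahead, positive = right).
# The LFE (".1") channel is left out because low bass is hard to localize.
SPEAKER_AZIMUTHS = {"L": -30.0, "C": 0.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}


def angle_between(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def channel_for_direction(azimuth_deg):
    """Route a triggered sound to the speaker closest to its direction."""
    return min(SPEAKER_AZIMUTHS,
               key=lambda ch: angle_between(azimuth_deg, SPEAKER_AZIMUTHS[ch]))


@dataclass
class SoundNode:
    """Hypothetical scene node carrying a looping ambient clip."""
    name: str
    x: float
    z: float
    loop_seconds: float = 1.0  # one second, or 1/96 s for a single film block

    def storage_bytes(self):
        """AC-3 payload size of one pass through the loop."""
        return round(self.loop_seconds * AC3_BYTES_PER_SECOND)

    def loop_position(self, elapsed_s):
        """Offset into the loop after the listener has stayed elapsed_s seconds."""
        return elapsed_s % self.loop_seconds


def azimuth_to(node, listener_x, listener_z, heading_deg=0.0):
    """Direction of node relative to the listener's heading.
    Assumed convention: heading 0 looks toward -z, and +x is to the right."""
    bearing = math.degrees(math.atan2(node.x - listener_x, listener_z - node.z))
    return bearing - heading_deg


if __name__ == "__main__":
    stream = SoundNode("stream", x=5.0, z=0.0)
    click = SoundNode("click", x=0.0, z=-3.0, loop_seconds=1 / 96)
    print(stream.storage_bytes(), click.storage_bytes())
    for node in (stream, click):
        az = azimuth_to(node, 0.0, 0.0)
        print(node.name, round(az), channel_for_direction(az))
```

Here the "stream" node sits straight to the listener's right, so with this layout it is routed to the right surround (Rs, 20 degrees away) rather than R (60 degrees away). A one-second loop costs 48,000 bytes and a single 96th-of-a-second block costs 500 bytes.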

Anyway, these are a bunch of crazy ideas, but we might be able to find a
solution to these problems in industries (like film) that have been dealing
with sound for many years in "virtual" environments. We just need an AI
program (or even an expert program) called "Dolby Stereo Consultant" or
"Sound Editor/Mixer" to help us when we create our home worlds. :)

Thanks for reading!
David Jedynak ([email protected])