Re: MISC: Inlined Sound Support

Brygg Ullmer ([email protected])
Sat, 15 Apr 1995 00:30:22 -0800


At 8:35 PM 4/14/95, Kevin Goldsmith wrote:
>> The sound source is transformed by the current transformation and can be
>> heard if the viewer is within radiusOfEffect of it (radiusOfEffect is
>> important to allow sound sources to be view-volume culled, allowing a
>> potentially infinite number of sound sources to be handled).
>>
> Are all sounds looping or are they triggered? VRML should do
>both for ambient sounds and one shot sounds.

Well, some sound sources will certainly be live -- whether via the MBONE,
from individual sources like Maven or Internet Phone, etc. -- ne? So I
suspect some nodes will be bindings to static sound files (whether one-shot
invokable or presence-triggered, quasi-periodic, or whatnot), while other
nodes will serve more as portals for funneling live sound links into the
metaverse.
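
Just to pin down the distinction I'm drawing (a rough taxonomy of my own,
not anything from Gavin's proposal), I'd picture the flavors looking
something like:

    /* Rough, hypothetical taxonomy of the sound-node flavors discussed here. */
    typedef enum {
        SOUND_ONE_SHOT,      /* static file, played once when explicitly invoked */
        SOUND_TRIGGERED,     /* static file, fired by presence or another event  */
        SOUND_LOOPING,       /* static file, looped (quasi-)periodically         */
        SOUND_LIVE_PORTAL    /* binding to a live stream: MBONE, Maven, etc.     */
    } SoundKind;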

As others mention, I'd like to add an orientation vector to Gavin's short
sound definition. It could be an optional field defaulting to (0,0,0) with
the implication of omnidirectional, and clients could certainly ignore it
(in the same sense that many clients lacking spatialized sound support may
punt on anything but a simple distance->volume adjustment when processing
the position field). It would be quite useful for, say, directed speech,
short of a full physical-modelling approach. (Though I am
very interested in the realization of physical relations under VRML re:
attractive and repulsive forces, etc., especially with relations like
gravity which span large spatial scales.) Seems reasonable to expect from
PointSound...
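
For concreteness, here's a back-of-the-envelope sketch (in C, purely
illustrative -- the parameter names, the cosine-shaped falloff, and the
clamping are my own assumptions, not part of anyone's spec) of how a client
might fold such an orientation field into its gain calculation; a browser
without spatialized audio could simply take the omnidirectional branch
every time:

    #include <math.h>

    /* Hypothetical directional gain for a PointSound-like source.
     * dir   = the proposed orientation field; (0,0,0) means omnidirectional.
     * toLis = vector from the sound's position to the listener.
     * Returns a gain in [0,1], to be multiplied into whatever
     * distance->volume attenuation the client already does. */
    double directionalGain(const double dir[3], const double toLis[3])
    {
        double dlen = sqrt(dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2]);
        double llen = sqrt(toLis[0]*toLis[0] + toLis[1]*toLis[1] +
                           toLis[2]*toLis[2]);

        if (dlen == 0.0 || llen == 0.0)   /* (0,0,0) => omnidirectional */
            return 1.0;

        /* Cosine of the angle between the source's facing and the listener,
         * clamped so the sound simply vanishes behind the source. */
        double c = (dir[0]*toLis[0] + dir[1]*toLis[1] + dir[2]*toLis[2])
                   / (dlen * llen);
        return c > 0.0 ? c : 0.0;
    }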

I like Kevin's notion of support for labeling sounds as ambient, and/or as
background vs. foreground. In some respects this is similar to
Netscape's support for background image-textures... as I walk through a
factory or mall, it's interesting and useful to have a background
audiospace of rumbling or muzak, though under bandwidth constraints I sure
hope the foreground sounds of my neighbor's voice are attended to first (or
at the higher bitrate, if we have tiered lossy sound delivery). Actually,
walking through the city-space is an even more interesting example in my
mind, though I'm less clear on where ambience ends and point sources
begin... e.g., do I link in a quasi-random car-honk and rumble generator,
or are these sorts of sounds more directly attributable to distinct events
in the surrounding space (e.g., perhaps one wants the honks to be attached
to definite causal sources). Probably varies case-by-case...
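
As a sketch of the bandwidth point (entirely my own invention, not
something anyone has proposed for the format), a client under a fixed
budget might simply order sources so the labeled-foreground ones are
serviced first and at the higher tier:

    #include <stdlib.h>

    /* Hypothetical per-source record; the "ambient" flag is the kind of
     * labeling Kevin suggests, everything else is made up for illustration. */
    typedef struct {
        int    ambient;      /* 1 = background/ambient, 0 = foreground */
        double distance;     /* distance from the listener, in meters  */
        int    bitrate;      /* assigned delivery tier, filled in below */
    } SoundSrc;

    static int byPriority(const void *a, const void *b)
    {
        const SoundSrc *sa = a, *sb = b;
        if (sa->ambient != sb->ambient)        /* foreground before ambient */
            return sa->ambient - sb->ambient;
        return (sa->distance > sb->distance) - (sa->distance < sb->distance);
    }

    /* Give the first hiSlots sources the high tier, the rest the low one. */
    void assignTiers(SoundSrc *s, int n, int hiSlots, int hiRate, int loRate)
    {
        qsort(s, n, sizeof *s, byPriority);
        for (int i = 0; i < n; i++)
            s[i].bitrate = (i < hiSlots) ? hiRate : loRate;
    }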

Also, I think it's probably worth reasoning through what one expects from
radiusOfEffect. Several distinct uses could be attributed to
radiusOfEffect, among them spatial culling, expressing audio "scale" or
magnitude, and offering some measure of privacy in conversations. Because
Gavin mentions...

>... allowing a potentially infinite number of sound sources to be handled.

I suspect the spatial culling case was foremost in mind at the time. While
I completely agree that provisions for unboundedness in our infospaces are
a Good Thing, I'd also imagine that browsers might, as a general case, try
to attenuate the volume of sounds as a function of distance, which is derivable
from PointSound's position relative to our own (at least for live or
implicitly-invoked sounds, as differentiated from click-to-hear-this
explicitly-requested sounds). In this case, it's not immediately clear to
me how great a win radiusOfEffect in fact is (unless it's somehow used to
express whether sound should fall off linearly or with the square of
distance, the constant of this falloff, etc.).

On the other hand, I do think radiusOfEffect is more relevant in the scale
and privacy cases. To consider "scale" vs. "volume," I have a different
expectation of "loud" when the source is a cannon, a siren, a stereo, or a
cricket. If I'm facing the engine of a 747, I expect it'll be "loud" even
when it's at low-blast ("soft?") and a few hundred feet away from me;
similarly, I'd expect a church-bell to reach an entirely different spatial
scale than a boombox or human voice. Here, the notion of "radiusOfEffect"
might be useful to express the intended range of a sound source -- say, 100
feet to a quarter-mile falloff for the 747 and church bell, while 3-100
feet for the stereo or human speaker. Perhaps this is what Gavin had in
mind... but maybe what I'm really trying to say is that we might express
volume in decibels or some other measure which, while perhaps less
instinctive when we're used to adjusting min..max volume knobs on our
computers or stereos, may be more meaningful in a spatial environment
(especially one of significant spatial expanse, as Gavin suggests).
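
If we did express source loudness in decibels at some reference distance,
a client could still recover the familiar behavior easily enough; the
numbers below are purely illustrative, not a proposal:

    #include <math.h>

    /* Rough sketch: a source's loudness given in dB at some reference
     * distance, mapped to a perceived level at the listener using
     * inverse-square spreading (~6 dB drop per doubling of distance). */
    double perceivedDb(double sourceDb, double refDist, double dist)
    {
        if (dist < refDist)
            dist = refDist;
        return sourceDb - 20.0 * log10(dist / refDist);
    }

    /* e.g. a 747 engine at ~140 dB (ref 50 m) vs. a voice at ~60 dB (ref 1 m):
     *   perceivedDb(140.0, 50.0, 400.0)  ~= 122 dB  -- still deafening
     *   perceivedDb( 60.0,  1.0, 400.0)  ~=   8 dB  -- effectively inaudible */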

To quickly mention the privacy case, another reading of radiusOfEffect is a
privacy-centric one, useful for specifying my intended audience. If I'm
lecturing to an audience in Monterey (alas, wasn't there
at all), perhaps the "radius of effect" is 100 meters; alternatively, if
I'm talking with a friend in the hallway, the radius might be 3m, or
perhaps 1m (or explicit-audience, if supported) if it's confidential, or
10cm if I'm whispering. It's also interesting to consider the interactions
between volume, radiusOfEffect, and perhaps a client-side
radiusOfPerception here... if I'm in the Black Sun bar where a band's on
the rampage while I want to talk to my friend, I probably don't want to
adjust my radiusOfEffect much beyond 3m, but may want to distinguish
between adjusting my externally-perceived volume up towards 80 or 90 dB vs.
tweaking my client-side radiusOfPerception down to, say, 5m to actively
filter the more distant musical source...
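
The interaction I'm imagining boils down to a check along these lines
(radiusOfPerception being my own hypothetical client-side knob, not part
of any proposal on the table):

    /* Hypothetical audibility test in the Black Sun scenario: a source is
     * heard only if it lies within its own intended radiusOfEffect *and*
     * within the listener's self-imposed radiusOfPerception. */
    int audible(double dist, double radiusOfEffect, double radiusOfPerception)
    {
        return dist <= radiusOfEffect && dist <= radiusOfPerception;
    }

    /* My friend at 2 m with radiusOfEffect 3 m:  audible(2.0, 3.0, 5.0)  -> 1
     * The band at 15 m, however loud:            audible(15.0, 60.0, 5.0) -> 0 */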

Just my two cents...

--Brygg