Re: MISC: Inlined Sound Support

Gavin Bell ([email protected])
Sat, 15 Apr 1995 10:13:50 -0700


On Apr 15, 12:30am, Brygg Ullmer wrote:
> Subject: Re: MISC: Inlined Sound Support
> > Are all sounds looping or are they triggered? VRML should do
> > both for ambient sounds and one-shot sounds.
>
> Well, some sound sources will certainly be live -- whether via the MBONE,
> from individual sources like Maven or Internet Phone, etc. -- ne? So I
> suspect some nodes will be bindings to static sound files (whether one-shot
> invokable or presence-triggered, quasi-periodic, or whatnot), while other
> nodes will serve more as portals for funneling live sound links into the
> metaverse.

Live hooks are just different transport mechanisms-- e.g. mbone://....
instead of http://....
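
A browser could handle that with nothing more than scheme dispatch; here's a
rough C sketch (the "mbone" scheme name is hypothetical, just following the
example above):

    #include <string.h>

    typedef enum { TRANSPORT_HTTP, TRANSPORT_MBONE, TRANSPORT_UNKNOWN } Transport;

    /* Decide how to fetch a sound source purely from its URL scheme;
     * a live feed is just another transport, not a new node type. */
    static Transport classify(const char *url)
    {
        if (strncmp(url, "http://", 7) == 0)   return TRANSPORT_HTTP;
        if (strncmp(url, "mbone://", 8) == 0)  return TRANSPORT_MBONE;
        return TRANSPORT_UNKNOWN;
    }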

> As others mention, I'd like to add an orientational vector to Gavin's short
> sound definition. It could be an optional field defaulting to (0,0,0) with
> the implication of omnidirectional, and clients could certainly ignore it
> (in the same sense that many clients lacking spatialized sound support may
> punt on anything but a simple distance->volume adjustment when processing
> the position field). It would be a quite useful alternative for, say,
> directed speech short of a physical-modelling alternative. (Though I am
> very interested in the realization of physical relations under VRML re:
> attractive and repulsive forces, etc., especially with relations like
> gravity which span large spatial scales.) Seems reasonable to expect from
> PointSound...

I'd be inclined to add a DirectedSound node instead of overloading
PointSound. Implementations could, of course, implement PointSound as a
special case of DirectedSound, if convenient.
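
To make the special-case relationship concrete, here's a rough C sketch of
the directional gain a browser might compute. The field names and the
cardioid-style falloff are purely illustrative, not a spec proposal:

    #include <math.h>

    typedef struct {
        float position[3];    /* location of the source in world space */
        float direction[3];   /* (0,0,0) means omnidirectional         */
        float volume;         /* nominal source volume                 */
    } DirectedSound;

    /* Gain factor in [0,1] based on how far off-axis the listener is.
     * A zero direction vector always returns full gain, which is
     * exactly the PointSound behavior. */
    static float directionalGain(const DirectedSound *s, const float listener[3])
    {
        float to[3], len, dlen, cosang;
        int i;

        dlen = sqrtf(s->direction[0]*s->direction[0] +
                     s->direction[1]*s->direction[1] +
                     s->direction[2]*s->direction[2]);
        if (dlen == 0.0f)             /* omnidirectional: PointSound case */
            return 1.0f;

        for (i = 0; i < 3; i++)
            to[i] = listener[i] - s->position[i];
        len = sqrtf(to[0]*to[0] + to[1]*to[1] + to[2]*to[2]);
        if (len == 0.0f)              /* listener standing on the source */
            return 1.0f;

        cosang = (to[0]*s->direction[0] +
                  to[1]*s->direction[1] +
                  to[2]*s->direction[2]) / (len * dlen);

        /* Cardioid: full volume straight ahead, silent directly behind. */
        return 0.5f * (1.0f + cosang);
    }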

> I like Kevin's notion of support for labeling sound as ambient, and/or
> background vs. foreground sound. In some respects this is similar to
> Netscape's support for background image-textures... as I walk through a
> factory or mall, it's interesting and useful to have a background
> audiospace of rumbling or muzak, though under bandwidth constraints I sure
> hope the foreground sounds of my neighbor's voice are attended to first (or
> at the higher bitrate, if we have tiered lossy sound delivery). Actually,
> walking through the city-space is an even more interesting example in my
> mind, though I'm not as comfortable with where ambience ends and point sources
> begin... e.g., do I link in a quasi-random car-honk and rumble generator,
> or are these sorts of sounds more directly attributable to distinct events
> in the surrounding space (e.g., perhaps one wants the honks to be attached
> to definite causal sources). Probably varies case-by-case...
> Also, I think it's probably worth reasoning through what one expects from
> radiusOfEffect. Several distinct uses could be attributed to
> radiusOfEffect, among them spatial culling, expressions of audio "scale"
> or magnitude, and utility for privacy in conversations. Because Gavin
> mentions...
>
> >... allowing a potentially infinite number of sound sources to be handled.
>
> I suspect the spatial culling case was foremost in mind at the time. While
> I completely agree that provisions for unboundedness in our infospaces are a
> Good Thing, I'd also imagine that browsers might as a general case try to
> diffuse the volume of sounds as a function of distance, which is derivable
> from PointSound's position relative to our own (at least for live or
> implicitly-invoked sounds, as differentiated from click-to-hear-this
> explicitly-requested sounds). In this case, it's not immediately clear to
> me how great a win radiusOfEffect in fact is (unless it's somehow used to
> express whether sound should fall off linearly or with the square of
> distance, the constant of this falloff, etc.).

Yes, culling sounds was foremost in my mind. I imagined that browsers would
automatically adjust the volume of the sound as the viewer moved closer to
or farther from the sound's location. I don't know enough about human
perception of sounds to specify what the right function for that fall-off is,
so I thought leaving it up to the browser implementors to experiment with
would be best.
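
For what it's worth, one plausible first cut in C -- full volume inside an
assumed reference distance, inverse-square beyond it. refDistance is a
tuning knob for implementors to experiment with, not a proposed field:

    #include <math.h>

    /* Attenuation factor in (0,1] as a function of listener distance. */
    static float distanceGain(float distance, float refDistance)
    {
        if (distance <= refDistance)
            return 1.0f;   /* full volume up close */
        /* inverse-square falloff beyond the reference distance */
        return (refDistance * refDistance) / (distance * distance);
    }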

<Some good stuff deleted>
> ...but maybe what I'm really trying to say is that we might express
> volume in decibels or some other measure which, while perhaps less
> instinctive when we're used to adjusting min..max volume knobs on our
> computers or stereos, may be more meaningful in a spatial environment
> (especially one of significant spatial expanse, as Gavin suggests).

Yes! Let's use real-world units wherever we can; volume should be specified
in decibels (again, with the disclaimer that I'm a 3D graphics expert, not a
sound expert), with an appropriate default that corresponds to the volume of
normal conversation.
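
The browser-side conversion from decibels to a linear gain is cheap. A
sketch in C; treating 60 dB (roughly conversational level) as unity gain is
just an assumption for illustration, not a proposed spec value:

    #include <math.h>

    #define CONVERSATION_DB 60.0f   /* assumed default volume */

    /* Amplitude doubles roughly every +6 dB on the 20*log10 scale. */
    static float dbToGain(float volumeDb)
    {
        return powf(10.0f, (volumeDb - CONVERSATION_DB) / 20.0f);
    }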

Really really smart browsers (of which there will be exactly zero for quite a
while...) could then define a new kind of Material node that specified a
physically realistic sound interaction model, and do all the wacky
sound-tracing, stereo, echoes, Doppler effects, ... to use up all those spare
CPU cycles we all have on our Crays.

> To quickly mention the privacy case: another reading of radiusOfEffect
> is a privacy-centric notion, useful for specifying my intended
> audience. If I'm lecturing to an audience in Monterey (alas, wasn't there
> at all), perhaps the "radius of effect" is 100 meters; alternatively, if
> I'm talking with a friend in the hallway, the radius might be 3m, or
> perhaps 1m (or explicit-audience, if supported) if it's confidential, or
> 10cm if I'm whispering. It's also interesting to consider the interactions
> between volume, radiusOfEffect, and perhaps a client-side
> radiusOfPerception here... if I'm in the Black Sun bar where a band's on
> the rampage while I want to talk to my friend, I probably don't want to
> adjust my radiusOfEffect much beyond 3m, but may want to distinguish
> between adjusting my externally-perceived volume up towards 80 or 90 dB vs.
> tweaking my client-side radiusOfPerception down to, say, 5m to actively

Even radiusOfEffect is going beyond "real-world" physics. When talking, all
I can control is volume. radiusOfEffect in the real-world is determined by
my surroundings (anechoic chamber vs. concert hall vs. rock concert
with lots of other noise). I think it will be a useful hack, though, until
we have the compute power to accurately simulate sound in virtual worlds:
ambient sounds would have low volume and a large radiusOfEffect, while
conversations during a rock concert would have high volume and a small
radiusOfEffect, and so on. I'm not
convinced adding another non-physical parameter (radiusOfPerception) will add
anything.
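
To show what the culling hack buys us, a sketch in C (names illustrative):
sounds outside their radiusOfEffect cost nothing at all, which is what lets
a world contain an essentially unlimited number of sound nodes:

    typedef struct {
        float position[3];
        float volumeDb;         /* low for ambient rumble, high for a voice */
        float radiusOfEffect;   /* large for ambience, small for a chat     */
    } SoundSource;

    /* A sound is worth mixing only if the listener is inside its radius.
     * Comparing squared distances avoids a sqrt per source per frame. */
    static int audible(const SoundSource *s, const float listener[3])
    {
        float dx = listener[0] - s->position[0];
        float dy = listener[1] - s->position[1];
        float dz = listener[2] - s->position[2];
        return dx*dx + dy*dy + dz*dz <=
               s->radiusOfEffect * s->radiusOfEffect;
    }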