The Matrix

January 2006 - April 2007

Original name: Immersive and Spatial Voice Audio in Networked Virtual Environments
Software: C/C++

Description

Visions of the future such as The Matrix, The Street (from the book Snow Crash), or Cyberspace (from the book Neuromancer) all support audio and visual communication in a way that mirrors reality. In such a virtual world, when someone walks around a corner, you can no longer see or hear them. The promise of virtual reality has been in the public consciousness for decades, but the technology for achieving an immersive experience has only recently become available.

Current work on multiplayer virtual reality has focused largely on the visual aspect combined with text. Audio is often neglected and, when present, typically ignores the structure of the virtual world. In this project we created a new system that integrates the audio, the visuals and the 3D structure of the virtual world. Specifically, our novel contribution is a system that models the effect of the 3D world structure on both the audio and the visuals in a natural and intuitive manner: players in the massively multiplayer world can now talk with each other as in real life.

Client and server architecture

Besides providing clients in the virtual world with a realistic audio experience, we wanted to achieve the following:

- Allow a large number of clients (>100) to connect
- Require only moderate bandwidth (~128kbps)
- Work from behind routers and firewalls
- Maximize portability (any virtual world can be used)

The central server architecture, where all clients connect to a single server, was the best fit because it fulfills all design goals. More advanced architectures would allow more clients to connect, but at the cost of a system that is considerably more complex and harder to set up and run. To illustrate our audio system, we integrated it into the Quake 3 game. Quake 3 has the advantage of being open source and is well known for its low bandwidth usage. Any other game could have been used instead, provided its source code is available so that it can be modified to use our audio framework.

Sound attenuation

We have modeled how the distance and angle between sounds and listeners affect audio perception, and additionally devised a novel algorithm to handle structural audio attenuation. The structural audio problem occurs when the 3D structure interacts with the audio signal. Examples include simply going around a corner or walking into a room and closing the door. In both cases, the 3D structure affects the audio, typically lowering the amplitude but potentially also causing audio reflections and refractions. For structural audio, we cannot simply cut off the audio when a wall stands between a sound and a listener. Instead, the sound must drop off naturally as the 3D world increasingly obstructs it.
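
The text does not say which distance and angle model the project uses. As a minimal sketch of the general idea only, the following assumes an inverse-distance roll-off and a constant-power stereo pan. All names (`StereoGain`, `distanceGain`, `panGain`, `spatialGain`, `refDistance`) are hypothetical and do not come from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical types and functions for illustration; not the project's API.
struct StereoGain {
    double left;
    double right;
};

// Assumed model: inverse-distance roll-off. Sources closer than refDistance
// are played at full gain (1.0) so the gain never exceeds 1.
double distanceGain(double distance, double refDistance) {
    assert(refDistance > 0.0);
    return refDistance / std::max(distance, refDistance);
}

// Assumed model: constant-power pan. The azimuth is in radians, with 0
// straight ahead, +pi/2 fully right and -pi/2 fully left. Angles outside
// that range (sources behind the listener) are clamped in this sketch.
StereoGain panGain(double azimuth) {
    const double halfPi = 1.57079632679489661923;
    const double a = std::min(std::max(azimuth, -halfPi), halfPi);
    const double theta = (a + halfPi) / 2.0;  // maps to 0 .. pi/2
    return {std::cos(theta), std::sin(theta)};
}

// Combine both effects into per-channel gains for one sound source.
StereoGain spatialGain(double distance, double refDistance, double azimuth) {
    const double d = distanceGain(distance, refDistance);
    const StereoGain p = panGain(azimuth);
    return {d * p.left, d * p.right};
}
```

With these assumptions, a source straight ahead at twice the reference distance gets equal left and right gains, and a source directly to the right at the reference distance is heard almost entirely in the right channel.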

Our novel structural audio algorithm employs Cauchy’s probability distribution, see Figure 1, to weight a grid that is placed with its center at the listener’s location, pointing at the origin of the sound. The weights near the center of the grid have higher values than those along the edges.

Figure 1. Cauchy probability distribution
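
For reference, the standard Cauchy probability density with location x_0 and scale γ is given below. The parameter values used in the project are not stated here; placing the peak at the grid center would correspond to x_0 = 0, which is an assumption.

```latex
f(x; x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^{2} \right]}
```

The density peaks at x = x_0 and falls off with heavy tails, so grid points far from the center still receive a small but non-negligible weight.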

The ‘audibility’ of each point on the grid is determined by tracing the visibility between that point and the sound, see Figure 2. The attenuation factor is formed by adding only the weights of the grid points that are ‘audible’. This technique results in smooth sound transitions when moving around objects and corners while talking to other players.

Figure 2. Tracing visibility using the Cauchy-based grid: with direct line-of-sight (left) and a partially obstructed line-of-sight (right)
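
The weighting and summation described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the weights are taken from the Cauchy density of each point's radial distance to the grid center and normalised to sum to 1, and the visibility trace is abstracted as a caller-supplied predicate. The names `WeightedGrid`, `makeCauchyGrid`, `attenuation` and `AudibleFn` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical grid type for illustration; not taken from the project.
struct WeightedGrid {
    std::size_t n;                // the grid has n x n points
    double spacing;               // distance between neighbouring points
    std::vector<double> weights;  // row-major, normalised to sum to 1
};

// Cauchy density with location 0 and scale gamma.
double cauchyPdf(double x, double gamma) {
    const double pi = 3.14159265358979323846;
    return 1.0 / (pi * gamma * (1.0 + (x / gamma) * (x / gamma)));
}

// Offset of grid index i from the grid center along one axis.
double offsetOf(const WeightedGrid& g, std::size_t i) {
    const double mid = (static_cast<double>(g.n) - 1.0) / 2.0;
    return (static_cast<double>(i) - mid) * g.spacing;
}

// Build the grid. Assumption: each weight is the Cauchy density of the
// point's radial distance to the center, then normalised so that a fully
// unobstructed grid yields an attenuation factor of exactly 1.
WeightedGrid makeCauchyGrid(std::size_t n, double spacing, double gamma) {
    assert(n > 0 && spacing > 0.0 && gamma > 0.0);
    WeightedGrid g{n, spacing, std::vector<double>(n * n)};
    double total = 0.0;
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            const double r = std::hypot(offsetOf(g, col), offsetOf(g, row));
            g.weights[row * n + col] = cauchyPdf(r, gamma);
            total += g.weights[row * n + col];
        }
    }
    for (double& w : g.weights) w /= total;
    return g;
}

// Stand-in for the visibility trace: receives a point's (u, v) offset in
// the grid plane and reports whether the sound source is visible from it.
using AudibleFn = std::function<bool(double u, double v)>;

// Attenuation factor: the sum of the weights of all 'audible' grid points.
double attenuation(const WeightedGrid& g, const AudibleFn& isAudible) {
    double sum = 0.0;
    for (std::size_t row = 0; row < g.n; ++row) {
        for (std::size_t col = 0; col < g.n; ++col) {
            if (isAudible(offsetOf(g, col), offsetOf(g, row))) {
                sum += g.weights[row * g.n + col];
            }
        }
    }
    return sum;
}
```

In a game engine, the predicate would map each (u, v) offset to a world-space point on the plane facing the sound and run the engine's own line trace from there to the source. Because the center points carry the most weight, the factor changes gradually as a wall edge sweeps across the grid, which gives the smooth transition described above.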

Our audio algorithm uses the Cauchy distribution because (a) it has been shown in other areas to model real-world distributions more realistically and (b) it will in the future allow us to adaptively adjust its parameters, e.g. to simulate different environments or to modify sound perception through the use of in-game items.

With our audio framework, players can have conversations with many people at the same time, because the audio correctly appears to originate from the visual location of the players that are talking. Moreover, players are able to localize any sound source and direct visual attention to where the sound is coming from. Note that our method does not take reflections, refractions and interference with other sound waves into account.

Publications

For more information and experimental results, take a look at my Master's Thesis and the VRIC2007 paper in the publications section.