3D Sound Synthesis

3D sound is most commonly defined as the sound of everyday human experience. Sound arrives at the ears from every direction and distance, and these cues contribute to the three-dimensional aural image of what humans hear. Scientists and engineers who work with 3D sound aim to accurately synthesize the complexity of real-world sounds.


Purpose

Because 3D sound is present in daily life and 3D sound localization is widely used, the application of 3D sound synthesis rose in popularity in areas such as games, home theatres, and human aid systems. The purpose of 3D sound synthesis is to interpret the information gathered from 3D sound in a way that enables the data to be studied and applied.


Applications

An application of 3D sound synthesis is the sense of presence in a virtual environment, achieved by producing more realistic environments and sensations in games, teleconferencing systems, and tele-ensemble systems. 3D sound can also be used to help those with sensory impairments, such as the visually impaired, and act as a substitute for other sensory feedback. The 3D sound may include the location of a source in three-dimensional space, as well as the three-dimensional radiation characteristics of a sound source.


Problem statement and basics

The three main problems in 3D sound synthesis are front-to-back reversals, intracranially heard sounds, and HRTF measurement. Front-to-back reversals are sounds that are heard directly in front of a subject when the source is actually behind, and vice versa. This problem can be lessened by accurately accounting for the subject's head movement and pinna response; when these two cues are missing from the HRTF calculation, reversals occur. Another solution is the early echo response, which exaggerates the differences between sounds from different directions and strengthens the pinna effects to reduce front-to-back reversal rates. Intracranially heard sounds are external sounds that seem to be heard inside a person's head. This can be resolved by adding reverberation cues. HRTF measurement issues include noise and linearity problems in the recorded responses. By using several primary auditory cues with a subject who is skilled in localization, an effective HRTF can be generated for most cases.
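The reverberation cue mentioned above can be illustrated with a minimal sketch: a bank of parallel feedback comb filters (a Schroeder-style reverberator) adds a decaying tail that helps externalize an otherwise "in-head" sound. The delay times and decay factor here are assumed values for illustration, not parameters given in this article.

```python
import numpy as np

def add_reverb_cue(dry, sr=44100, delays_ms=(29.7, 37.1, 41.1, 43.7), decay=0.5):
    """Add a crude reverberation cue with parallel feedback comb filters.

    Illustrative sketch only: each comb implements y[n] = x[n] + decay * y[n-d],
    and the combs are summed with the dry signal, then normalized.
    """
    wet = np.zeros(len(dry) + sr)              # leave one second for the tail
    wet[:len(dry)] += dry
    for d_ms in delays_ms:
        d = int(sr * d_ms / 1000.0)            # delay in samples
        comb = np.zeros_like(wet)
        comb[:len(dry)] = dry
        for i in range(d, len(comb)):          # feedback comb filter
            comb[i] += decay * comb[i - d]
        wet += comb
    return wet / np.max(np.abs(wet))           # normalize to avoid clipping
```

Feeding an impulse through this filter produces a train of decaying echoes after the direct sound, which is the kind of cue that pushes the perceived source outside the head.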


Methods

The three main methods used in 3D sound synthesis are the head-related transfer function, sound rendering, and synthesizing 3D sound with speaker location.


Head-related transfer function

The head-related transfer function (HRTF) is a linear function based on the sound source position that also accounts for other information humans use to localize sounds, such as the interaural time difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation, and vision. The system attempts to model the human acoustic system by using an array of microphones to record sounds in human ears, which allows for more accurate synthesis of 3D sound. The HRTF is obtained by comparing these recordings to the original sounds. The HRTF is then used to develop pairs of finite impulse response (FIR) filters for specific sound positions, with each position having two filters, one for the left ear and one for the right. To place a sound at a certain position in 3D space, the pair of FIR filters corresponding to that position is applied to the incoming sound, yielding a spatialized sound. The computation involved in convolving the sound signal for a particular point in space is typically large, so considerable work has gone into reducing the complexity. One approach combines Principal Component Analysis (PCA) and Balanced Model Truncation (BMT). PCA, a method widely used in data mining and data reduction, is applied first to reduce redundancy; BMT is then applied to lower the computational complexity.
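The core filtering step described above can be sketched in a few lines: a mono signal is convolved with a left-ear and a right-ear FIR filter (the head-related impulse responses) to produce a spatialized stereo pair. The filter arrays here are placeholders; in practice they come from a measured HRTF database.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right pair of FIR filters.

    Sketch under the assumption that hrir_left and hrir_right are the
    measured impulse responses for one source position; returns a
    (2, N) stereo array, zero-padded so both channels match in length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right])
```

For a source on the listener's left, the left-ear filter would typically have an earlier onset and more energy than the right-ear filter, which is exactly the interaural time and level difference the HRTF encodes.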


Sound rendering

The method of sound rendering creates a sound world by attaching a characteristic sound to each object in the scene and synthesizing the result as 3D sound. The source sounds can be obtained either by sampling or by artificial methods. The method has two distinct passes. The first pass computes the propagation paths from each object to the microphone, and the result determines the geometric transformations of the sound source, which are controlled by both delay and attenuation. The second pass creates the final soundtrack after the sound objects have been created, modulated, and summed. Sound rendering, a simpler method than HRTF generation, exploits the similarity between light and sound waves: sound propagates in all directions and reflects and refracts just as light does, so the final sound heard is the integral of the signals transmitted over multiple paths. The processing procedure has four steps. The first step generates the characteristic sound of each object. In the second step, the sound is created and attached to the moving objects. The third step calculates the convolutions related to the effect of reverberation; because the wavelength of sound is comparable to the size of the objects, reflections diffuse, which sound rendering approximates to provide a smoothing effect. The last step applies the calculated convolutions to the sound sources from step two. These steps allow a simplified soundtracking algorithm to be used without much audible difference.
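The first pass (per-path delay and attenuation) and the final summation can be sketched as follows. Only the direct path is handled, with a simple 1/distance attenuation; reflections and the convolution step are omitted, and the constants and names are illustrative assumptions, not taken from a specific renderer.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed
SR = 44100              # sample rate in Hz, assumed

def render_paths(sources, mic_pos):
    """First-pass sketch of sound rendering.

    sources is a list of (position, signal) pairs. Each signal is delayed
    by the direct-path travel time to the microphone and attenuated by
    1/distance, then all delayed tracks are summed (the second pass).
    """
    tracks = []
    for pos, signal in sources:
        dist = np.linalg.norm(np.asarray(pos) - np.asarray(mic_pos))
        delay = int(round(dist / SPEED_OF_SOUND * SR))   # delay in samples
        gain = 1.0 / max(dist, 1.0)                      # avoid blow-up near the mic
        tracks.append((delay, gain * np.asarray(signal, dtype=float)))
    n = max(d + len(s) for d, s in tracks)
    mix = np.zeros(n)
    for delay, sig in tracks:
        mix[delay:delay + len(sig)] += sig               # delayed, attenuated sum
    return mix
```

A full renderer would trace reflected paths as well and convolve each track with a reverberation response, but the delay-and-attenuate-then-sum structure is the same.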


Synthesizing 3D sound with speaker location

This method involves strategically placing eight loudspeakers to simulate spatial sound, instead of attaching sampled sound to objects. The first step captures the sound with a cubic microphone array in the original sound field; the sound is then reproduced by a cubic loudspeaker array in the reproduced sound field. A listener inside the loudspeaker array perceives the sound as moving above their head when the source moves above the microphone array. Wave field synthesis (WFS) is a spatial audio rendering technique that synthesizes wavefronts using the Huygens–Fresnel principle. The original sound is first recorded by microphone arrays, and loudspeaker arrays are then used to reproduce the sound in the listening area; the microphones and loudspeakers are placed along the boundaries of their respective areas. This technique allows multiple listeners to move within the listening area and still hear the same sound from all directions, which binaural and crosstalk-cancellation techniques cannot achieve. Sound reproduction systems using wave field synthesis generally place the loudspeakers in a line or around the listener in a 2D plane.
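The Huygens–Fresnel idea behind wave field synthesis can be illustrated with a toy delay-and-attenuate scheme: each loudspeaker re-emits the virtual source's signal with the delay and attenuation of the path from the virtual source to that speaker, so the superposed wavefronts approximate the source's wavefront. Real WFS driving functions include spectral filtering and amplitude terms omitted here; all names and constants are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed
SR = 44100              # sample rate in Hz, assumed

def wfs_driving_signals(source_pos, speaker_positions, signal):
    """Toy sketch of per-speaker driving signals for a virtual point source.

    Each speaker's signal is the source signal delayed by the travel time
    from the virtual source to that speaker and scaled by 1/distance.
    """
    out = []
    for sp in speaker_positions:
        dist = np.linalg.norm(np.asarray(source_pos) - np.asarray(sp))
        delay = int(round(dist / SPEED_OF_SOUND * SR))
        gain = 1.0 / max(dist, 1.0)
        drive = np.concatenate([np.zeros(delay),
                                gain * np.asarray(signal, dtype=float)])
        out.append(drive)
    return out
```

Because every speaker along the array boundary acts as a secondary source, listeners anywhere inside the area receive a consistent reconstructed wavefront, which is what distinguishes WFS from sweet-spot-bound binaural playback.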

