3D sound localization refers to acoustic technology used to locate the source of a sound in three-dimensional space. The source location is usually determined by the direction of the incoming sound waves (horizontal and vertical angles) and the distance between the source and the sensors. It involves both the geometric arrangement of the sensors and signal processing techniques.
Most mammals (including humans) use binaural hearing to localize sound, by comparing the information received from each ear in a complex process that involves a significant amount of synthesis. It is difficult to localize sound using monaural hearing, especially in 3D space.
Technology
Sound localization technology is used in some audio and acoustics fields, such as hearing aids, surveillance and navigation. Existing real-time passive sound localization systems are mainly based on the time-difference-of-arrival (TDOA) approach, limiting sound localization to two-dimensional space, and are not practical in noisy conditions.
Applications

Applications of sound source localization include sound source separation, sound source tracking, and speech enhancement.
Sonar uses sound source localization techniques to identify the location of a target. 3D sound localization is also used for effective human-robot interaction. With the increasing demand for robotic hearing, applications of 3D sound localization such as human-machine interfaces, aids for the handicapped, and military systems are being explored.
Cues for sound localization
Localization cues are features that help localize sound. Cues for sound localization include binaural and monaural cues.
* Monaural cues can be obtained via spectral analysis and are generally used in vertical localization.
* Binaural cues are generated by the difference in hearing between the left and right ears. These differences include the
interaural time difference (ITD) and the interaural intensity difference (IID). Binaural cues are used mostly for horizontal localization.
How does one localize sound?
The first cue our hearing uses is the interaural time difference. Sound from a source directly in front of or behind us arrives at both ears simultaneously. If the source moves to the left or right, the sound still reaches both ears, but with a certain delay. Put differently, the two ears pick up different phases of the same signal.
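To make the ITD cue concrete, the delay can be estimated by cross-correlating the two ear signals and finding the lag of the correlation peak. The sketch below is a minimal illustration of this idea, not a production algorithm; the sampling rate, signal names, and use of NumPy are illustrative assumptions.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (in seconds) as the lag
    that maximizes the cross-correlation of the two ear signals.
    Positive values mean the sound reached the left ear first."""
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    return lags[np.argmax(corr)] / fs

# Toy usage: a noise burst reaching the right ear 0.5 ms late.
fs = 48_000
delay = int(0.0005 * fs)  # 0.5 ms expressed in samples
src = np.random.default_rng(0).standard_normal(4096)
left = src
right = np.concatenate([np.zeros(delay), src[:-delay]])
print(f"estimated ITD: {estimate_itd(left, right, fs) * 1e3:.2f} ms")
```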
Methods
There are many different methods of 3D sound localization. For instance:
* Different types of sensor structure, such as
microphone array and binaural hearing robot head.
* Different techniques for optimal results, such as neural networks, maximum likelihood, and multiple signal classification (MUSIC).
* Real-time methods using an Acoustic Vector Sensor (AVS) array.
* Scanning techniques.
* Offline methods (classified according to timeliness).
Microphone Array Approach
Steered Beamformer Approach
This approach utilizes eight microphones combined with a steered beamformer enhanced by the Reliability Weighted Phase Transform (RWPHAT). The final results are filtered through a
particle filter that tracks sources and prevents false directions.
The motivation for this method is based on previous research: whereas earlier sound tracking and localization methods applied only to a single sound source, this method can track and localize multiple sound sources simultaneously.
Beamformer-based Sound Localization
The idea is to maximize the output energy of a delay-and-sum beamformer: the beamformer is steered over all possible directions, and the direction that yields the maximum output is taken as the sound direction. Using the Reliability Weighted Phase Transform (RWPHAT) method, the output energy of an M-microphone delay-and-sum beamformer is
:E = K + 2\sum_{m_1=0}^{M-1}\sum_{m_2=0}^{m_1-1} R^{\mathrm{RWPHAT}}_{x_{m_1} x_{m_2}}(\tau_{m_1} - \tau_{m_2})
where E indicates the energy, K is a constant, and R^{\mathrm{RWPHAT}}_{x_i x_j} is the microphone-pair cross-correlation defined by the Reliability Weighted Phase Transform:
:R^{\mathrm{RWPHAT}}_{x_i x_j}(\tau) = \sum_{k=0}^{L-1} \frac{\zeta_i(k) X_i(k) \, \zeta_j(k) X_j^*(k)}{|X_i(k)| \, |X_j(k)|} \, e^{j 2\pi k \tau / L}
The weighting factor \zeta_i(k) reflects the reliability of each frequency component and is defined as the Wiener filter gain
:\zeta_i^n(k) = \frac{\xi_i^n(k)}{\xi_i^n(k) + 1}
where \xi_i^n(k) is an estimate of the a priori SNR at the i-th microphone, at time frame n, for frequency k, computed using the decision-directed approach. X_i(k) is the spectrum of the signal from the i-th microphone and \tau_i is the delay of arrival for that microphone. A more detailed procedure for this method is given by Valin and Michaud.
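For concreteness, the RWPHAT cross-correlation and the steered-beamformer energy above can be sketched in Python. This is a minimal single-frame illustration and not Valin and Michaud's implementation: the function names are invented, the a priori SNR estimates are assumed to be given rather than computed by the decision-directed approach, and delays are rounded to integer samples.

```python
import numpy as np

def rwphat_cross_correlation(Xi, Xj, xi_i, xi_j):
    """RWPHAT cross-correlation of two microphone spectra.

    Xi, Xj     : length-L complex FFTs of one frame from microphones i, j
    xi_i, xi_j : length-L a priori SNR estimates (assumed given here)
    Returns the real cross-correlation as a function of integer lag tau.
    """
    zeta_i = xi_i / (xi_i + 1.0)  # Wiener filter gains
    zeta_j = xi_j / (xi_j + 1.0)
    eps = 1e-12                   # guards against division by zero
    spectrum = (zeta_i * Xi) * (zeta_j * np.conj(Xj)) / (
        np.abs(Xi) * np.abs(Xj) + eps)
    # The inverse FFT maps the weighted cross-spectrum to a cross-correlation.
    return np.real(np.fft.ifft(spectrum))

def beamformer_energy(frames, snrs, delays):
    """Steered-beamformer output energy (up to the constant K) for one
    candidate direction, given per-microphone integer sample delays."""
    X = [np.fft.fft(f) for f in frames]
    L = len(X[0])
    energy = 0.0
    for m1 in range(len(frames)):
        for m2 in range(m1):
            R = rwphat_cross_correlation(X[m1], X[m2], snrs[m1], snrs[m2])
            energy += 2.0 * R[(delays[m1] - delays[m2]) % L]
    return energy
```

Scanning `beamformer_energy` over a grid of candidate directions (each direction implying one set of delays) and keeping the maximum reproduces the steering search described above.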
The advantage of this method is that it detects both the direction and the distance of the sound sources. The main drawback of the beamforming approach is that its localization accuracy and capability are imperfect compared with the neural network approach, which works with moving speakers.
Collocated Microphone Array Approach
This method relates to a technique for real-time sound localization utilizing an Acoustic Vector Sensor (AVS) array, which measures all three components of the acoustic particle velocity as well as the sound pressure, unlike conventional acoustic sensor arrays, which utilize only the pressure information and the delays in the propagating acoustic field. By exploiting this extra information, AVS arrays are able to significantly improve the accuracy of source localization.
Acoustic Vector Array

• Contains three orthogonally placed acoustic particle velocity sensors (shown as X, Y and Z array) and one omnidirectional acoustic microphone (O).
• Commonly used both in air and underwater.
• Can be used in combination with the Offline Calibration Process to measure and interpolate the impulse response of X, Y, Z and O arrays, to obtain their steering vector.
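As an illustration of how such a sensor can be used: for a single source, one common way to obtain a DOA from an acoustic vector sensor is intensity-based estimation, averaging the product of the pressure channel with each velocity channel. The toy sketch below assumes an ideal plane wave and unit-scaled channels; it is an illustration of this general principle, not the calibration-based method described here.

```python
import numpy as np

def avs_doa(p, vx, vy, vz):
    """Single-source DOA from one acoustic vector sensor, using the
    time-averaged acoustic intensity in each axis."""
    ix, iy, iz = np.mean(p * vx), np.mean(p * vy), np.mean(p * vz)
    azimuth = np.degrees(np.arctan2(iy, ix))
    elevation = np.degrees(np.arctan2(iz, np.hypot(ix, iy)))
    return azimuth, elevation

# Toy usage: an ideal plane wave from azimuth 40 deg, elevation 15 deg.
az, el = np.radians(40), np.radians(15)
u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
s = np.random.default_rng(2).standard_normal(8192)
print(avs_doa(s, u[0] * s, u[1] * s, u[2] * s))  # approximately (40.0, 15.0)
```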
A sound signal is first windowed using a rectangular window, and each resulting segment forms a frame. Four parallel frames are obtained from the XYZO array and used for DOA estimation. Each frame is split into equal-sized blocks, then a Hamming window and FFT are used to convert each block from the time domain to the frequency domain. The output of the system is a horizontal angle and a vertical angle of the sound sources, found as the peak in the combined 3D spatial spectrum.
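The framing-and-transform front end described above can be sketched as follows; the block size and synthetic input are illustrative assumptions, and the search for the spatial-spectrum peak over candidate angles is omitted.

```python
import numpy as np

def frame_to_spectra(frame, block_size=256):
    """Split one sensor frame into equal-sized blocks, apply a Hamming
    window to each block, and FFT it into the frequency domain."""
    n_blocks = len(frame) // block_size
    blocks = frame[: n_blocks * block_size].reshape(n_blocks, block_size)
    return np.fft.rfft(blocks * np.hamming(block_size), axis=1)

# Hypothetical XYZO input: four parallel frames (x, y, z velocity + pressure).
xyzo = np.random.default_rng(1).standard_normal((4, 4096))
spectra = np.stack([frame_to_spectra(ch) for ch in xyzo])
print(spectra.shape)  # (4 channels, 16 blocks, 129 frequency bins)
```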
The advantages of this array, compared with past microphone arrays, are that it performs well even when the aperture is small, and that it can localize multiple low-frequency and high-frequency wideband sound sources simultaneously. Applying an O array makes more acoustic information available, such as amplitude and time differences. Most importantly, the XYZO array achieves this performance with a tiny size.
The AVS is one kind of collocated multiple-microphone array. It uses a multiple-microphone-array approach: the sound directions are estimated by the individual arrays, and the locations are then found using information such as the points where the directions detected by different arrays cross.
Motivation for the Advanced Microphone Array
Sound reflections always occur in an actual environment, and microphone arrays cannot avoid observing those reflections. This multiple-array approach was tested using fixed arrays on the ceiling; the performance of the moving scenario still needs to be tested.
Learning how to apply a Multiple Microphone Array
Angle uncertainty (AU) occurs when estimating direction, and position uncertainty (PU) is aggravated by increasing distance between the array and the source. We know that:
:PU = 2r\sin\left(\frac{AU}{2}\right) \approx r \cdot AU
where r is the distance between the array center and the source, and AU is the angle uncertainty (the small-angle approximation holds for AU in radians).
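For instance, with an illustrative r = 2 m and AU = 5° (about 0.087 rad), PU ≈ 2 × 2 m × sin(2.5°) ≈ 0.17 m, so even a modest angular error produces a noticeable position error at range.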
The minimum distance between the two detected direction lines is used to judge whether the two directions cross at some location or not.
Minimum distance between two lines:
:d(\Gamma_1, \Gamma_2) = \frac{\left| (\mathbf{p}_1 - \mathbf{p}_2) \cdot (\mathbf{v}_1 \times \mathbf{v}_2) \right|}{\left| \mathbf{v}_1 \times \mathbf{v}_2 \right|}
where \Gamma_1 and \Gamma_2 are the two direction lines, \mathbf{v}_1 and \mathbf{v}_2 are vectors parallel to the detected directions, and \mathbf{p}_1 and \mathbf{p}_2 are the positions of the arrays.
If d(\Gamma_1, \Gamma_2) is smaller than a chosen threshold, the two directions are judged to cross, and the source position is estimated near the closest point between the two lines.
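A minimal sketch of this crossing test, assuming NumPy and an illustrative threshold value:

```python
import numpy as np

def line_min_distance(p1, v1, p2, v2):
    """Minimum distance between two 3D lines, each given by an array
    position (point) and a detected direction (vector)."""
    cross = np.cross(v1, v2)
    norm = np.linalg.norm(cross)
    if norm < 1e-9:  # parallel directions: distance of p2 from line 1
        d = p2 - p1
        return np.linalg.norm(d - np.dot(d, v1) / np.dot(v1, v1) * v1)
    return abs(np.dot(p1 - p2, cross)) / norm

# Hypothetical example: two arrays whose detected directions intersect.
p1, v1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.5])
p2, v2 = np.array([2.0, 0.0, 0.0]), np.array([-1.0, 1.0, 0.5])
dist = line_min_distance(p1, v1, p2, v2)
threshold = 0.1  # metres; an assumed tolerance, not from the source
print(f"minimum distance: {dist:.3f} m, crossing: {dist < threshold}")
```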