Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.
The sound localization mechanisms of the mammalian auditory system
have been extensively studied. The auditory system uses several cues for sound source localization, including time difference and level difference (or intensity difference) between the ears, and spectral information. These cues are also used by other animals, such as birds and reptiles, but there may be differences in usage, and there are also localization cues which are absent in the human auditory system, such as the effects of ear movements. Animals with the ability to localize sound have a clear evolutionary advantage.
How sound reaches the brain
Sound is the perceptual result of mechanical vibrations traveling through a medium such as air or water. Through the mechanisms of compression and rarefaction, sound waves travel through the air, bounce off the pinna and concha of the exterior ear, and enter the ear canal. The sound waves vibrate the tympanic membrane (eardrum), causing the three bones of the middle ear to vibrate, which then sends the energy through the oval window and into the cochlea, where it is changed into a chemical signal by hair cells in the organ of Corti, which synapse onto spiral ganglion fibers that travel through the cochlear nerve into the brain.
Neural interactions
In vertebrates, interaural time differences are known to be calculated in the superior olivary nucleus of the brainstem. According to Jeffress, this calculation relies on delay lines: neurons in the superior olive which accept innervation from each ear, with different connecting axon lengths. Some cells are more directly connected to one ear than the other, and are thus specific for a particular interaural time difference. This theory is equivalent to the mathematical procedure of cross-correlation. However, Jeffress's theory cannot fully explain the response, because it does not account for the precedence effect, in which only the first of multiple identical sounds is used to determine the sounds' location (thus avoiding confusion caused by echoes). Furthermore, a number of recent physiological observations made in the midbrain and brainstem of small mammals have cast considerable doubt on the validity of Jeffress's original ideas.
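The correspondence with cross-correlation can be made concrete. The following is a minimal sketch, in Python, of estimating an interaural time difference by locating the cross-correlation peak between two ear signals; all signal parameters are illustrative assumptions, not values from any particular study.

```python
# ITD estimation by cross-correlation, the mathematical analogue of the
# Jeffress delay-line model.
import numpy as np
from scipy.signal import correlate, correlation_lags

rng = np.random.default_rng(0)
fs = 44_100                        # sample rate (Hz)
left = rng.standard_normal(2048)   # broadband noise burst at the left ear

true_itd = 300e-6                  # assume the right ear lags by 300 us
d = round(true_itd * fs)           # delay in samples
right = np.concatenate([np.zeros(d), left[:-d]])

# Each candidate lag plays the role of one Jeffress delay line; the
# best-matching lag (the correlation maximum) is the ITD estimate.
corr = correlate(right, left, mode="full")
lags = correlation_lags(len(right), len(left), mode="full")
itd_est = lags[np.argmax(corr)] / fs
print(f"estimated ITD: {itd_est * 1e6:.0f} us")   # ~295 us (13 samples)
```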
Neurons sensitive to interaural level differences (ILDs) are excited by stimulation of one ear and inhibited by stimulation of the other ear, such that the response magnitude of the cell depends on the relative strengths of the two inputs, which in turn depend on the sound intensities at the ears.
In the auditory midbrain nucleus, the inferior colliculus (IC), many ILD-sensitive neurons have response functions that decline steeply from maximum to zero spikes as a function of ILD. However, there are also many neurons with much shallower response functions that do not decline to zero spikes.
The cone of confusion
Most mammals are adept at resolving the location of a sound source using interaural time differences and interaural level differences. However, no such time or level differences exist for sounds originating along the circumference of circular conical slices, where the cone's axis lies along the line between the two ears.
Consequently, sound waves originating at any point along a given circumference slant height will have ambiguous perceptual coordinates. That is to say, the listener will be incapable of determining whether the sound originated from the back, front, top, bottom or anywhere else along the circumference at the base of a cone at any given distance from the ear. Of course, the importance of these ambiguities is vanishingly small for sound sources very close to or very far away from the subject, but it is these intermediate distances that are most important in terms of fitness.
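A small numeric illustration of this ambiguity, assuming free-field propagation and two point ears with no head shadowing (values chosen for illustration): a front source and its back-mirrored twin produce identical interaural time differences.

```python
# Cone of confusion: mirrored front/back azimuths give the same ITD.
import numpy as np

c = 343.0                            # speed of sound (m/s)
ear_l = np.array([-0.1075, 0.0])     # ear positions on the interaural axis (m)
ear_r = np.array([+0.1075, 0.0])

def itd(azimuth_deg, dist=2.0):
    """ITD for a source `dist` metres away; azimuth measured from straight ahead."""
    az = np.radians(azimuth_deg)
    src = dist * np.array([np.sin(az), np.cos(az)])   # +y is straight ahead
    return (np.linalg.norm(src - ear_l) - np.linalg.norm(src - ear_r)) / c

for az in (30, 150):                 # a front source and its back-mirrored twin
    print(f"azimuth {az:3d} deg -> ITD = {itd(az) * 1e6:7.1f} us")
# Both azimuths yield the same ITD: time cues alone cannot
# distinguish front from back.
```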
These ambiguities can be removed by tilting the head, which introduces a shift in both the amplitude and phase of the sound waves arriving at each ear. This reorients the interaural axis so that elevation differences map onto the localization mechanism used for the horizontal plane. Moreover, even with no alteration in the angle of the interaural axis (i.e. without tilting one's head), the hearing system can capitalize on interference patterns generated by the pinnae, the torso, and even the temporary re-purposing of a hand as an extension of the pinna (e.g., cupping one's hand around the ear).
As with other sensory stimuli, perceptual disambiguation is also accomplished through the integration of multiple sensory inputs, especially visual cues. Once a sound has been localized to the circumference of a circle at some perceived distance, visual cues serve to fix its location. Moreover, prior knowledge of the location of the sound-generating agent will assist in resolving its current location.
Sound localization by the human auditory system
Sound localization is the process of determining the location of a sound source. The brain utilizes subtle differences in intensity, spectral and timing cues to localize sound sources.
[Thompson, Daniel M. Understanding Audio: Getting the Most out of Your Project or Professional Recording Studio. Boston, MA: Berklee, 2005. Print.] To better understand the human auditory mechanism, this section briefly reviews theories of localization by the human ear.
General introduction
Localization can be described in terms of three-dimensional position: the azimuth or horizontal angle, the elevation or vertical angle, and the distance (for static sounds) or velocity (for moving sounds).
[Roads, Curtis. The Computer Music Tutorial. Cambridge, MA: MIT, 2007. Print.]
The azimuth of a sound is signaled by the difference in arrival times between the ears, by the relative amplitude of high-frequency sounds (the shadow effect), and by the asymmetrical spectral reflections from various parts of our bodies, including the torso, shoulders, and pinnae.
The distance cues are the loss of amplitude, the loss of high frequencies, and the ratio of the direct signal to the reverberated signal.
Depending on where the source is located, our head acts as a barrier that changes the timbre, intensity, and spectral qualities of the sound, helping the brain orient toward where the sound emanated from.
These minute differences between the two ears are known as interaural cues.
Lower frequencies, with longer wavelengths, diffract around the head, forcing the brain to focus only on the phase cues from the source.
Helmut Haas discovered that listeners localize a sound using the earliest-arriving wavefront, even when a reflection arriving shortly afterwards is up to 10 decibels louder than the original wavefront.
This principle is known as the Haas effect, a specific version of the precedence effect.
Haas found that even a 1 millisecond timing difference between the original sound and its reflection increased the perceived spaciousness while still allowing the brain to discern the true location of the original sound. The nervous system combines all early reflections into a single perceptual whole, allowing the brain to process multiple sounds at once.
[Benade, Arthur H. Fundamentals of Musical Acoustics. New York: Oxford UP, 1976. Print.] The nervous system will combine reflections that are within about 35 milliseconds of each other and that have a similar intensity.
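The kind of stimulus used in such experiments can be sketched directly. In the following sketch, the click shape, delay and gain values are illustrative assumptions, not Haas's original parameters.

```python
# Precedence-effect stimulus: a direct click followed by a louder
# "reflection" a few milliseconds later.
import numpy as np

fs = 44_100
click = np.zeros(int(0.05 * fs))
click[:int(0.002 * fs)] = np.hanning(int(0.002 * fs))   # 2 ms direct "click"

delay_s = 0.010            # reflection arrives 10 ms after the direct sound
gain_db = 10.0             # ...and is 10 dB louder
d = int(delay_s * fs)
reflection = np.zeros_like(click)
reflection[d:] = click[:-d] * 10 ** (gain_db / 20)

stimulus = click + reflection
# Listeners presented with `stimulus` still localize toward the earlier
# (quieter) wavefront, because arrivals within ~35 ms fuse into one event.
```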
Duplex theory
To determine the lateral input direction (left, front, right), the auditory system analyzes the following ear signal information:
In 1907, Lord Rayleigh used tuning forks to generate monophonic excitation and studied lateral sound localization on a human head model without auricles. He was the first to present a sound localization theory based on interaural cue differences, known as the duplex theory. Human ears are on different sides of the head, and thus have different coordinates in space. As shown in the duplex theory figure, since the distances between the acoustic source and the ears differ, there is a time difference and an intensity difference between the sound signals at the two ears. These differences are called the interaural time difference (ITD) and the interaural intensity difference (IID), respectively.
ITD and IID
From the duplex theory figure we can see that for source B1 or source B2, there will be a propagation delay between the two ears, which generates the ITD. Simultaneously, the human head and ears may have a shadowing effect on high-frequency signals, which generates the IID.
* Interaural time difference (ITD) – Sound from the right side reaches the right ear earlier than the left ear. The auditory system evaluates interaural time differences from (a) phase delays at low frequencies and (b) group delays at high frequencies.
* Theory and experiments show that the ITD relates to the signal frequency f. Suppose the angular position of the acoustic source is θ, the head radius is r and the speed of sound is c. A commonly cited closed form is ITD = 3(r/c)·sin θ for f ≤ 4000 Hz and ITD = 2(r/c)·sin θ for f > 4000 Hz, where 0 degrees is directly ahead of the head and counter-clockwise is positive.[Zhou X. Virtual reality technique. Telecommunications Science, 1996, 12(7): 46–.] (A numeric sketch of this and the following formula appears after this list.)
* Interaural intensity difference (IID) or interaural level difference (ILD) – Sound from the right side has a higher level at the right ear than at the left ear, because the head shadows the left ear. These level differences are highly frequency dependent and increase with increasing frequency. Theoretical work shows that the IID relates to the signal frequency f and the angular position of the acoustic source θ; a commonly cited closed form is IID = 1.0 + (f/1000)^0.8 · sin θ.
* For frequencies below 1000 Hz, mainly ITDs are evaluated (phase delays); for frequencies above 1500 Hz, mainly IIDs are evaluated. Between 1000 Hz and 1500 Hz there is a transition zone, where both mechanisms play a role.
* Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.
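A minimal numeric sketch of the two closed forms above, in Python; the head radius and the evaluation angle are illustrative assumptions, and the piecewise constants follow the reconstruction given in this section, so treat them as approximations.

```python
# Evaluating the duplex-theory cue formulas.
import numpy as np

c = 343.0       # speed of sound (m/s)
r = 0.0875      # head radius (m), an assumed value

def itd(theta_deg, f):
    """Interaural time difference (s); theta measured from straight ahead."""
    k = 3.0 if f <= 4000 else 2.0
    return k * (r / c) * np.sin(np.radians(theta_deg))

def iid(theta_deg, f):
    """Interaural intensity difference per the closed form above."""
    return 1.0 + (f / 1000.0) ** 0.8 * np.sin(np.radians(theta_deg))

for f in (500, 2000, 6000):
    print(f"f={f:5d} Hz  ITD={itd(45, f) * 1e6:6.1f} us  IID={iid(45, f):5.2f}")
```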
Evaluation for low frequencies
For frequencies below 800 Hz, the dimensions of the head (ear distance 21.5 cm, corresponding to an interaural time delay of 625 µs) are smaller than half the wavelength of the sound waves, so the auditory system can determine phase delays between the ears without confusion. Interaural level differences are very low in this frequency range, especially below about 200 Hz, so a precise evaluation of the input direction is nearly impossible on the basis of level differences alone. As the frequency drops below 80 Hz it becomes difficult or impossible to use either time difference or level difference to determine a sound's lateral source, because the phase difference between the ears becomes too small for a directional evaluation.
Evaluation for high frequencies
For frequencies above 1600 Hz the dimensions of the head are greater than the length of the sound waves. An unambiguous determination of the input direction based on interaural phase alone is not possible at these frequencies. However, the interaural level differences become larger, and these level differences are evaluated by the auditory system. Also, delays between the ears can still be detected via some combination of phase differences and
group delays, which are more pronounced at higher frequencies; that is, if there is a sound onset, the delay of this onset between the ears can be used to determine the input direction of the corresponding sound source. This mechanism becomes especially important in reverberant environments. After a sound onset there is a short time frame where the direct sound reaches the ears, but not yet the reflected sound. The auditory system uses this short time frame for evaluating the sound source direction, and keeps this detected direction as long as reflections and reverberation prevent an unambiguous direction estimation.
The mechanisms described above cannot be used to differentiate between a sound source ahead of the hearer or behind the hearer; therefore additional cues have to be evaluated.
Pinna filtering effect
Motivations
Duplex theory shows that ITD and IID play significant roles in sound localization, but they can only deal with lateral localization problems. For example, if two acoustic sources are placed symmetrically at the front and back of the right side of the human head, they will generate equal ITDs and IIDs, in what is called the cone of confusion effect. However, human ears can still distinguish between these sources. Moreover, in natural hearing, one ear alone, without any ITD or IID, can distinguish between them with high accuracy. Because of these limitations of the duplex theory, researchers proposed the pinna filtering effect theory.
The human pinna is concave, with complex folds, and is asymmetrical both horizontally and vertically. Reflected and direct waves generate a frequency spectrum on the eardrum that depends on the location of the acoustic source. The auditory nerves then localize the source using this frequency spectrum.
Mathematical model
These spectral cues generated by the pinna filtering effect can be represented as a head-related transfer function (HRTF); the corresponding time-domain expressions are called head-related impulse responses (HRIRs). The HRTF is also described as the transfer function from the free field to a specific point in the ear canal. HRTFs are usually treated as LTI systems:

H_L = P_L(r, θ, φ, ω, α) / P_0(r, ω),  H_R = P_R(r, θ, φ, ω, α) / P_0(r, ω)

where L and R represent the left ear and right ear respectively, P_L and P_R represent the amplitudes of the sound pressure at the entrances to the left and right ear canals, and P_0 is the amplitude of the sound pressure at the center of the head coordinate system when the listener is absent. In general, the HRTFs H_L and H_R are functions of the source angular position θ, the elevation angle φ, the distance r between the source and the center of the head, the angular frequency ω and the equivalent dimension of the head α.
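As an illustration of this LTI view, a binaural signal can be synthesized by convolving a mono source with a left/right HRIR pair. The sketch below fabricates stand-in impulse responses for brevity; in practice they would be loaded from a measured database such as those listed in the next subsection (each database has its own file format).

```python
# Binaural rendering: each ear signal = source convolved with that
# ear's head-related impulse response.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
n = 200
hrir_l = rng.standard_normal(n) * np.exp(-np.arange(n) / 30)  # stand-in left HRIR
hrir_r = 0.7 * np.roll(hrir_l, 13)        # crude delay + head-shadow stand-in

mono = rng.standard_normal(44_100)        # 1 s of source signal at 44.1 kHz

out_l = fftconvolve(mono, hrir_l)
out_r = fftconvolve(mono, hrir_r)
binaural = np.stack([out_l, out_r], axis=1)   # two channels, for headphones
```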
HRTF database
At present, the main institutes that work on measuring HRTF databases include the CIPIC International Lab, the MIT Media Lab, the Graduate School in Psychoacoustics at the University of Oldenburg, the Neurophysiology Lab at the University of Wisconsin–Madison and the Ames Lab of NASA. Databases of HRIRs from humans with normal and impaired hearing and from animals are publicly available.
Other cues for 3D space localization
Monaural cues
The human outer ear, i.e. the structures of the pinna and the external ear canal, forms direction-selective filters. Depending on the sound input direction in the median plane, different filter resonances become active. These resonances implant direction-specific patterns into the frequency responses of the ears, which can be evaluated by the auditory system for vertical sound localization. Together with other direction-selective reflections at the head, shoulders and torso, they form the outer ear transfer functions. These patterns in the ear's frequency responses are highly individual, depending on the shape and size of the outer ear. If sound is presented through headphones, having been recorded via another head with differently shaped outer ear surfaces, the directional patterns differ from the listener's own, and problems will appear when trying to evaluate directions in the median plane with these foreign ears. As a consequence, front–back permutations or inside-the-head localization can appear when listening to dummy head recordings, also referred to as binaural recordings. It has been shown that human subjects can monaurally localize high-frequency sound but not low-frequency sound; binaural localization, however, is possible with lower frequencies. This is likely because the pinna is small enough to interact only with sound waves of high frequency. It seems that people can only accurately localize the elevation of sounds that are complex and include frequencies above 7,000 Hz, and only if a pinna is present.
Dynamic binaural cues
When the head is stationary, the binaural cues for lateral sound localization (interaural time difference and interaural level difference) do not give information about the location of a sound in the median plane. Identical ITDs and ILDs can be produced by sounds at eye level or at any elevation, as long as the lateral direction is constant. However, if the head is rotated, the ITD and ILD change dynamically, and those changes are different for sounds at different elevations. For example, if an eye-level sound source is straight ahead and the head turns to the left, the sound becomes louder (and arrives sooner) at the right ear than at the left. But if the sound source is directly overhead, there will be no change in the ITD and ILD as the head turns. Intermediate elevations will produce intermediate degrees of change, and if the presentation of binaural cues to the two ears during head movement is reversed, the sound will be heard behind the listener.
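A rough geometric sketch of this effect follows, assuming far-field sources, point ears and a freely chosen coordinate convention: the ITD produced by a source is proportional to the projection of the source direction onto the (rotating) interaural axis, so head rotation changes the ITD by an amount that depends on elevation.

```python
# Dynamic binaural cue: ITD change under head rotation vs. source elevation.
import numpy as np

c, a = 343.0, 0.1075      # speed of sound (m/s); half the ear distance (m)

def unit(az_deg, el_deg):
    """Unit vector toward a source; +y is straight ahead, +z is up."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.sin(az) * np.cos(el), np.cos(az) * np.cos(el), np.sin(el)])

def itd(source, yaw_deg):
    """Far-field ITD; the interaural axis rotates with head yaw (left turn positive)."""
    yaw = np.radians(yaw_deg)
    axis = np.array([np.cos(yaw), np.sin(yaw), 0.0])   # direction of the right ear
    return 2 * a / c * source @ axis

for el in (0, 45, 90):    # source straight ahead, at elevation el
    change = itd(unit(0, el), 20) - itd(unit(0, el), 0)
    print(f"elevation {el:2d} deg: ITD change after a 20 deg turn = {change * 1e6:6.1f} us")
# The overhead source (90 deg) produces no ITD change; lower elevations
# produce progressively larger changes, signalling elevation.
```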
Hans Wallach
artificially altered a sound's binaural cues during movements of the head. Although the sound was objectively placed at eye level, the dynamic changes to ITD and ILD as the head rotated were those that would be produced if the sound source had been elevated. In this situation, the sound was heard at the synthesized elevation. The fact that the sound sources objectively remained at eye level prevented monaural cues from specifying the elevation, showing that it was the dynamic change in the binaural cues during head movement that allowed the sound to be correctly localized in the vertical dimension. The head movements need not be actively produced; accurate vertical localization occurred in a similar setup when the head rotation was produced passively, by seating the blindfolded subject in a rotating chair. As long as the dynamic changes in binaural cues accompanied a perceived head rotation, the synthesized elevation was perceived.
Distance of the sound source
The human auditory system has only limited possibilities to determine the distance of a sound source. In the close-up range there are some indications for distance determination, such as extreme level differences (e.g. when whispering into one ear) or specific pinna (the visible part of the ear) resonances.
The auditory system uses these cues to estimate the distance to a sound source:
* Direct/reflection ratio: In enclosed rooms, two types of sound arrive at a listener: direct sound, which reaches the listener's ears without being reflected off a wall, and reflected sound, which has been reflected at least once off a wall before arriving. The ratio between direct and reflected sound can give an indication of the distance of the sound source.
* Loudness: Distant sound sources have a lower loudness than close ones. This aspect can be evaluated especially for well-known sound sources.
* Sound spectrum: High frequencies are damped by the air more quickly than low frequencies. Therefore, a distant sound source sounds more muffled than a close one, because its high frequencies are attenuated. For sound with a known spectrum (e.g. speech) the distance can be estimated roughly from the perceived sound.
* ITDG: The Initial Time Delay Gap describes the time difference between arrival of the direct wave and first strong reflection at the listener. Nearby sources create a relatively large ITDG, with the first reflections having a longer path to take, possibly many times longer. When the source is far away, the direct and the reflected sound waves have similar path lengths.
* Movement: Similar to the visual system, there is also a phenomenon of motion parallax in acoustical perception. For a moving listener, nearby sound sources pass by faster than distant sound sources.
* Level difference: Very close sound sources cause different levels at the two ears.
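For a known source, the loudness cue amounts to inverting the inverse-square law, under which the level falls by about 6 dB per doubling of distance. A minimal sketch follows; the reference level and distance are assumptions for illustration.

```python
# Rough distance estimation from level, assuming free-field
# inverse-square spreading and a known source.
ref_level_db = 60.0    # assumed level of the source at 1 m
ref_dist_m = 1.0

def distance_from_level(measured_db):
    # Every ~6 dB below the reference is one doubling of distance.
    return ref_dist_m * 10 ** ((ref_level_db - measured_db) / 20.0)

for level in (60, 54, 48, 40):
    print(f"{level} dB -> ~{distance_from_level(level):.1f} m")
```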
Signal processing
Sound processing of the human auditory system is performed in so-called critical bands. The hearing range is segmented into 24 critical bands, each with a width of 1 Bark or 100 Mel. For a directional analysis, the signals inside a critical band are analyzed together.
The auditory system can extract the sound of a desired sound source out of interfering noise. This allows the listener to concentrate on only one speaker while other speakers are also talking (the cocktail party effect). With the help of the cocktail party effect, sound from interfering directions is perceived as attenuated compared to sound from the desired direction. The auditory system can increase the signal-to-noise ratio by up to 15 dB, which means that interfering sound is perceived to be attenuated to half (or less) of its actual loudness.
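One commonly quoted closed-form approximation of the Bark critical-band scale is Zwicker and Terhardt's arctangent fit; the sketch below uses it to map frequency to critical-band number (other fits of the scale exist).

```python
# Frequency -> critical-band number (Bark), Zwicker & Terhardt approximation.
import numpy as np

def hz_to_bark(f):
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:6d} Hz -> Bark {hz_to_bark(f):5.2f}")
# The ~24 Bark bands tile the hearing range; directional analysis pools
# the signal within each band.
```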
Localization in enclosed rooms
In enclosed rooms, not only the direct sound from a source arrives at the listener's ears but also sound that has been reflected off the walls. The auditory system analyzes only the direct sound, which arrives first, for sound localization, and not the reflected sound, which arrives later (the law of the first wavefront). Sound localization therefore remains possible even in an echoic environment. This echo suppression occurs in the dorsal nucleus of the lateral lemniscus (DNLL).
In order to determine the time periods where the direct sound prevails and which can be used for directional evaluation, the auditory system analyzes loudness changes in different critical bands as well as the stability of the perceived direction. If there is a strong attack of loudness in several critical bands and the perceived direction is stable, this attack is in all probability caused by the direct sound of a source that is newly entering or changing its signal characteristics. This short time period is used by the auditory system for directional and loudness analysis of this sound. When reflections arrive a little later, they do not enhance the loudness inside the critical bands as strongly, but the directional cues become unstable because there is a mix of sound from several reflection directions. As a result, no new directional analysis is triggered by the auditory system.
This first detected direction from the direct sound is taken as the sound source direction until other strong loudness attacks, combined with stable directional information, indicate that a new directional analysis is possible (see the Franssen effect).
Specific techniques with applications
Auditory transmission stereo system
This kind of sound localization technique provides a true virtual stereo system.
[Zhao R. Study of Auditory Transmission Sound Localization System. University of Science and Technology of China, 2006.] It utilizes "smart" manikins, such as KEMAR, to glean signals, or uses DSP methods to simulate the transmission process from sources to ears. After amplifying, recording and transmitting, the two channels of received signals are reproduced through earphones or speakers. This localization approach uses electroacoustic methods to obtain the spatial information of the original sound field by transferring the listener's auditory apparatus to the original sound field. Its most notable advantages are that its acoustic images are lively and natural, and that it needs only two independent transmitted signals to reproduce the acoustic image of a 3D system.
3D para-virtualization stereo system
Representatives of this kind of system are SRS Audio Sandbox, Spatializer Audio Lab and QSound Qxpander.
They use HRTFs to simulate the received acoustic signals at the ears from different directions with common binary-channel stereo reproduction. Therefore, they can simulate reflected sound waves and improve the subjective sense of space and envelopment. Since they are para-virtualization stereo systems, their major goal is to simulate stereo sound information. Traditional stereo systems use sensors that are quite different from human ears. Although those sensors can receive acoustic information from different directions, they do not have the same frequency response as the human auditory system. Therefore, when binary-channel mode is applied, human auditory systems still cannot perceive a 3D sound field. The 3D para-virtualization stereo system overcomes these disadvantages: it uses HRTF principles to glean acoustic information from the original sound field and then produces a lively 3D sound field through common earphones or speakers.
Multichannel stereo virtual reproduction
Since multichannel stereo systems require many reproduction channels, some researchers have adopted HRTF simulation technologies to reduce the number of reproduction channels. They use only two speakers to simulate multiple speakers in a multichannel system, a process called virtual reproduction. Essentially, this approach uses both the interaural difference principle and the pinna filtering effect theory. Unfortunately, this kind of approach cannot perfectly substitute for a traditional multichannel stereo system, such as a 5.1/7.1 surround sound system, because when the listening zone is relatively large, simulated reproduction through HRTFs may cause inverted acoustic images at symmetric positions.
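The core of such two-speaker virtual reproduction is crosstalk cancellation: at each frequency, the 2×2 matrix of speaker-to-ear transfer functions is inverted so that the speakers deliver the desired binaural signals to the ears. The sketch below illustrates the idea at a single frequency with invented transfer-function values.

```python
# Single-frequency crosstalk cancellation sketch.
import numpy as np

# Speaker-to-ear gains at one frequency: rows = (left ear, right ear),
# columns = (left speaker, right speaker); off-diagonals are the crosstalk.
H = np.array([[1.0 + 0.0j, 0.4 - 0.2j],
              [0.4 - 0.2j, 1.0 + 0.0j]])

desired = np.array([0.8 + 0.1j, -0.3 + 0.5j])   # target binaural ear signals
drive = np.linalg.solve(H, desired)             # speaker feeds = H^-1 @ desired

print(np.allclose(H @ drive, desired))          # True: ears receive the target
# In practice this inversion is done per frequency bin with regularization,
# and it only holds near the assumed listening position ("sweet spot"),
# which is why larger listening zones break the illusion.
```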
Animals
Since most animals have two ears, many of the effects of the human auditory system can also be found in other animals. Therefore, interaural time differences (interaural phase differences) and interaural level differences play a role in the hearing of many animals. But the influence of these effects on localization depends on head size, ear distance, ear position and ear orientation. Smaller animals like insects use different techniques, as the separation of their ears is too small.
Lateral information (left, ahead, right)
If the ears are located at the side of the head, lateral localization cues similar to those of the human auditory system can be used: evaluation of interaural time differences (interaural phase differences) for lower frequencies and evaluation of interaural level differences for higher frequencies. The evaluation of interaural phase differences is useful as long as it gives unambiguous results, which is the case as long as the ear distance is smaller than half the wavelength (at most one wavelength) of the sound waves. For animals with a larger head than humans, the evaluation range for interaural phase differences is shifted towards lower frequencies; for animals with a smaller head, this range is shifted towards higher frequencies.
The lowest frequency which can be localized depends on the ear distance. Animals with a greater ear distance can localize lower frequencies than humans can. For animals with a smaller ear distance the lowest localizable frequency is higher than for humans.
If the ears are located at the side of the head, interaural level differences appear for higher frequencies and can be evaluated for localization tasks. For animals with ears at the top of the head, no shadowing by the head will appear, and therefore there will be much smaller interaural level differences to evaluate. Many of these animals can move their ears, and these ear movements can be used as a lateral localization cue.
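The half-wavelength limit above can be made quantitative: interaural phase is unambiguous roughly below f = c / (2d), where d is the ear distance. A quick sketch, in which the non-human ear distances are assumptions for illustration:

```python
# Highest frequency with unambiguous interaural phase, per ear distance.
c = 343.0   # speed of sound (m/s)

for animal, d in [("human", 0.215),
                  ("large head (assumed)", 0.50),
                  ("small head (assumed)", 0.08)]:
    print(f"{animal:22s} d = {d * 100:5.1f} cm -> unambiguous below ~{c / (2 * d):5.0f} Hz")
# Larger heads shift the usable phase-difference range to lower frequencies,
# smaller heads to higher frequencies, as described above.
```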
Odontocetes
Dolphins (and other odontocetes) rely on echolocation to aid in detecting, identifying, localizing, and capturing prey. Dolphin sonar signals are well suited for localizing multiple, small targets in a three-dimensional aquatic environment by utilizing highly directional (3 dB beamwidth of about 10 deg), broadband (3 dB bandwidth typically of about 40 kHz; peak frequencies between 40 kHz and 120 kHz), short duration clicks (about 40 μs). Dolphins can localize sounds both passively and actively (echolocation) with a resolution of about 1 deg. Cross-modal matching (between vision and echolocation) suggests dolphins perceive the spatial structure of complex objects interrogated through echolocation, a feat that likely requires spatially resolving individual object features and integration into a holistic representation of object shape. Although dolphins are sensitive to small, binaural intensity and time differences, mounting evidence suggests dolphins employ position-dependent spectral cues derived from well-developed head-related transfer functions, for sound localization in both the horizontal and vertical planes. A very small temporal integration time (264 μs) allows localization of multiple targets at varying distances. Localization adaptations include pronounced asymmetry of the skull, nasal sacs, and specialized lipid structures in the forehead and jaws, as well as acoustically isolated middle and inner ears.
In the median plane (front, above, back, below)
For many mammals there are also pronounced structures in the pinna near the entry of the ear canal. As a consequence, direction-dependent resonances can appear, which could be used as an additional localization cue, similar to the localization in the median plane in the human auditory system.
There are additional localization cues which are also used by animals.
Head tilting
Two sound detectors positioned at different heights can also be used for localization in the median plane (i.e. the elevation of a sound). In animals, however, rough elevation information is gained simply by tilting the head, provided that the sound lasts long enough to complete the movement. This explains the innate behavior of cocking the head to one side when trying to localize a sound precisely. To get instantaneous localization in more than two dimensions from time-difference or amplitude-difference cues requires more than two detectors.
Localization with coupled ears (flies)
The tiny parasitic fly ''Ormia ochracea'' has become a model organism in sound localization experiments because of its unique ear. The animal is too small for the time difference of sound arriving at the two ears to be calculated in the usual way, yet it can determine the direction of sound sources with exquisite precision. The tympanic membranes of the opposite ears are directly connected mechanically, allowing resolution of sub-microsecond time differences and requiring a new neural coding strategy. Ho showed that the coupled-eardrum system in frogs can produce increased interaural vibration disparities when only small arrival time and sound level differences were available to the animal's head. Efforts to build directional microphones based on the coupled-eardrum structure are underway.
Bi-coordinate sound localization (owls)
Most owls are nocturnal or crepuscular birds of prey. Because they hunt at night, they must rely on non-visual senses. Experiments by Roger Payne
[Payne, Roger S., 1962. How the Barn Owl Locates Prey by Hearing. ''The Living Bird, First Annual of the Cornell Laboratory of Ornithology'', 151-159] have shown that owls are sensitive to the sounds made by their prey, not to its heat or smell. In fact, the sound cues are both necessary and sufficient for localization of mice from a distant perch. For this to work, the owls must be able to accurately localize both the azimuth and the elevation of the sound source.
History
The term 'binaural' literally signifies 'to hear with two ears', and was introduced in 1859 to signify the practice of listening to the same sound through both ears, or to two discrete sounds, one through each ear. It was not until 1916 that Carl Stumpf (1848–1936), a German philosopher and psychologist, distinguished between dichotic listening, which refers to the stimulation of each ear with a different stimulus, and diotic listening, the simultaneous stimulation of both ears with the same stimulus.
Later, it would become apparent that binaural hearing, whether dichotic or diotic, is the means by which sound localization occurs.
Scientific consideration of binaural hearing began before the phenomenon was so named, with speculations published in 1792 by William Charles Wells (1757–1817) based on his research into binocular vision. Giovanni Battista Venturi (1746–1822) conducted and described experiments in which people tried to localize a sound using both ears, or one ear blocked with a finger. This work was not followed up on, and was only recovered after others had worked out how human sound localization works. Lord Rayleigh (1842–1919) would do the same experiments and come to the same results, without knowing Venturi had first done them, almost seventy-five years later. Charles Wheatstone (1802–1875) did work on optics and color mixing, and also explored hearing. He invented a device he called a "microphone" that involved a metal plate over each ear, each connected to metal rods; he used this device to amplify sound. He also did experiments holding tuning forks to both ears at the same time, or separately, trying to work out how the sense of hearing works, which he published in 1827. Ernst Heinrich Weber (1795–1878), August Seebeck (1805–1849) and William Charles Wells also attempted to compare and contrast what would become known as binaural hearing with the principles of binocular integration generally.
Understanding how the differences in sound signals between the two ears contribute to auditory processing in such a way as to enable sound localization and direction finding was considerably advanced after the invention of the stethophone by Somerville Scott Alison in 1859, who coined the term 'binaural'. Alison based the stethophone on the stethoscope, which had been invented by René Théophile Hyacinthe Laennec (1781–1826); the stethophone had two separate "pickups", allowing the user to hear and compare sounds derived from two discrete locations.
See also
* Acoustic location
* Animal echolocation
* Binaural fusion
* Coincidence detection in neurobiology
* Human echolocation
* Perceptual-based 3D sound localization
* Psychoacoustics
* Spatial hearing loss
References
External links
* auditoryneuroscience.com: Collection of multimedia files and flash demonstrations related to spatial hearing
* [http://highered.mcgraw-hill.com/sites/0070579431/student_view0/chapter11/glossary.html Online learning center - Hearing and Listening]
* HearCom: Hearing in the Communication Society, an EU research project
* Research on "Non-line-of-sight (NLOS) Localisation for Indoor Environments" by CMR at UNSW
* An introduction to acoustic holography
* An introduction to acoustic beamforming