Head-related Transfer Function

A head-related transfer function (HRTF), also known as anatomical transfer function (ATF), is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, ear canal, density of the head, size and shape of nasal and oral cavities, all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2–5 kHz with a primary resonance of +17 dB at 2,700 Hz. But the response curve is more complex than a single bump, affects a broad frequency spectrum, and varies significantly from person to person. A pair of HRTFs for two ears can be used to synthesize a binaural sound that seems to come from a particular point in space. It is a transfer function, describing how a sound from a specific point will arrive at the ear (generally at the outer end of the auditory canal). Some consumer home entertainment products designed to reproduce surround sound from stereo (two-speaker) headphones use HRTFs. Some forms of HRTF-processing have also been included in computer software to simulate surround sound playback from loudspeakers.

Humans have just two ears, but can locate sounds in three dimensions – in range (distance), in direction above and below (elevation), in front and to the rear, as well as to either side (azimuth). This is possible because the brain, inner ear, and the external ears (pinna) work together to make inferences about location. This ability to localize sound sources may have developed in humans and ancestors as an evolutionary necessity, since the eyes can only see a fraction of the world around a viewer, and vision is hampered in darkness, while the ability to localize a sound source works in all directions, to varying accuracy, regardless of the surrounding light.

Humans estimate the location of a source by taking cues derived from one ear (''monaural cues''), and by comparing cues received at both ears (''difference cues'' or ''binaural cues''). Among the difference cues are time differences of arrival and intensity differences. The monaural cues come from the interaction between the sound source and the human anatomy, in which the original source sound is modified before it enters the ear canal for processing by the auditory system. These modifications encode the source location, and may be captured via an impulse response which relates the source location and the ear location. This impulse response is termed the ''head-related impulse response'' (HRIR). Convolution of an arbitrary source sound with the HRIR converts the sound to that which would have been heard by the listener if it had been played at the source location, with the listener's ear at the receiver location. HRIRs have been used to produce virtual surround sound.
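As a rough illustration of this convolution step (a minimal sketch, not a production renderer), the following Python example uses NumPy and SciPy, with placeholder arrays standing in for measured left- and right-ear HRIRs:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with left/right HRIRs to get a binaural stereo pair."""
    left = fftconvolve(mono, hrir_left)    # signal as heard at the left eardrum
    right = fftconvolve(mono, hrir_right)  # signal as heard at the right eardrum
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out

# Example with placeholder data: a click rendered through 256-tap stand-in HRIRs.
fs = 44100
mono = np.zeros(fs)
mono[0] = 1.0
hrir_l = np.random.randn(256) * np.hanning(256)  # stand-in for a measured left HRIR
hrir_r = np.random.randn(256) * np.hanning(256)  # stand-in for a measured right HRIR
binaural = render_binaural(mono, hrir_l, hrir_r)
```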
The HRTF is the Fourier transform of the HRIR. The HRTFs for the left and right ear (expressed above as HRIRs) describe the filtering of a sound source ''x''(''t'') before it is perceived at the left and right ears as ''x''L(''t'') and ''x''R(''t''), respectively.

The HRTF can also be described as the modifications to a sound from a direction in free air to the sound as it arrives at the eardrum. These modifications include the shape of the listener's outer ear, the shape of the listener's head and body, the acoustic characteristics of the space in which the sound is played, and so on. All these characteristics will influence how (or whether) a listener can accurately tell what direction a sound is coming from.

In the AES69-2015 standard, the Audio Engineering Society (AES) has defined the SOFA file format for storing spatially oriented acoustic data like head-related transfer functions (HRTFs). SOFA software libraries and files are collected at the Sofa Conventions website.


How HRTF works

The associated mechanism varies between individuals, as their head and ear shapes differ. HRTF describes how a given sound wave input (parameterized as frequency and source location) is filtered by the diffraction and reflection properties of the head, pinna, and torso, before the sound reaches the transduction machinery of the eardrum and inner ear (see auditory system). Biologically, the source-location-specific prefiltering effects of these external structures aid in the neural determination of source location, particularly the determination of the source's elevation (see vertical sound localization).


Technical derivation

Linear systems analysis defines the transfer function as the complex ratio between the output signal spectrum and the input signal spectrum as a function of frequency. Blauert (1974; cited in Blauert, 1981) initially defined the transfer function as the free-field transfer function (FFTF). Other terms include free-field to eardrum transfer function and the pressure transformation from the free field to the eardrum. Less specific descriptions include the pinna transfer function, the outer ear transfer function, the pinna response, or directional transfer function (DTF).

The transfer function ''H''(''f'') of any linear time-invariant system at frequency ''f'' is:

: ''H''(''f'') = Output(''f'') / Input(''f'')

One method used to obtain the HRTF from a given source location is therefore to measure the head-related impulse response (HRIR), ''h''(''t''), at the eardrum for an impulse ''δ''(''t'') placed at the source. The HRTF ''H''(''f'') is then the Fourier transform of the HRIR ''h''(''t'').
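For example, given a discretely sampled HRIR, the corresponding HRTF can be estimated with a discrete Fourier transform. The sketch below (NumPy; the HRIR array is a placeholder rather than a measured response) computes the complex HRTF and its magnitude in dB:

```python
import numpy as np

fs = 48000                                       # sampling rate in Hz
hrir = np.random.randn(512) * np.hanning(512)    # placeholder for a measured HRIR

nfft = 1024
hrtf = np.fft.rfft(hrir, n=nfft)                 # complex HRTF H(f)
freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)        # frequency axis in Hz
magnitude_db = 20 * np.log10(np.abs(hrtf) + 1e-12)  # magnitude response in dB
```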
Even when measured for a "dummy head" of idealized geometry, HRTFs are complicated functions of frequency and the three spatial variables. For distances greater than 1 m from the head, however, the HRTF can be said to attenuate inversely with range. It is this far-field HRTF, ''H''(''f'', ''θ'', ''φ''), that has most often been measured. At closer range, the difference in level observed between the ears can grow quite large, even in the low-frequency region within which negligible level differences are observed in the far field.

HRTFs are typically measured in an anechoic chamber to minimize the influence of early reflections and reverberation on the measured response. HRTFs are measured at small increments of ''θ'' such as 15° or 30° in the horizontal plane, with interpolation used to synthesize HRTFs for arbitrary positions of ''θ''. Even with small increments, however, interpolation can lead to front-back confusion, and optimizing the interpolation procedure is an active area of research.
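One simple interpolation scheme, shown below as a hedged NumPy sketch (the measured HRIRs and their azimuth grid are placeholders), linearly blends the two HRIRs measured at the azimuths bracketing the requested direction; practical systems often interpolate magnitude and delay separately to avoid comb-filter artifacts.

```python
import numpy as np

def interpolate_hrir(azimuths_deg, hrirs, target_deg):
    """Linearly blend the two measured HRIRs nearest to target_deg.

    azimuths_deg : sorted 1-D array of measured azimuths in degrees (0-360)
    hrirs        : array of shape (len(azimuths_deg), taps)
    """
    az = np.asarray(azimuths_deg)
    target = target_deg % 360.0
    hi = np.searchsorted(az, target) % len(az)   # grid point just above target
    lo = (hi - 1) % len(az)                      # grid point just below target
    span = (az[hi] - az[lo]) % 360.0 or 360.0
    w = ((target - az[lo]) % 360.0) / span       # blending weight in [0, 1]
    return (1.0 - w) * hrirs[lo] + w * hrirs[hi]

# Placeholder grid: HRIRs "measured" every 30 degrees, 256 taps each.
grid = np.arange(0, 360, 30)
measured = np.random.randn(len(grid), 256)
hrir_45 = interpolate_hrir(grid, measured, 45.0)  # estimate for 45 degrees azimuth
```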
In order to maximize the signal-to-noise ratio (SNR) in a measured HRTF, it is important that the impulse being generated be of high volume. In practice, however, it can be difficult to generate impulses at high volumes and, if generated, they can be damaging to human ears, so it is more common for HRTFs to be directly calculated in the frequency domain using a frequency-swept sine wave or by using maximum length sequences. User fatigue is still a problem, however, highlighting the need for the ability to interpolate based on fewer measurements.
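As a sketch of the swept-sine approach (an illustrative outline under stated assumptions, not the procedure of any particular measurement system), the impulse response can be recovered by exciting the system with a logarithmic sine sweep and deconvolving the recorded response with the sweep's amplitude-compensated inverse filter:

```python
import numpy as np

fs = 48000
T = 5.0                                    # sweep duration in seconds
f1, f2 = 20.0, 20000.0                     # sweep start/stop frequencies in Hz
t = np.arange(int(T * fs)) / fs
R = np.log(f2 / f1)

# Exponential (logarithmic) sine sweep and its amplitude-compensated inverse filter.
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1.0))
inverse = sweep[::-1] * np.exp(-t * R / T)  # time-reversed sweep with 6 dB/oct tilt

# "recorded" stands in for the microphone signal captured at the ear; reusing the
# sweep itself here simply yields an impulse-like deconvolution result.
recorded = sweep
impulse_response = np.fft.irfft(np.fft.rfft(recorded, 2 * len(sweep)) *
                                np.fft.rfft(inverse, 2 * len(sweep)))
```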
The head-related transfer function is involved in resolving the cone of confusion, a series of points where ITD and ILD are identical for sound sources from many locations around the "0" part of the cone. When a sound is received by the ear, it can either travel straight down into the ear canal, or it can be reflected off the pinnae of the ear into the ear canal a fraction of a second later. The sound will contain many frequencies, so many copies of this signal travel down the ear canal at different times depending on their frequency (according to reflection, diffraction, and their interaction with high and low frequencies and the size of the structures of the ear). These copies overlap each other, and during this, certain signals are enhanced (where the phases of the signals match) while other copies are canceled out (where the phases of the signals do not match). Essentially, the brain is looking for frequency notches in the signal that correspond to particular known directions of sound.

If another person's ears were substituted, the individual would not immediately be able to localize sound, as the patterns of enhancement and cancellation would differ from those the person's auditory system is used to. However, after some weeks, the auditory system would adapt to the new head-related transfer function. The inter-subject variability in the spectra of HRTFs has been studied through cluster analyses (So, R.H.Y., Ngan, B., Horner, A., Leung, K.L., Braasch, J. and Blauert, J. (2010). Toward orthogonal non-individualized head-related transfer functions for forward and backward directional sound: cluster analysis and an experimental study. Ergonomics, 53(6), pp. 767–781).

Assessing the variation between one person's ears and another's, we can limit our perspective to the degrees of freedom of the head and its relation to the spatial domain. Through this, we eliminate the tilt and other coordinate parameters that add complexity. For the purpose of calibration we are only concerned with the direction level to our ears, ergo a specific degree of freedom. Some of the ways in which we can deduce an expression to calibrate the HRTF are:
# Localization of sound in virtual auditory space
# HRTF phase synthesis
# HRTF magnitude synthesis


Localization of sound in virtual auditory space

A basic assumption in the creation of a virtual auditory space is that if the acoustical waveforms present at a listener's eardrums are the same under headphones as in free field, then the listener's experience should also be the same. Typically, sounds generated from headphones are perceived as originating from within the head. In the virtual auditory space, the headphones should be able to "externalize" the sound. Using the HRTF, sounds can be spatially positioned using the technique described below.

Let ''x''1(''t'') represent an electrical signal driving a loudspeaker and ''y''1(''t'') represent the signal received by a microphone inside the listener's eardrum. Similarly, let ''x''2(''t'') represent the electrical signal driving a headphone and ''y''2(''t'') represent the microphone response to that signal. The goal of the virtual auditory space is to choose ''x''2(''t'') such that ''y''2(''t'') = ''y''1(''t''). Applying the Fourier transform to these signals gives the following two equations:

: ''Y''1 = ''X''1''LFM'', and
: ''Y''2 = ''X''2''HM'',

where ''L'' is the transfer function of the loudspeaker in the free field, ''F'' is the HRTF, ''M'' is the microphone transfer function, and ''H'' is the headphone-to-eardrum transfer function. Setting ''Y''1 = ''Y''2 and solving for ''X''2 yields

: ''X''2 = ''X''1''LF''/''H''.

By observation, the desired transfer function is

: ''T'' = ''LF''/''H''.

Therefore, theoretically, if ''x''1(''t'') is passed through this filter and the resulting ''x''2(''t'') is played on the headphones, it should produce the same signal at the eardrum. Since the filter applies only to a single ear, another one must be derived for the other ear. This process is repeated for many places in the virtual environment to create an array of head-related transfer functions for each position to be recreated, while ensuring that the sampling conditions satisfy the Nyquist criterion.
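The following is a minimal frequency-domain sketch of applying the desired filter ''T'' = ''LF''/''H'' to one channel, assuming the loudspeaker response ''L'', the HRTF ''F'', and the headphone-to-eardrum response ''H'' are already available as complex frequency responses on a common FFT grid (all placeholder data here); a small regularization term guards against division by near-zero values of ''H'':

```python
import numpy as np

def virtual_source_filter(x, L, F, H, eps=1e-6):
    """Filter a signal so that headphone playback approximates the free-field eardrum signal.

    x        : time-domain input signal x1(t)
    L, F, H  : complex frequency responses (loudspeaker, HRTF, headphone-to-eardrum)
               sampled on the rfft grid of length len(x)//2 + 1
    """
    X1 = np.fft.rfft(x)
    T = L * F / (H + eps)          # desired transfer function T = LF/H (regularized)
    X2 = X1 * T                    # spectrum of the headphone drive signal
    return np.fft.irfft(X2, n=len(x))

# Placeholder responses: unity loudspeaker/headphone chains and a random stand-in "HRTF".
n = 4096
x1 = np.random.randn(n)
bins = n // 2 + 1
L = np.ones(bins, dtype=complex)
H = np.ones(bins, dtype=complex)
F = np.fft.rfft(np.random.randn(256) * np.hanning(256), n=n)
x2 = virtual_source_filter(x1, L, F, H)   # signal to play over one headphone channel
```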


HRTF phase synthesis

There is less reliable phase estimation in the very low part of the frequency band, and in the upper frequencies the phase response is affected by the features of the pinna. Earlier studies also show that the HRTF phase response is mostly linear and that listeners are insensitive to the details of the interaural phase spectrum as long as the interaural time delay (ITD) of the combined low-frequency part of the waveform is maintained. The phase response of a subject's HRTF is therefore modeled as a time delay that depends on the direction and elevation, scaled by a factor that is a function of the anthropometric features. For example, a training set of ''N'' subjects would consider each HRTF phase and describe a single ITD scaling factor as the average delay of the group. This computed scaling factor can estimate the time delay as a function of the direction and elevation for any given individual. Converting the time delay to a phase response for the left and the right ears is trivial.

The HRTF phase can be described by the ITD scaling factor, which is in turn quantified by the anthropometric data of a given individual taken as the source of reference. For the generic case we consider ''β'' as a sparse vector

: \beta = [\beta_1, \beta_2, \ldots, \beta_N]^T

that represents the subject's anthropometric features as a linear superposition of the anthropometric features from the training data (''y'' = ''βX''), and then apply the same sparse vector directly to the scaling vector ''H''. We can write this task as a minimization problem, for a non-negative shrinking parameter ''λ'':

: \beta = \operatorname*{arg\,min}_\beta \left( \sum_{a=1}^A \left( y_a - \sum_{n=1}^N \beta_n X_{n,a} \right)^2 + \lambda \sum_{n=1}^N |\beta_n| \right)

From this, the ITD scaling factor value ''H''′ is estimated as:

: H' = \sum_{n=1}^N \beta_n H_n ,

where the ITD scaling factors for all persons in the dataset are stacked in a vector ''H'' ∈ ''R''^''N'', so the value ''H''_''n'' corresponds to the scaling factor of the ''n''-th person.
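The delay-to-phase conversion mentioned above amounts to assigning a linear phase to each ear; a minimal NumPy sketch with a hypothetical delay value:

```python
import numpy as np

fs = 48000
nfft = 1024
freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)   # frequency axis in Hz

itd = 3.2e-4                                # example interaural time delay in seconds
tau_left, tau_right = 0.0, itd              # right ear delayed relative to the left

# Linear phase corresponding to a pure time delay: phi(f) = -2*pi*f*tau
phase_left = -2 * np.pi * freqs * tau_left
phase_right = -2 * np.pi * freqs * tau_right
```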


HRTF magnitude synthesis

We solve the above minimization problem using least absolute shrinkage and selection operator (LASSO) regression. We assume that the HRTFs are represented by the same relation as the anthropometric features. Therefore, once we learn the sparse vector ''β'' from the anthropometric features, we apply it directly to the HRTF tensor data, and the subject's HRTF values ''H''′ are given by:

: H'_{d,k} = \sum_{n=1}^N \beta_n H_{n,d,k} ,

where the HRTFs for each subject are described by a tensor of size ''D'' × ''K'', in which ''D'' is the number of HRTF directions and ''K'' is the number of frequency bins. All HRTFs of the training set are stacked in a new tensor ''H'' ∈ ''R''^(''N'' × ''D'' × ''K''), so the value ''H''_{''n'',''d'',''k''} corresponds to the ''k''-th frequency bin for the ''d''-th HRTF direction of the ''n''-th person. Likewise, ''H''′_{''d'',''k''} corresponds to the ''k''-th frequency bin for the ''d''-th HRTF direction of the synthesized HRTF.
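A hedged sketch of this synthesis pipeline using scikit-learn's Lasso is shown below. The anthropometric matrix, ITD scaling factors, and HRTF tensor are random placeholders, and the regularization strength and non-negativity setting are illustrative assumptions rather than values from a published method:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, A, D, K = 20, 10, 72, 128           # subjects, features, directions, frequency bins

X = rng.normal(size=(N, A))            # training anthropometric features (N x A)
itd_scale = rng.uniform(0.8, 1.2, N)   # ITD scaling factor per training subject
hrtf = rng.normal(size=(N, D, K))      # training HRTF tensor (N x D x K)
y = X[0] + 0.01 * rng.normal(size=A)   # new subject's anthropometric features

# Learn the sparse combination beta of training subjects that reproduces y:
# regress y on the training subjects' feature vectors (one column per subject).
model = Lasso(alpha=0.01, fit_intercept=False, positive=True)
model.fit(X.T, y)                      # design matrix of shape (A, N)
beta = model.coef_                     # sparse weights over the N training subjects

# Apply the same weights to synthesize the new subject's ITD scale and HRTF tensor.
itd_scale_est = beta @ itd_scale                      # estimated scalar H'
hrtf_est = np.tensordot(beta, hrtf, axes=([0], [0]))  # estimated HRTFs, shape (D, K)
```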


Recording technology

Recordings processed via an HRTF, such as in a computer gaming environment (see A3D, EAX, and OpenAL), which approximates the HRTF of the listener, can be heard through stereo headphones or speakers and interpreted as if they comprise sounds coming from all directions, rather than just two points on either side of the head. The perceived accuracy of the result depends on how closely the HRTF data set matches the characteristics of one's own ears.


See also

* 3D sound reconstruction
* A3D
* Binaural recording
* Dolby Atmos
* Dummy head recording
* Environmental audio extensions
* OpenAL
* Sound Retrieval System
* Sound localization
* Soundbar
* Sensaura
* Transfer function


References


External links

* Spatial Sound Tutorial
* High-resolution HRTF and 3D ear model database (48 subjects)
* AIR Database (HRTF database in reverberant environments)
* Full Sphere HRIR/HRTF Database of the Neumann KU100
* ARI (Acoustics Research Institute) Database (90+ datasets)