Ambisonics is a ''full-sphere''

surround sound

format: in addition to the horizontal plane, it covers sound sources above and below the listener. Unlike some other multichannel surround formats, its transmission channels do not carry speaker signals. Instead, they contain a speaker-independent representation of a sound field called ''B-format'', which is then ''decoded'' to the listener's speaker setup. This extra step allows the producer to think in terms of source directions rather than loudspeaker positions, and offers the listener a considerable degree of flexibility as to the layout and number of speakers used for playback. Ambisonics was developed in the UK in the 1970s under the auspices of the British

National Research Development Corporation

. Despite its solid technical foundation and many advantages, Ambisonics had not until recently been a commercial success, and survived only in niche applications and among recording enthusiasts. With the widespread availability of powerful digital signal processing (as opposed to the expensive and error-prone analog circuitry that had to be used during its early years) and the successful market introduction of home theatre surround sound systems since the 1990s, interest in Ambisonics among recording engineers, sound designers, composers, media companies, broadcasters and researchers has returned and continues to increase. In particular, it has proved an effective way to present spatial audio in virtual reality applications (e.g. YouTube 360 Video), as the B-format scene can be rotated to match the user's head orientation, and then be decoded as binaural stereo.

Introduction

Ambisonics can be understood as a three-dimensional extension of M/S (mid/side) stereo, adding difference channels for height and depth. The resulting signal set is called ''B-format''. Its component channels are labelled

W

for the sound pressure (the M in M/S),

X

for the front-minus-back sound pressure gradient,

Y

for left-minus-right (the S in M/S) and

Z

for up-minus-down. (The traditional B-format notation is used in this introductory paragraph, since it is assumed that the reader may have come across it already. For higher-order Ambisonics, use of the ACN notation is recommended.) The

W

signal corresponds to an omnidirectional microphone, whereas

XYZ

are the components that would be picked up by figure-of-eight capsules oriented along the three spatial axes.

Panning a source

A simple Ambisonic panner (or ''encoder'') takes a source signal

S

and two parameters, the horizontal angle

\theta

and the elevation angle

\phi

. It positions the source at the desired angle by distributing the signal over the Ambisonic components with different gains:

W = S \cdot \frac{1}{\sqrt{2}}

X=S \cdot \cos\theta\cos\phi

Y=S \cdot \sin\theta\cos\phi

Z=S \cdot \sin\phi

Being omnidirectional, the

W

channel always gets the same constant input signal, regardless of the angles. So that it has more-or-less the same average energy as the other channels, W is attenuated by about 3 dB (precisely, divided by the square root of two). The terms for

XYZ

actually produce the polar patterns of figure-of-eight microphones (see illustration on the right, second row). We take their value at

\theta

and

\phi

, and multiply the result with the input signal. The result is that the input ends up in all components exactly as loud as the corresponding microphone would have picked it up.
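The panning equations above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `encode_first_order` is hypothetical, angles are assumed to be given in degrees, and the horizontal angle is assumed to increase anticlockwise from the front, so that positive values land in the left-minus-right component Y.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Pan one mono sample into traditional B-format (W, X, Y, Z)."""
    theta = math.radians(azimuth_deg)    # horizontal angle (assumed anticlockwise from front)
    phi = math.radians(elevation_deg)    # elevation angle, positive upwards
    w = sample / math.sqrt(2)                      # omnidirectional, attenuated by about 3 dB
    x = sample * math.cos(theta) * math.cos(phi)   # front-minus-back
    y = sample * math.sin(theta) * math.cos(phi)   # left-minus-right
    z = sample * math.sin(phi)                     # up-minus-down
    return w, x, y, z

# A source hard left on the horizontal plane contributes to W and Y only
# (X is zero up to floating-point rounding).
print(encode_first_order(1.0, 90.0, 0.0))
```

In a real panner the same four gains would be applied to every sample of the source signal, usually with smoothing when the angles change over time.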

Virtual microphones

The B-format components can be combined to derive ''virtual microphones'' with any first-order polar pattern (omnidirectional, cardioid, hypercardioid, figure-of-eight or anything in between) pointing in any direction. Several such microphones with different parameters can be derived at the same time, to create coincident stereo pairs (such as a Blumlein) or surround arrays. A horizontal virtual microphone at horizontal angle

\Theta

with pattern

0 \leq p \leq 1

is given by:

M(\Theta, p) = p\sqrt{2}\,W + (1-p)(\cos\Theta\,X + \sin\Theta\,Y)

. This virtual mic is ''free-field normalised'', which means it has a constant gain of one for on-axis sounds. The illustration on the left shows some examples created with this formula. Virtual microphones can be manipulated in post-production: desired sounds can be picked out, unwanted ones suppressed, and the balance between direct and reverberant sound can be fine-tuned during mixing.
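As a hedged sketch of the virtual microphone formula, the following Python function evaluates M(Θ, p) = p√2 W + (1 − p)(cos Θ X + sin Θ Y) for a single sample. The function name `virtual_mic` and the use of degrees are assumptions made for illustration. The example checks the free-field normalisation: a unit source encoded at 30° is picked up with gain one by a cardioid (p = 0.5) pointing at 30°, and is rejected by a cardioid pointing the opposite way.

```python
import math

def virtual_mic(w, x, y, azimuth_deg, p):
    """Horizontal virtual microphone: p=1 omni, p=0.5 cardioid, p=0 figure-of-eight."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    big_theta = math.radians(azimuth_deg)
    directional = math.cos(big_theta) * x + math.sin(big_theta) * y
    return p * math.sqrt(2) * w + (1 - p) * directional

# Unit source at 30 degrees on the horizontal plane, encoded to B-format.
src = math.radians(30.0)
w, x, y = 1 / math.sqrt(2), math.cos(src), math.sin(src)

on_axis = virtual_mic(w, x, y, 30.0, 0.5)    # cardioid aimed at the source
off_axis = virtual_mic(w, x, y, 210.0, 0.5)  # cardioid aimed away from it
print(on_axis, off_axis)
```

Calling the function twice with different angles on the same B-format signal gives a coincident stereo pair.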

Decoding

A basic Ambisonic ''decoder'' is very similar to a set of virtual microphones. For perfectly regular layouts, a simplified decoder can be generated by pointing a virtual cardioid microphone in the direction of each speaker. Here is a square:

LF = (\sqrt{2}W + X + Y)\sqrt{8}

LB = (\sqrt{2}W - X + Y)\sqrt{8}

RB = (\sqrt{2}W - X - Y)\sqrt{8}

RF = (\sqrt{2}W + X - Y)\sqrt{8}

The signs of the

X

and

Y

components are the important part, the rest are gain factors. The

Z

component is discarded, because it is not possible to reproduce height cues with just four loudspeakers in one plane. In practice, a real Ambisonic decoder requires a number of psycho-acoustic optimisations to work properly. Currently, the All-Round Ambisonic Decoder (AllRAD) can be regarded as the standard solution for loudspeaker-based playback, and Magnitude Least Squares (MagLS) for binaural decoding, as implemented for instance in the IEM and SPARTA Ambisonic production tools (Daniel Rudrich et al., ''IEM Plug-in Suite'', 2018, accessed 2024; Leo McCormack, ''Spatial Audio Real-Time Applications'', 2019, accessed 2024). Frequency-dependent decoding can also be used to produce binaural stereo; this is particularly relevant in virtual reality applications.
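The simplified square decoder described in this section can be sketched as follows. Only the sign pattern of X and Y per loudspeaker is taken from the text. The names `SQUARE_SIGNS` and `decode_square` are hypothetical, and the overall gain is left as a parameter with an arbitrary default, since it scales all feeds equally and does not affect direction. None of the psycho-acoustic optimisations of a real decoder are included.

```python
import math

# (sign of X, sign of Y) for each loudspeaker of the square layout
SQUARE_SIGNS = {
    "LF": (+1, +1),   # left front
    "LB": (-1, +1),   # left back
    "RB": (-1, -1),   # right back
    "RF": (+1, -1),   # right front
}

def decode_square(w, x, y, gain=1.0):
    """Return one feed per loudspeaker; Z is ignored for a horizontal layout."""
    return {
        name: gain * (math.sqrt(2) * w + sx * x + sy * y)
        for name, (sx, sy) in SQUARE_SIGNS.items()
    }

# Unit source panned to 45 degrees (front left): LF receives the largest feed,
# RB the smallest, and LB and RF are equal.
a = math.radians(45.0)
feeds = decode_square(1 / math.sqrt(2), math.cos(a), math.sin(a))
for name, value in feeds.items():
    print(name, round(value, 3))
```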

Higher-order Ambisonics

The spatial resolution of first-order Ambisonics as described above is quite low. In practice, that translates to slightly blurry sources, but also to a comparably small usable listening area or ''sweet spot''. The resolution can be increased and the sweet spot enlarged by adding groups of more selective directional components to the B-format. These no longer correspond to conventional microphone polar patterns, but rather look like clover leaves. The resulting signal set is then called ''Second-'', ''Third-'', or collectively, ''Higher-order Ambisonics''. For a given order

\ell

, full-sphere systems require

(\ell+1)^2

signal components, and

2\ell+1

components are needed for horizontal-only reproduction. Historically there have been several different format conventions for higher-order Ambisonics; for details see

Ambisonic data exchange formats.
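The channel counts quoted above are easy to tabulate. The helper names below are hypothetical; the formulas are the ones given in the text, (ℓ+1)² for full-sphere and 2ℓ+1 for horizontal-only systems.

```python
def full_sphere_channels(order):
    """Number of signal components for a full-sphere system of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def horizontal_channels(order):
    """Number of signal components for horizontal-only reproduction."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 2 * order + 1

for order in range(1, 4):
    print(order, full_sphere_channels(order), horizontal_channels(order))
```

First order gives four and three components respectively, matching the W, X, Y, Z and W, X, Y sets of traditional B-format.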

Comparison to other surround formats

Ambisonics differs from other surround formats in a number of aspects:
* It requires only three channels for basic horizontal surround, and four channels for a full-sphere soundfield. Basic full-sphere replay requires a minimum of six loudspeakers (a minimum of four for horizontal).
* The same program material can be decoded for varying numbers of loudspeakers. Moreover, a with-height mix can be played back on horizontal-only, stereo or even mono systems without losing content entirely (it will be folded to the horizontal plane and to the frontal quadrant, respectively). This allows producers to embrace with-height production without worrying about loss of information.
* Ambisonics can be scaled to any desired spatial resolution at the cost of additional transmission channels and more speakers for playback. Higher-order material remains downwards compatible and can be played back at lower spatial resolution without requiring a special downmix.
* The core technology of Ambisonics is free of patents, and a complete tool chain for production and listening is available as free software for all major operating systems.

On the downside, Ambisonics is:
* Prone to strong coloration from comb filtering artifacts due to high coherence of neighbouring loudspeaker signals at lower orders.
* Unable to deliver the particular spaciousness of spaced omnidirectional microphones preferred by many classical sound engineers and listeners.
* Not supported by any major record label or media company, although a number of Ambisonic UHJ format (UHJ) encoded tracks (principally classical) can be located, if with some difficulty, on services such as Spotify.
* Conceptually difficult for people to grasp, as opposed to the conventional ''"one channel, one speaker"'' paradigm.
* More complicated for the consumer to set up, because of the decoding stage.
* Limited to a sweet spot, which is not found in other forms of surround sound such as VBAP.
* Worse at localising point sources than amplitude panning, with counter-phase signals blurring the imaging.
* Much more sensitive to speaker placement than other forms of surround sound that use amplitude panning.

Theoretical foundation

Soundfield analysis (encoding)

The B-format signals comprise a truncated

spherical harmonic

decomposition of the sound field. They correspond to the

sound pressure

W

, and the three components of the pressure gradient

XYZ

(not to be confused with the related

particle velocity

) at a point in space. Together, these approximate the sound field on a sphere around the microphone; formally the first-order truncation of the

multipole expansion.

W

(the mono signal) is the zero-order information, corresponding to a constant function on the sphere, while

XYZ

are the first-order terms (the dipoles or figures-of-eight). This first-order truncation is only an approximation of the overall sound field. The ''higher orders'' correspond to further terms of the multipole expansion of a function on the sphere in terms of spherical harmonics. In practice, higher orders require more speakers for playback, but increase the spatial resolution and enlarge the area where the sound field is reproduced perfectly (up to an upper boundary frequency). The radius

r

of this area for Ambisonic order

\ell

and frequency

f

is given by:

r \approx \frac{\ell c}{2\pi f}

, where

c

denotes the speed of sound. This area becomes smaller than a human head above 600 Hz for first order or 1800 Hz for third-order. Accurate reproduction in a head-sized volume up to 20 kHz would require an order of 32 or more than 1000 loudspeakers. At those frequencies and listening positions where perfect soundfield

reconstruction

is no longer possible, Ambisonics reproduction has to focus on delivering correct directional cues to allow for good localisation even in the presence of reconstruction errors.
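The figures quoted above can be checked numerically, assuming the relation r ≈ ℓc/(2πf) between the radius of accurate reconstruction, the order ℓ and the frequency f. The sketch below also assumes a speed of sound of 343 m/s and a head radius of roughly 9 cm; both values, like the function names, are illustrative assumptions rather than figures from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def reconstruction_radius(order, frequency_hz, c=SPEED_OF_SOUND):
    """Approximate radius (m) of accurate reconstruction: r = l*c / (2*pi*f)."""
    return order * c / (2 * math.pi * frequency_hz)

def required_order(radius_m, frequency_hz, c=SPEED_OF_SOUND):
    """Smallest integer order whose radius reaches radius_m at frequency_hz."""
    return math.ceil(2 * math.pi * frequency_hz * radius_m / c)

print(reconstruction_radius(1, 600.0))    # first order at 600 Hz
print(reconstruction_radius(3, 1800.0))   # third order at 1800 Hz, same radius
print(required_order(0.09, 20000.0))      # order needed for a 9 cm radius at 20 kHz
```

Under these assumptions both radii come out at about 9 cm, and the order needed at 20 kHz lands in the low thirties, in line with the statements above. The exact integer depends on the head radius chosen.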

Psychoacoustics

The human hearing apparatus has very keen localisation on the horizontal plane (as fine as 2° source separation in some experiments). Two predominant cues, for different frequency ranges, can be identified:

Low-frequency localisation

At low frequencies, where the wavelength is large compared to the human head, an incoming sound diffracts around it, so that there is virtually no acoustic shadow and hence no level difference between the ears. In this range, the only available information is the phase relationship between the two ear signals, called ''interaural time difference'', or ''ITD''. Evaluating this time difference allows for precise localisation within a ''cone of confusion'': the angle of incidence is unambiguous, but the ITD is the same for sounds from the front or from the back. As long as the sound is not totally unknown to the subject, the confusion can usually be resolved by perceiving the timbral front-back variations caused by the ear flaps (or ''pinnae'').

High-frequency localisation

As the wavelength approaches twice the size of the head, phase relationships become ambiguous, since it is no longer clear whether the phase difference between the ears corresponds to one, two, or even more periods as the frequency goes up. Fortunately, the head will create a significant acoustic shadow in this range, which causes a slight difference in level between the ears. This is called the ''interaural level difference'', or ''ILD'' (the same cone of confusion applies). Combined, these two mechanisms provide localisation over the entire hearing range.

ITD and ILD reproduction in Ambisonics

Gerzon has shown that the quality of localisation cues in the reproduced sound field corresponds to two objective metrics: the length of the particle velocity vector

\vec{r_V}

for the ITD, and the length of the energy vector

\vec{r_E}

for the ILD. Gerzon and Barton (1992) define a decoder for horizontal surround to be ''Ambisonic'' if
* the directions of \vec{r_V} and \vec{r_E} agree up to at least 4 kHz,
* at frequencies below about 400 Hz, \|\vec{r_V}\| = 1 for all azimuth angles, and
* at frequencies from about 700 Hz to 4 kHz, the magnitude of \vec{r_E} is ''"substantially maximised across as large a part of the 360° sound stage as possible"'' (Michael A Gerzon, Geoffrey J Barton, "Ambisonic Decoders for HDTV", 92nd AES Convention, Vienna 1992. http://www.aes.org/e-lib/browse.cfm?elib=6788).

In practice, satisfactory results are achieved at moderate orders even for very large listening areas.
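Both vectors can be computed directly from a set of loudspeaker gains. The sketch below uses the usual definitions, which are assumed here rather than quoted from the text: the velocity vector is the gain-weighted mean of the unit vectors pointing at the loudspeakers, and the energy vector is the same mean weighted by squared gains. The function name `gerzon_vectors` and the example gains are hypothetical.

```python
import math

def gerzon_vectors(gains, azimuths_deg):
    """Return (r_V, r_E) as (x, y) tuples for a horizontal loudspeaker layout."""
    units = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
             for a in azimuths_deg]
    pressure = sum(gains)               # must be non-zero
    energy = sum(g * g for g in gains)  # must be non-zero
    r_v = tuple(sum(g * u[k] for g, u in zip(gains, units)) / pressure
                for k in range(2))
    r_e = tuple(sum(g * g * u[k] for g, u in zip(gains, units)) / energy
                for k in range(2))
    return r_v, r_e

# Square layout (LF, LB, RB, RF) with illustrative gains for a frontal source:
# only the two front loudspeakers are active.
r_v, r_e = gerzon_vectors([2.0, 0.0, 0.0, 2.0], [45, 135, 225, 315])
print("r_V length:", math.hypot(*r_v), "direction:", math.degrees(math.atan2(r_v[1], r_v[0])))
print("r_E length:", math.hypot(*r_e), "direction:", math.degrees(math.atan2(r_e[1], r_e[0])))
```

In this example both vectors point straight ahead, so their directions agree, but their lengths are below one, which indicates a less sharply localised image than a single real source would produce.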

Monaural HRTF cue

Humans are also able to derive information about sound source location in 3D space, taking into account height. Much of this ability is due to the shape of the head (especially the pinna) producing a variable frequency response depending on the angle of the source. The response can be measured by placing a microphone in a person's ear canal, then playing back sounds from various directions. The recorded head-related transfer function (HRTF) can then be used for rendering Ambisonics to headphones, mimicking the effect of the head. HRTFs differ from person to person due to head shape variations, but a generic one can produce a satisfactory result.

Soundfield synthesis (decoding)

In principle, the

loudspeaker

signals are derived by using a

linear combination

of the Ambisonic component signals, where each signal is dependent on the actual position of the speaker in relation to the center of an imaginary sphere the surface of which passes through all available speakers. In practice, slightly irregular distances of the speakers may be compensated with delay. True Ambisonics decoding however requires spatial equalisation of the signals to account for the differences in the high- and low-frequency sound localisation mechanisms in human hearing. A further refinement accounts for the distance of the listener from the loudspeakers (''near-field compensation''). A variety of more modern decoding methods are also in use.
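The delay compensation mentioned above can be illustrated with a small sketch: each loudspeaker is delayed so that its signal arrives at the centre at the same time as that of the farthest one. The function name `alignment_delays`, the 48 kHz sample rate and the 343 m/s speed of sound are assumptions for illustration. Level compensation, spatial equalisation and near-field compensation are not covered.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed

def alignment_delays(distances_m, sample_rate_hz=48000, c=SPEED_OF_SOUND):
    """Delay in whole samples per loudspeaker so that all arrivals coincide."""
    farthest = max(distances_m)
    return [round((farthest - d) / c * sample_rate_hz) for d in distances_m]

# Two loudspeakers at 2.0 m and one at 2.343 m from the listening position:
# the nearer ones are delayed, the farthest one is not.
print(alignment_delays([2.0, 2.0, 2.343]))
```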

Compatibility with existing distribution channels

Ambisonics decoders are not currently being marketed to end users in any significant way, and no native Ambisonic recordings are commercially available. Hence, content that has been produced in Ambisonics must be made available to consumers in stereo or discrete multichannel formats.

Stereo

Ambisonics content can be folded down to stereo automatically, without requiring a dedicated downmix. The most straightforward approach is to sample the B-format with a ''virtual stereo microphone''. The result is equivalent to a coincident stereo recording. Imaging will depend on the microphone geometry, but usually rear sources will be reproduced more softly and diffusely. Vertical information (from the

Z

channel) is omitted. Alternatively, the B-format can be matrix-encoded into ''UHJ format'', which is suitable for direct playback on stereo systems. As before, the vertical information will be discarded, but in addition to left-right reproduction, UHJ tries to retain some of the horizontal surround information by translating sources in the back into out-of-phase signals. This gives the listener some sense of rear localisation. Two-channel UHJ can also be decoded back into horizontal Ambisonics (with some loss of accuracy), if an Ambisonic playback system is available. Lossless UHJ up to four channels (including height information) exists but has never seen wide use. In all UHJ schemes, the first two channels are conventional left and right speaker feeds.

Multichannel formats

Likewise, it is possible to pre-decode Ambisonics material to arbitrary speaker layouts, such as

Quad

, 5.1, 7.1, Auro 11.1, or even 22.2, again without manual intervention. The LFE channel is either omitted, or a special mix is created manually. Pre-decoding to 5.1 media has been known as ' during the early days of DVD audio, although the term is not in common use anymore. The obvious advantage of pre-decoding is that any surround listener is able to experience Ambisonics; no special hardware is required beyond that found in a common home theatre system. The main disadvantage is that the flexibility of rendering a single, standard Ambisonics signal to any target speaker array is lost: the signal assumes a specific "standard" layout, and anyone listening with a different array may experience a degradation of localisation accuracy. Target layouts from 5.1 upwards usually surpass the spatial resolution of first-order Ambisonics, at least in the frontal quadrant. For optimal resolution, to avoid excessive crosstalk, and to steer around irregularities of the target layout, pre-decodings for such targets should be derived from source material in higher-order Ambisonics.

Production workflow

Ambisonic content can be created in two basic ways: by recording a sound with a suitable first- or higher-order microphone, or by taking separate monophonic sources and panning them to the desired positions. Content can also be manipulated while it is in B-format.

Ambisonic microphones

Native B-format arrays

Since the components of first-order Ambisonics correspond to physical microphone pickup patterns, it is entirely practical to record B-format directly, with three coincident microphones: an omnidirectional capsule, one forward-facing figure-8 capsule, and one left-facing figure-8 capsule, yielding the

W,

X

and

Y

components. This is referred to as a ''native'' or ''Nimbus/Halliday'' microphone array, after its designer Dr Jonathan Halliday at

Nimbus Records

, where it is used to record their extensive and continuing series of Ambisonic releases. An integrated native B-format microphone, the C700S, has been manufactured and sold by Josephson Engineering since 1990. The primary difficulty inherent in this approach is that high-frequency localisation and clarity rely on the diaphragms approaching true coincidence. By stacking the capsules vertically, perfect coincidence for horizontal sources is obtained. However, sound from above or below will theoretically suffer from subtle comb filtering effects in the highest frequencies. In most instances this is not a limitation, as sound sources far from the horizontal plane are typically from room reverberation. In addition, stacked figure-8 microphone elements have a deep null in the direction of their stacking axis, such that the primary transducer in those directions is the central omnidirectional microphone. In practice this can produce less localisation error than either of the alternatives (tetrahedral arrays with processing, or a fourth microphone for the Z axis). Native arrays are most commonly used for horizontal-only surround, because of increasing positional errors and shading effects when adding a fourth microphone.

The tetrahedral microphone

Since it is impossible to build a perfectly coincident microphone array, the next-best approach is to minimize and distribute the positional error as uniformly as possible. This can be achieved by arranging four cardioid or sub-cardioid capsules in a tetrahedron and equalising for uniform diffuse-field response. The capsule signals are then converted to B-format with a matrix operation. The Core Sound TetraMic was the first commercially available A-format ambisonic microphone. Introduced in 2006, it uses four cardioid capsules. Each TetraMic is individually calibrated, and a calibration file and A- to B-format encoder plug-in are provided with each microphone. Outside Ambisonics, tetrahedral microphones have become popular with location recording engineers working in stereo or 5.1 for their flexibility in post-production; here, the B-format is only used as an intermediate to derive virtual microphones.
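The matrix operation that converts the four capsule signals (A-format) to B-format can be sketched as sums and differences. The capsule arrangement assumed here is the commonly described one: front-left-up, front-right-down, back-left-down and back-right-up. The function name `a_to_b_format` and the overall scale of 0.5 are illustrative assumptions; real converters differ in scaling and add the frequency-dependent equalisation mentioned above, which is omitted here.

```python
def a_to_b_format(flu, frd, bld, bru, scale=0.5):
    """Convert one A-format sample frame to (W, X, Y, Z).

    flu: front-left-up, frd: front-right-down,
    bld: back-left-down, bru: back-right-up (assumed capsule orientation).
    """
    w = scale * (flu + frd + bld + bru)   # sum of all capsules: pressure
    x = scale * (flu + frd - bld - bru)   # front minus back
    y = scale * (flu - frd + bld - bru)   # left minus right
    z = scale * (flu - frd - bld + bru)   # up minus down
    return w, x, y, z

# A signal present equally in all four capsules carries no directional information.
print(a_to_b_format(1.0, 1.0, 1.0, 1.0))
```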

Higher-order microphones

Above first-order, it is no longer possible to obtain Ambisonic components directly with single microphone capsules. Instead, higher-order difference signals are derived from several spatially distributed (usually omnidirectional) capsules using very sophisticated digital signal processing. The Core Sound OctoMic was the first commercially available second-order ambisonic microphone. Introduced in 2018, it uses eight cardioid capsules. Each OctoMic is individually calibrated, and a calibration file and A- to B-format encoder plug-in are provided with each microphone. The ZYLIA ZM-1 is a commercially available microphone capable of generating third-order ambisonic recordings, using 19 omni-directional capsules. The em64 Eigenmike from mh acoustics is a 64-channel spherical microphone array capable of sixth-order capture. The production of the em64 has superseded their previous em32 microphone. A recent paper by Peter Craven et al. (subsequently patented) describes the use of bi-directional capsules for higher order microphones to reduce the extremity of the equalisation involved. No microphones have yet been made using this idea.

Ambisonic panning

The most straightforward way to produce Ambisonic mixes of arbitrarily high order is to take monophonic sources and position them with an Ambisonic encoder. A full-sphere encoder usually has two parameters, azimuth (or horizon) and elevation angle. The encoder will distribute the source signal to the Ambisonic components such that, when decoded, the source will appear at the desired location. More sophisticated panners will additionally provide a radius parameter that will take care of distance-dependent attenuation and bass boost due to near-field effect. Hardware panning units and mixers for first-order Ambisonics have been available since the 1980s and have been used commercially. Today, panning plugins and other related software tools are available for all major digital audio workstations, often as free software. However, due to arbitrary bus width restrictions, few professional digital audio workstations (DAW) support orders higher than second. Notable exceptions are REAPER, Pyramix, ProTools, Nuendo and Ardour.

Ambisonic manipulation

First order B-format can be manipulated in various ways to change the contents of an auditory scene. Well known manipulations include "rotation" and "dominance" (moving sources towards or away from a particular direction). Additionally, linear time-invariant signal processing such as equalisation can be applied to B-format without disrupting sound directions, as long as it is applied to all component channels equally. More recent developments in higher order Ambisonics enable a wide range of manipulations including rotation, reflection, movement, 3D reverb, upmixing from legacy formats such as 5.1 or first order, visualisation and directionally-dependent masking and equalisation.
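Rotation is the simplest of these manipulations to illustrate. For a rotation about the vertical axis, W and Z are unchanged and X and Y are mixed by an ordinary 2D rotation. The sketch below assumes first-order B-format, angles in degrees, and positive angles turning the scene anticlockwise (towards the left); the function name `rotate_yaw` is hypothetical.

```python
import math

def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order B-format sample frame about the vertical axis."""
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z   # W and Z are unaffected by a yaw rotation

# A frontal source (X = 1, Y = 0) rotated by 90 degrees ends up hard left (Y = 1).
print(rotate_yaw(1 / math.sqrt(2), 1.0, 0.0, 0.0, 90.0))
```

For head-tracked playback, one would typically rotate the scene by the negative of the listener's head yaw before binaural decoding, so that sources stay fixed in the room.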

Data exchange

Transmitting Ambisonic B-format between devices and to end-users requires a standardized exchange format. While ''traditional first-order B-format'' is well-defined and universally understood, there are conflicting conventions for higher-order Ambisonics, differing both in channel order and weighting, which might need to be supported for some time. Traditionally, the ''Furse-Malham (FuMa) higher order format'' has been used, in the .amb container based on Microsoft's WAVE-EX file format (Richard Dobson, ''The AMB Ambisonic File Format''). It scales up to third order and has a file size limitation of 4 GB. The current standard is the AmbiX proposal, which adopts the .caf file format and does away with the 4 GB limit. It scales to arbitrarily high orders and is based on SN3D encoding. SN3D encoding has been adopted by Google as the basis for its YouTube 360 format.
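At first order, the difference in channel order and weighting between the two conventions is small enough to show in a few lines. The sketch below assumes the commonly documented conventions: traditional (FuMa) B-format orders its channels W, X, Y, Z with W attenuated by the square root of two, while AmbiX uses ACN channel order (W, Y, Z, X) with SN3D weighting, under which W is unattenuated. The function names are hypothetical, and higher orders need additional per-channel factors that are not shown.

```python
import math

def fuma_to_ambix_first_order(w, x, y, z):
    """Traditional B-format (W, X, Y, Z) to AmbiX (ACN order, SN3D weights)."""
    return (w * math.sqrt(2), y, z, x)

def ambix_to_fuma_first_order(acn0, acn1, acn2, acn3):
    """AmbiX first-order frame (W, Y, Z, X) back to traditional B-format."""
    return (acn0 / math.sqrt(2), acn3, acn1, acn2)

frame = (1 / math.sqrt(2), 0.1, 0.2, 0.3)      # W, X, Y, Z
ambix = fuma_to_ambix_first_order(*frame)
print(ambix)                                   # W restored to full level; order is W, Y, Z, X
print(ambix_to_fuma_first_order(*ambix))       # round trip back to the original frame
```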

Compressed distribution

To effectively distribute Ambisonic data to non-professionals,

lossy compression

is desired to keep the data size acceptable. However, simple multi-mono compression is not sufficient, as lossy compression tends to destroy phase information and thus degrade localization in the form of spatial reduction, blur, and phantom sources. Reduction of redundancy among channels is desired, not only to enhance compression, but also to reduce the risk of discernible phase errors. (It is also possible to use post-processing to hide the artifacts.) As with mid-side joint stereo encoding, a static matrixing scheme (as in Opus) is usable for first-order ambisonics, but not optimal in case of multiple sources. A number of schemes such as DirAC use an approach similar to parametric stereo, where a downmixed signal is encoded, the principal direction recorded, and some description of ambiance added. MPEG-H 3D Audio, drawing on some work from

MPEG Surround

, extends the concept to handle multiple sources. MPEG-H uses

principal component analysis

to determine the main sources and then encodes a multi-mono signal corresponding to the principal directions. These parametric methods provide good quality, so long as they take good care in smoothing sound directions among frames. PCA/SVD is applicable for first-order as well as high-order ambisonics input.

Current development

Open source

Since 2018, free and open-source implementations have existed in the IEM Plug-in Suite and the SPARTA suite, which implement recent academic developments, and in the sound codec Opus. Opus provides two channel encoding modes: one that simply stores channels individually, and another that weights the channels through a fixed, invertible matrix to reduce redundancy. A listening test of Opus ambisonics was published in 2020, as calibration for AMBIQUAL, an objective metric for compressed ambisonics by Google. Opus third-order ambisonics at 256 kbps has similar localization accuracy to Opus first-order ambisonics at 128 kbps.

Corporate interest

Since its adoption by Google and other manufacturers as the audio format of choice for virtual reality, Ambisonics has seen a surge of interest. In 2018, Sennheiser released its VR microphone, and Zoom released an Ambisonics Field Recorder. Both are implementations of the tetrahedral microphone design, which produces first-order Ambisonics. A number of companies are currently conducting research in Ambisonics:

* BBC
* Technicolor Research and Innovation/Thomson Licensing

Dolby Laboratories have expressed "interest" in Ambisonics by acquiring (and liquidating) Barcelona-based Ambisonics specialist imm sound prior to launching Dolby Atmos, which, although its precise workings are undisclosed, does implement decoupling between source direction and actual loudspeaker positions. Atmos takes a fundamentally different approach in that it does not attempt to transmit a sound field; it transmits discrete premixes or stems (i.e., raw streams of sound data) along with metadata about what location and direction they should appear to be coming from. The stems are then decoded, mixed, and rendered in real time using whatever loudspeakers are available at the playback location.

Use in gaming

Higher-order Ambisonics has found a niche market in video games developed by Codemasters. Their first game to use an Ambisonic audio engine was Colin McRae: DiRT; however, this used Ambisonics only on the PlayStation 3 platform. Their game Race Driver: GRID extended the use of Ambisonics to the Xbox 360 platform, and Colin McRae: DiRT 2 uses Ambisonics on all platforms, including the PC. Later games from Codemasters, F1 2010, Dirt 3, F1 2011 and Dirt: Showdown, use fourth-order Ambisonics on faster PCs, rendered by Blue Ripple Sound's Rapture3D OpenAL driver and pre-mixed Ambisonic audio produced using Bruce Wiggins' WigWare Ambisonic Plug-ins.

OpenAL Soft, a free and open-source implementation of the OpenAL specification, also uses Ambisonics to render 3D audio. OpenAL Soft can often be used as a drop-in replacement for other OpenAL implementations, enabling several games that use the OpenAL API to benefit from rendering audio with Ambisonics. Many games that do not use the OpenAL API natively can be made to use it indirectly through a wrapper or a chain of wrappers. Ultimately, this leads to the sound being rendered in Ambisonics if a capable OpenAL driver such as OpenAL Soft is being used.

The Unreal Engine has supported soundfield Ambisonics rendering since version 4.25. The Unity engine has supported working with Ambisonics audio clips since version 2017.1.

Patents and trademarks

Most of the patents covering Ambisonic developments have now expired (including those covering the Soundfield microphone) and, as a result, the basic technology is available for anyone to implement. The "pool" of patents comprising Ambisonics technology was originally assembled by the UK Government's National Research & Development Corporation (NRDC), which existed until the late 1970s to develop and promote British inventions and license them to commercial manufacturers, ideally to a single licensee. The system was ultimately licensed to

(now owned by Wyastone Estate Ltd). The "interlocking circles" Ambisonic logo (UK trademarks and ), and the text marks "AMBISONIC" and "A M B I S O N" (UK trademarks and ), formerly owned by Wyastone Estate Ltd., have expired as of 2010.

External links

* Ambisonic.net website
* Ambisonia, a repository of Ambisonic recordings and compositions
* Ambisonic.info, website of Ambisonic field recordist Paul Hodges
* at the University of Parma
* Ambisonic resources at the University of York
* Higher Order Ambisonic Technical Notes at Blue Ripple Sound
* Ambisonics on Xiph wiki, a resource aimed at file format developers