Steered-response Power With Phase Transform

Steered-response power (SRP) is a family of acoustic source localization algorithms that can be interpreted as a beamforming-based approach: they search for the candidate position or direction that maximizes the output of a steered delay-and-sum beamformer. Steered-response power with phase transform (SRP-PHAT) is a variant that uses a "phase transform" weighting to make it more robust in adverse acoustic environments.


Algorithm


Steered-response power

Consider a system of M microphones, where each microphone is denoted by a subindex m \in \{1, \ldots, M\}. The discrete-time output signal from the m-th microphone is s_m(n). The (unweighted) steered-response power (SRP) at a spatial point \mathbf{x} = [x, y, z]^\mathsf{T} can be expressed as

P_0(\mathbf{x}) \triangleq \sum_{n \in \mathbb{Z}} \left| \sum_{m=1}^{M} s_m\big(n - \tau_m(\mathbf{x})\big) \right|^2,

where \mathbb{Z} denotes the set of integers and \tau_m(\mathbf{x}) is the steering delay applied to the m-th channel to compensate for the propagation from a source located at \mathbf{x} to the m-th microphone. Introducing a frequency-domain weighting, the (weighted) SRP can be written as

P(\mathbf{x}) = \frac{1}{2\pi} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \int_{-\pi}^{\pi} \Phi_{m_1 m_2}(e^{j\omega}) S_{m_1}(e^{j\omega}) S_{m_2}^*(e^{j\omega}) e^{j\omega \tau_{m_1 m_2}(\mathbf{x})} \,d\omega,

where (\cdot)^* denotes complex conjugation, S_m(e^{j\omega}) is the discrete-time Fourier transform (DTFT) of s_m(n), and \Phi_{m_1 m_2}(e^{j\omega}) is a weighting function in the frequency domain (discussed below). The term \tau_{m_1 m_2}(\mathbf{x}) is the discrete time difference of arrival (TDOA) of a signal emitted at position \mathbf{x} to microphones m_1 and m_2, given by

\tau_{m_1 m_2}(\mathbf{x}) \triangleq \left\lfloor f_s \frac{\|\mathbf{x} - \mathbf{x}_{m_1}\| - \|\mathbf{x} - \mathbf{x}_{m_2}\|}{c} \right\rceil,

where f_s is the sampling frequency of the system, c is the speed of sound, \mathbf{x}_m is the position of the m-th microphone, \|\cdot\| is the 2-norm, and \lfloor \cdot \rceil denotes rounding to the nearest integer.
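The definitions above can be checked numerically. The sketch below assumes NumPy, a speed of sound of 343 m/s, and microphone signals of equal length treated as zero outside their support. It computes the discrete TDOA matrix and the unweighted delay-and-sum power P_0(\mathbf{x}) for given integer steering delays. The helper names `tdoa_samples`, `xcorr` and `srp_unweighted` are hypothetical, and `xcorr` is the GCC with \Phi_{m_1 m_2} = 1.

```python
import numpy as np

def tdoa_samples(x, mics, fs, c=343.0):
    """Discrete TDOA matrix tau[m1, m2] = round(fs * (||x - x_m1|| - ||x - x_m2||) / c).

    x: candidate point, shape (3,); mics: microphone positions, shape (M, 3).
    c = 343 m/s is an assumed default speed of sound.
    """
    dist = np.linalg.norm(mics - x, axis=1)      # ||x - x_m|| for every microphone
    diff = dist[:, None] - dist[None, :]         # pairwise distance differences
    return np.rint(fs * diff / c).astype(int)    # rounding operator -> integer lags

def xcorr(s1, s2, tau):
    """Unweighted cross-correlation R(tau) = sum_k s1(k) s2(k - tau) for equal-length signals."""
    n = len(s1)
    if abs(tau) >= n:
        return 0.0                               # no overlap at this lag
    if tau >= 0:
        return float(np.dot(s1[tau:], s2[:n - tau]))
    return float(np.dot(s1[:n + tau], s2[-tau:]))

def srp_unweighted(signals, steer):
    """P_0 = sum_n |sum_m s_m(n - steer[m])|^2 for integer steering delays (delay-and-sum)."""
    M, n = signals.shape
    n0 = min(steer)                              # earliest output sample index
    y = np.zeros(max(steer) - n0 + n)            # beamformer output over its full support
    for m in range(M):
        start = steer[m] - n0                    # s_m(n - steer[m]) starts at n = steer[m]
        y[start:start + n] += signals[m]
    return float(np.sum(y ** 2))
```

Expanding the square shows that srp_unweighted(signals, steer) equals the double sum over (m_1, m_2) of xcorr(s_{m_1}, s_{m_2}, steer[m_2] - steer[m_1]). Choosing steer[m] = -round(f_s \|\mathbf{x} - \mathbf{x}_m\| / c) makes that lag equal to tdoa_samples(x, mics, fs)[m1, m2] up to one sample of rounding.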


Generalized cross-correlation

The above SRP objective function can be expressed as a sum of generalized cross-correlations (GCCs) for the different microphone pairs, evaluated at the time lag corresponding to their TDOA:

P(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} R_{m_1 m_2}\big(\tau_{m_1 m_2}(\mathbf{x})\big),

where the GCC for a microphone pair (m_1, m_2) is defined as

R_{m_1 m_2}(\tau) \triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_{m_1 m_2}(e^{j\omega}) S_{m_1}(e^{j\omega}) S_{m_2}^*(e^{j\omega}) e^{j\omega\tau} \,d\omega.

The phase transform (PHAT) is a GCC weighting that is effective for time-delay estimation in reverberant environments; it forces the GCC to consider only the phase information of the involved signals:

\Phi_{m_1 m_2}(e^{j\omega}) \triangleq \frac{1}{\left| S_{m_1}(e^{j\omega}) S_{m_2}^*(e^{j\omega}) \right|}.
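In practice the GCC-PHAT is usually evaluated with the FFT rather than the DTFT integral. The sketch below assumes NumPy. Two choices are ours: zero-padding to at least the combined signal length, so that the circular correlation equals the linear one, and a small constant `eps` that guards against division by zero. `gcc_phat` and `gcc_at` are hypothetical helper names.

```python
import numpy as np

def gcc_phat(s1, s2, eps=1e-12):
    """Circular GCC-PHAT array r, with r[tau % len(r)] = R_12(tau) = sum_k s1(k) s2(k - tau) (PHAT-weighted)."""
    n = len(s1) + len(s2)
    nfft = 1 << (n - 1).bit_length()             # next power of two >= n, avoids wrap-around
    S1 = np.fft.rfft(s1, nfft)
    S2 = np.fft.rfft(s2, nfft)
    cross = S1 * np.conj(S2)                     # S_1 S_2^*
    cross /= np.maximum(np.abs(cross), eps)      # PHAT weighting: keep only the phase
    return np.fft.irfft(cross, nfft)             # back to the lag domain

def gcc_at(r, tau):
    """Read R(tau) from the circular array; Python's % maps negative lags to the end."""
    return r[tau % len(r)]
```

If s2 is s1 delayed by D samples, the wave reaches m_1 first, so \|\mathbf{x} - \mathbf{x}_{m_1}\| - \|\mathbf{x} - \mathbf{x}_{m_2}\| < 0. Consistent with the TDOA sign convention above, the PHAT-weighted GCC then peaks at \tau = -D.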


Estimation of source location

The SRP-PHAT algorithm consists of a grid search that evaluates the objective function P(\mathbf{x}) on a grid \mathcal{G} of candidate source locations. It estimates the spatial location \mathbf{x}_s of the sound source as the grid point that provides the maximum SRP:

\hat{\mathbf{x}}_s = \arg\max_{\mathbf{x} \in \mathcal{G}} P(\mathbf{x}).
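A brute-force version of this grid search could look like the following sketch. It assumes NumPy, a speed of sound of 343 m/s, and a user-supplied array of grid points. `srp_phat_grid` is a hypothetical name, and the GCC helper is repeated so the block runs on its own. For real signals, R_{m_2 m_1}(-\tau) = R_{m_1 m_2}(\tau) and \tau_{m_2 m_1} = -\tau_{m_1 m_2}. The double sum therefore equals the diagonal terms, which do not depend on \mathbf{x}, plus twice the sum over m_1 < m_2. The sketch evaluates only those pairs, which leaves the argmax unchanged.

```python
import numpy as np

def gcc_phat(s1, s2, eps=1e-12):
    """Circular PHAT-weighted GCC; R(tau) is stored at index tau % len(result)."""
    nfft = 1 << (len(s1) + len(s2) - 1).bit_length()
    cross = np.fft.rfft(s1, nfft) * np.conj(np.fft.rfft(s2, nfft))
    return np.fft.irfft(cross / np.maximum(np.abs(cross), eps), nfft)

def srp_phat_grid(signals, mics, grid, fs, c=343.0):
    """Return the grid point maximizing the SRP-PHAT score, and the score of every point.

    signals: shape (M, N); mics: shape (M, 3); grid: shape (G, 3).
    score[g] = sum over m1 < m2 of R_{m1 m2}(tau_{m1 m2}(grid[g])).
    """
    M = len(mics)
    # Distances from every grid point to every microphone, shape (G, M)
    dist = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)
    score = np.zeros(len(grid))
    for m1 in range(M):
        for m2 in range(m1 + 1, M):
            r = gcc_phat(signals[m1], signals[m2])      # one GCC per pair, reused for all points
            tau = np.rint(fs * (dist[:, m1] - dist[:, m2]) / c).astype(int)
            score += r[tau % len(r)]                    # vectorized lookup of R(tau) per point
    return grid[np.argmax(score)], score
```

The GCCs are computed once per microphone pair. Each grid point then costs only one table lookup per pair, and this lookup step is the grid-search cost that the modified SRP-PHAT targets.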


Modified SRP-PHAT

Modifications of the classical SRP-PHAT algorithm have been proposed to reduce the computational cost of the grid-search step and to increase the robustness of the method. In the classical SRP-PHAT, for each microphone pair and each grid point, a single integer TDOA value is selected as the acoustic delay corresponding to that grid point. This procedure guarantees neither that every TDOA is associated with some grid point nor that the spatial grid is consistent, since some of the points may not correspond to an intersection of hyperboloids. The issue becomes more serious with coarse grids. When the number of points is reduced, part of the TDOA information is lost because most delays are no longer associated with any grid point. The modified SRP-PHAT collects and uses the TDOA information related to the volume surrounding each spatial point of the search grid by considering a modified objective function:

P'(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \sum_{\tau = L^l_{m_1 m_2}(\mathbf{x})}^{L^u_{m_1 m_2}(\mathbf{x})} R_{m_1 m_2}(\tau),

where L^l_{m_1 m_2}(\mathbf{x}) and L^u_{m_1 m_2}(\mathbf{x}) are the lower and upper accumulation limits of GCC delays, which depend on the spatial location \mathbf{x}.
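Because P'(\mathbf{x}) sums each GCC over a contiguous range of lags, one possible implementation precomputes a cumulative sum of R_{m_1 m_2}(\tau) over lags, so that each window costs a constant number of operations per grid point. The sketch below assumes NumPy and circular GCC arrays such as those produced by an FFT-based GCC-PHAT, with negative lags stored at the end. It also assumes accumulation limits that are already integers with lo <= hi, for example the ceiling of L^l and the floor of L^u. That rounding choice is ours and is not stated above. `accumulate_gcc` and `modified_srp` are hypothetical names.

```python
import numpy as np

def accumulate_gcc(r, lo, hi):
    """Sum R(tau) for integer tau in [lo, hi], per grid point, from a circular GCC array r.

    lo, hi: integer arrays of accumulation limits (assumed lo <= hi elementwise).
    """
    lo = np.asarray(lo, dtype=int)
    hi = np.asarray(hi, dtype=int)
    k = int(max(np.abs(lo).max(), np.abs(hi).max()))    # widest lag that is needed
    lags = np.arange(-k, k + 1)
    ordered = r[lags % len(r)]                          # R(-k), ..., R(0), ..., R(k)
    prefix = np.concatenate(([0.0], np.cumsum(ordered)))
    # sum_{tau=lo}^{hi} R(tau) = prefix[hi + k + 1] - prefix[lo + k]
    return prefix[hi + k + 1] - prefix[lo + k]

def modified_srp(gccs, limits):
    """P'(x) for every grid point: gccs[p] is the GCC of pair p, limits[p] = (lo, hi) for that pair."""
    return sum(accumulate_gcc(r, lo, hi) for r, (lo, hi) in zip(gccs, limits))
```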


Accumulation limits

The accumulation limits can be computed exactly in advance by exploring the boundaries that separate the regions corresponding to the points of the grid. Alternatively, they can be selected by considering the spatial gradient of the TDOA,

\nabla\tau_{m_1 m_2}(\mathbf{x}) = [\nabla_x \tau_{m_1 m_2}(\mathbf{x}), \nabla_y \tau_{m_1 m_2}(\mathbf{x}), \nabla_z \tau_{m_1 m_2}(\mathbf{x})]^\mathsf{T},

where each component \gamma \in \{x, y, z\} of the gradient is

\nabla_\gamma \tau_{m_1 m_2}(\mathbf{x}) = \frac{f_s}{c} \left( \frac{\gamma - \gamma_{m_1}}{\|\mathbf{x} - \mathbf{x}_{m_1}\|} - \frac{\gamma - \gamma_{m_2}}{\|\mathbf{x} - \mathbf{x}_{m_2}\|} \right),

with \gamma_m denoting the corresponding coordinate of \mathbf{x}_m (the rounding in the TDOA definition is ignored when differentiating). For a rectangular grid where neighboring points are separated by a distance r, the lower and upper accumulation limits are given by

L^l_{m_1 m_2}(\mathbf{x}) = \tau_{m_1 m_2}(\mathbf{x}) - \|\nabla\tau_{m_1 m_2}(\mathbf{x})\| \cdot d,
L^u_{m_1 m_2}(\mathbf{x}) = \tau_{m_1 m_2}(\mathbf{x}) + \|\nabla\tau_{m_1 m_2}(\mathbf{x})\| \cdot d,

where

d = \frac{r}{2} \min\left( \frac{1}{|\sin\theta \cos\phi|}, \frac{1}{|\sin\theta \sin\phi|}, \frac{1}{|\cos\theta|} \right),

and the gradient direction angles are given by

\theta = \cos^{-1}\left( \frac{\nabla_z \tau_{m_1 m_2}(\mathbf{x})}{\|\nabla\tau_{m_1 m_2}(\mathbf{x})\|} \right), \quad \phi = \operatorname{atan2}\left( \nabla_y \tau_{m_1 m_2}(\mathbf{x}), \nabla_x \tau_{m_1 m_2}(\mathbf{x}) \right).

Here d is the distance, along the gradient direction, from the grid point to the boundary of the cube of side r centered on it.
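The gradient-based limits can be sketched for a single microphone pair and grid point as below. The sketch assumes NumPy, a speed of sound of 343 m/s, and a point \mathbf{x} that does not coincide with either microphone. The angles \theta and \phi are computed as in the formulas above. Since (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta) is the unit gradient direction, the minimum of the three reciprocals is evaluated as r/2 divided by the largest absolute component, which avoids dividing by zero when a component vanishes. `accumulation_limits` is a hypothetical name.

```python
import numpy as np

def accumulation_limits(x, x1, x2, r, fs, c=343.0):
    """Return (tau, L_lower, L_upper) for one microphone pair at grid point x.

    r is the spacing of the rectangular grid. Assumes x differs from x1 and x2.
    """
    d1 = np.linalg.norm(x - x1)
    d2 = np.linalg.norm(x - x2)
    tau = int(np.rint(fs * (d1 - d2) / c))              # discrete TDOA in samples
    # Gradient of the unrounded TDOA fs * (d1 - d2) / c with respect to x, y, z
    grad = fs / c * ((x - x1) / d1 - (x - x2) / d2)
    g = np.linalg.norm(grad)
    if g == 0.0:
        return tau, float(tau), float(tau)              # flat TDOA: empty neighborhood
    theta = np.arccos(np.clip(grad[2] / g, -1.0, 1.0))
    phi = np.arctan2(grad[1], grad[0])
    u = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])                       # unit gradient direction
    # d = (r/2) * min(1/|u_x|, 1/|u_y|, 1/|u_z|), written as a max to avoid 1/0
    d = 0.5 * r / np.max(np.abs(u))
    return tau, tau - g * d, tau + g * d
```

The returned limits are real-valued. Converting them to an integer lag range before summing, as in P'(\mathbf{x}), is left to the caller.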


See also

* Acoustic source localization
* Multilateration
* Audio signal processing

