Pyramid Vector Quantization
   HOME

TheInfoList



OR:

Pyramid vector quantization (PVQ) is a method used in audio and video
codec A codec is a device or computer program that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder on a signal or ...
s to quantize and transmit
unit vector In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat"). The term ''direction v ...
s, i.e. vectors whose magnitudes are known to the decoder but whose directions are unknown. PVQ may also be used as part of a gain/shape quantization scheme, whereby the magnitude and direction of a vector are quantized separately from each other. PVQ was initially described in 1986 in the paper "A Pyramid Vector Quantizer" by Thomas R. Fischer. One caveat of PVQ is that it operates under the
taxicab distance A taxicab geometry or a Manhattan geometry is a geometry whose usual distance function or metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences of their Cartesian co ...
(L1-norm). Conversion to/from the more familiar
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefor ...
(L2-norm) is possible via
vector projection The vector projection of a vector on (or onto) a nonzero vector , sometimes denoted \operatorname_\mathbf \mathbf (also known as the vector component or vector resolution of in the direction of ), is the orthogonal projection of onto a straig ...
, though results in a less uniform distribution of quantization points (the poles of the Euclidean ''n''-sphere become denser than non-poles). No efficient algorithm for the ideal (i.e., uniform) vector quantization of the Euclidean ''n''-sphere is known as of 2010. This non-uniformity can be reduced by applying deformation like coordinate-wise power before projection, reducing mean-squared quantization error by ~10%. PVQ is used in the CELT audio codec (now
Opus ''Opus'' (pl. ''opera'') is a Latin word meaning "work". Italian equivalents are ''opera'' (singular) and ''opere'' (pl.). Opus or OPUS may refer to: Arts and entertainment Music * Opus number, (abbr. Op.) specifying order of (usually) publicatio ...
) and the
Daala Daala is a video coding format under development by the Xiph.Org Foundation under the lead of Timothy B. Terriberry mainly sponsored by the Mozilla Corporation. Like Theora and Opus, Daala is available free of any royalties and its reference im ...
video codec.


Overview

As a form of
vector quantization Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by di ...
, PVQ defines a codebook of quantization points, each of which is assigned an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...
codeword from 0 to −1. The goal of the encoder is to find the codeword of the closest vector, which the decoder must decode back into a vector. The PVQ codebook consists of all -dimensional points \vec with integer-only coordinates whose absolute values sum to a constant (i.e. whose
L1-norm In mathematics, the spaces are function spaces defined using a natural generalization of the -norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue , although according to the Bourbaki ...
equals ). In set-builder notation: : S(N,K)=\left\ where \left\, \vec \right\, _1 denotes the L1-norm of \vec. As it stands, the set tesselates the surface of an -dimensional pyramid. If desired, we may reshape it into a sphere by "projecting" the points onto the sphere, i.e. by normalizing them: : S_\text(N,K)=\left\ where \left\, \vec \right\, _2 denotes the
L2-norm In mathematics, a norm is a function from a real or complex vector space to the non-negative real numbers that behaves in certain ways like the distance from the origin: it commutes with scaling, obeys a form of the triangle inequality, and is z ...
of \vec. Increasing the parameter results in more quantization points, and hence typically yields a more "accurate" approximation of the original
unit vector In mathematics, a unit vector in a normed vector space is a vector (often a spatial vector) of length 1. A unit vector is often denoted by a lowercase letter with a circumflex, or "hat", as in \hat (pronounced "v-hat"). The term ''direction v ...
\vec at the cost of larger integer codewords that require more bits to transmit.


Example

Suppose we wish to quantize three-dimensional unit vectors using the parameter =2. Our codebook becomes: (0.707 = \sqrt/2 rounded to 3 decimal places.) Now, suppose we wish to transmit the unit vector <0.592, −0.720, 0.362> (rounded here to 3 decimal places, for clarity). According to our codebook, the closest point we can pick is codeword 13 (<0.707, −0.707, 0.000>), located approximately 0.381 units away from our original point. Increasing the parameter results in a larger codebook, which typically increases the reconstruction accuracy. For example, based on the Python code below, =5 (codebook size: 102) yields an error of only 0.097 units, and =20 (codebook size: 1602) yields an error of only 0.042 units.


Python code

import itertools import math from typing import List, NamedTuple, Tuple class PVQEntry(NamedTuple): codeword: int point: Tuple nt, ... normalizedPoint: Tuple loat, ... def create_pvq_codebook(n: int, k: int) -> List VQEntry """ Naive algorithm to generate an n-dimensional PVQ codebook with k pulses. Runtime complexity: O(k**n) """ ret = [] for p in itertools.product(range(-k, k + 1), repeat=n): if sum(abs(x) for x in p)

k: norm = math.sqrt(sum(abs(x) ** 2 for x in p)) q = tuple(x / norm for x in p) ret.append(PVQEntry(len(ret), p, q)) return ret def search_pvq_codebook( codebook: List VQEntry p: Tuple loat, ...) -> Tuple VQEntry, float """ Naive algorithm to search the PVQ codebook. Returns the point in the codebook that's "closest" to p, according to the Euclidean distance.) """ ret = None min_dist = None for i in range(len(codebook)): q = codebook normalizedPoint dist = math.sqrt(sum(abs(q - p ** 2 for j in range(len(p)))) if min_dist is None or dist < min_dist: ret = codebook min_dist = dist return ret, min_dist def example(p: Tuple loat, ... k: int) -> None: n = len(p) codebook = create_pvq_codebook(n, k) print("Number of codebook entries: " + str(len(codebook))) entry, dist = search_pvq_codebook(codebook, p) print("Best entry: " + str(entry)) print("Distance: " + str(dist)) phi = 1.2 theta = 5.4 x = math.sin(phi) * math.cos(theta) y = math.sin(phi) * math.sin(theta) z = math.cos(phi) p = (x, y, z) example(p, 2) example(p, 5) example(p, 20)


Complexity

The PVQ codebook can be searched in O(KN). Encoding and decoding can likewise be performed in O(KN) using O(K+N) memory. The codebook size obeys the recurrence : V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1) with V(N,0) = 1 for all N \ge 0 and V(0,K) = 0 for all K \ne 0. A closed-form solution is given by : V(N,K) = 2N \cdot _2F_1 (1-K,1-N;2;2). where _2F_1 is the hypergeometric function.


See also

*
Vector quantization Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression. It works by di ...
*
ACELP Algebraic code-excited linear prediction (ACELP) is a speech coding algorithm in which a limited set of pulses is distributed as excitation to a linear prediction filter. It is a linear predictive coding (LPC) algorithm that is based on the cod ...
*
Opus (audio format) Opus is a lossy audio coding format developed by the Xiph.Org Foundation and standardized by the Internet Engineering Task Force, designed to efficiently code speech and general audio in a single format, while remaining low-latency enough fo ...


References

{{Reflist Lossy compression algorithms