Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms.
It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams.
The sketch is nearly identical to the feature hashing algorithm by John Moody, but differs in its use of hash functions with low dependence, which makes it more practical.
To retain a high probability of success despite this weaker independence, the median trick is used to aggregate multiple count sketches, rather than the mean.
These properties allow its use for explicit kernel methods and for bilinear pooling in neural networks, and make it a cornerstone of many numerical linear algebra algorithms.
[Woodruff, David P. "Sketching as a Tool for Numerical Linear Algebra." Foundations and Trends in Theoretical Computer Science 10.1-2 (2014): 1–157.]
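As a rough illustration of the ideas above (a bucket hash and a sign hash per row, combined with the median trick), the following is a minimal sketch in Python. The class name `CountSketch`, its parameters `width`, `rows` and `seed`, and the linear hash family `(a*x + b) mod P` are assumptions made for this example, not something taken from the original paper; keys are assumed to be non-negative integers smaller than `P`, and reducing the hash modulo the width or taking its parity gives only approximately pairwise-independent values.

```python
import random
from statistics import median

# A Mersenne prime used as the hash modulus (assumption: all keys are < P).
P = (1 << 61) - 1


class CountSketch:
    """Illustrative count sketch for integer keys (hypothetical helper)."""

    def __init__(self, width, rows, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.rows = rows
        # One (a, b) pair per row for the bucket hash, and one for the sign hash.
        self.bucket_params = [(rng.randrange(1, P), rng.randrange(P))
                              for _ in range(rows)]
        self.sign_params = [(rng.randrange(1, P), rng.randrange(P))
                            for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def _bucket(self, i, key):
        # Maps the key to one of `width` counters in row i.
        a, b = self.bucket_params[i]
        return ((a * key + b) % P) % self.width

    def _sign(self, i, key):
        # Maps the key to +1 or -1 in row i.
        a, b = self.sign_params[i]
        return 1 if ((a * key + b) % P) & 1 else -1

    def update(self, key, delta=1):
        # Add the signed increment to one counter in every row.
        for i in range(self.rows):
            self.table[i][self._bucket(i, key)] += self._sign(i, key) * delta

    def estimate(self, key):
        # Median trick: take the median of the per-row estimates, not the mean.
        return median(self._sign(i, key) * self.table[i][self._bucket(i, key)]
                      for i in range(self.rows))


sketch = CountSketch(width=16, rows=5, seed=1)
for _ in range(5):
    sketch.update(42)
count = sketch.estimate(42)  # exact here, since 42 is the only key inserted
```

When only one distinct key has been inserted, every row returns its exact count. With many keys, collisions add signed noise to individual rows, and the median over an odd number of rows is what keeps a few badly colliding rows from dominating the estimate.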
Mathematical definition
1. For constants $w$ and $t$ (to be defined later), independently choose $d = 2t+1$ random hash functions $h_1, \dots, h_d$ and $s_1, \dots, s_d$ such that