Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the self-organizing map (SOM). It is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional latent space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation–maximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher Bishop, Markus Svensen, and Christopher K. I. Williams.
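In symbols, with y(·; W) denoting the smooth map and β the noise precision (the notation commonly used in the GTM literature), a data point x is generated as

\[ x = y(z; W) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \beta^{-1} I), \]

where z is drawn from the prior distribution over the low-dimensional latent space.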
Details of the algorithm
The approach is strongly related to density networks, which use importance sampling and a multi-layer perceptron to form a non-linear latent variable model. In the GTM the latent space is a discrete grid of points which is assumed to be non-linearly projected into data space. A Gaussian noise assumption is then made in data space, so that the model becomes a constrained mixture of Gaussians whose likelihood can be maximized by EM.
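Concretely, writing z_1, …, z_K for the K latent grid points and D for the data dimension, the model density is the constrained mixture

\[ p(x \mid W, \beta) = \frac{1}{K} \sum_{k=1}^{K} \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\!\left(-\frac{\beta}{2}\,\lVert x - y(z_k; W)\rVert^2\right), \]

constrained because all K Gaussian centres y(z_k; W) share the parameters W and all components share the variance β^{-1}.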
In principle, an arbitrary nonlinear parametric deformation could be used, with the optimal parameters found by gradient descent or similar methods.
The suggested approach to the nonlinear mapping is to use a radial basis function (RBF) network to create a nonlinear mapping between the latent space and the data space. The nodes of the RBF network then form a feature space, and the nonlinear mapping can be taken as a linear transform of this feature space. This approach has the advantage over the density network approach that it can be optimised analytically.
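As an illustration, here is a minimal NumPy sketch of this construction (function names, grid sizes and other hyperparameters are illustrative assumptions, not the authors' reference implementation): the latent grid is pushed through fixed Gaussian RBFs, the mapping to data space is the linear layer W, and both EM updates are available in closed form.

```python
import numpy as np

def gtm_fit(X, grid_size=10, rbf_grid=5, sigma=1.0, n_iter=50, seed=0):
    """Minimal GTM trained by EM (sketch; hyperparameters are illustrative)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # Latent space: a regular 2D grid of K points in [-1, 1]^2.
    g = np.linspace(-1, 1, grid_size)
    Z = np.array([(a, b) for a in g for b in g])              # (K, 2)

    # Fixed Gaussian RBF centres on a coarser grid in latent space.
    c = np.linspace(-1, 1, rbf_grid)
    C = np.array([(a, b) for a in c for b in c])              # (M, 2)

    # Design matrix Phi: RBF responses plus a bias column; the nonlinear
    # map is then y(z_k; W) = Phi[k] @ W, i.e. linear in the feature space.
    d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1)       # (K, M)
    Phi = np.hstack([np.exp(-d2 / (2 * sigma**2)), np.ones((len(Z), 1))])

    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))         # linear map
    beta = 1.0                                                # noise precision

    for _ in range(n_iter):
        Y = Phi @ W                                           # (K, D) mixture centres
        d2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(-1)   # (K, N)

        # E-step: responsibilities; uniform mixing coefficients cancel.
        logR = -0.5 * beta * d2
        logR -= logR.max(axis=0, keepdims=True)               # numerical stability
        R = np.exp(logR)
        R /= R.sum(axis=0, keepdims=True)

        # M-step for W: a weighted least-squares problem with a closed form.
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + 1e-8 * np.eye(Phi.shape[1])     # tiny ridge for stability
        W = np.linalg.solve(A, Phi.T @ (R @ X))

        # M-step for beta: inverse of the mean responsibility-weighted error.
        d2 = ((X[None, :, :] - (Phi @ W)[:, None, :]) ** 2).sum(-1)
        beta = N * D / (R * d2).sum()

    return Z, Phi, W, beta
```

The np.linalg.solve call is the analytic optimisation referred to above: for fixed responsibilities, W is the exact solution of a weighted least-squares problem, whereas a density network with a multi-layer perceptron would require an inner gradient-based optimisation.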
Uses
In data analysis, GTMs are like a nonlinear version of principal components analysis (PCA): they allow high-dimensional data to be modelled as resulting from Gaussian noise added to sources in a lower-dimensional latent space. For example, GTM can be used to locate stocks in a plottable 2D space based on the shapes of their high-dimensional time series. Other applications may want to have fewer sources than data points, for example mixture models.
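Continuing the hypothetical gtm_fit sketch above, a 2D visualisation of this kind can be read off by placing each data point at its responsibility-weighted average latent position (the usual posterior-mean projection):

```python
Z, Phi, W, beta = gtm_fit(X)                  # X: (N, D) data matrix
Y = Phi @ W                                   # mixture centres in data space
d2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(-1)
R = np.exp(-0.5 * beta * (d2 - d2.min(axis=0, keepdims=True)))
R /= R.sum(axis=0, keepdims=True)             # responsibilities, shape (K, N)
coords = R.T @ Z                              # (N, 2) plottable positions
```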
In generative deformational modelling, the latent and data spaces have the same dimensions, for example 2D images or 1D audio sound waves. Extra 'empty' dimensions are added to the source (known as the 'template' in this form of modelling), for example locating the 1D sound wave in 2D space. Further nonlinear dimensions are then added, produced by combining the original dimensions. The enlarged latent space is then projected back into the 1D data space. The probability of a given projection is, as before, given by the product of the likelihood of the data under the Gaussian noise model with the prior on the deformation parameter. Unlike conventional spring-based deformation modelling, this has the advantage of being analytically optimizable. The disadvantage is that it is a 'data-mining' approach, i.e. the shape of the deformation prior is unlikely to be meaningful as an explanation of the possible deformations, as it is based on a very high-dimensional, artificially and arbitrarily constructed nonlinear latent space. For this reason the prior is learned from data rather than created by a human expert, as is possible for spring-based models.
Comparison with Kohonen's self-organizing maps
While nodes in the self-organizing map (SOM) can wander around at will, GTM nodes are constrained by the allowable transformations and their probabilities. If the deformations are well-behaved, the topology of the latent space is preserved.
The SOM was created as a biological model of neurons and is a heuristic algorithm. By contrast, the GTM has nothing to do with neuroscience or cognition and is a probabilistically principled model. Thus, it has a number of advantages over SOM, namely:
* it explicitly formulates a density model over the data.
* it uses a cost function, the log-likelihood (given after this list), that quantifies how well the map is trained.
* it uses a sound optimization procedure (the EM algorithm).
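The cost function in question is the log-likelihood of the training set under the constrained mixture given above,

\[ \mathcal{L}(W, \beta) = \sum_{n=1}^{N} \ln\!\left[\frac{1}{K}\sum_{k=1}^{K}\left(\frac{\beta}{2\pi}\right)^{D/2}\exp\!\left(-\frac{\beta}{2}\,\lVert x_n - y(z_k; W)\rVert^2\right)\right], \]

which EM increases monotonically, so training progress can be monitored directly.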
GTM was introduced by Bishop, Svensen and Williams in their Technical Report in 1997 (Technical Report NCRG/96/015, Aston University, UK), published later in Neural Computation. It was also described in the PhD thesis of Markus Svensen (Aston, 1998).
See also
* Self-organizing map (SOM)
* Neural network (machine learning), also known as artificial neural network (ANN)
* Connectionism
* Data mining
* Machine learning
* Nonlinear dimensionality reduction
* Neural network software
* Pattern recognition
External links
* Bishop, Svensen and Williams, Generative Topographic Mapping paper
* Generative topographic mapping, developed at the Neural Computing Research Group of Aston University (UK) (Matlab toolbox)