Neural Gas

Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example in speech recognition, image processing or pattern recognition. As a robustly converging alternative to k-means clustering, it is also used for cluster analysis.


Algorithm

Given a probability distribution P(x) of data vectors x and a finite number of feature vectors w_i, i = 1,\ldots,N. With each time step t, a data vector x randomly chosen from P(x) is presented. Subsequently, the distance order of the feature vectors to the given data vector x is determined. Let i_0 denote the index of the closest feature vector, i_1 the index of the second closest feature vector, and i_{N-1} the index of the feature vector most distant to x. Then each feature vector is adapted according to

w_{i_k}^{t+1} = w_{i_k}^{t} + \varepsilon \cdot e^{-k/\lambda} \cdot (x - w_{i_k}^{t}), \qquad k = 0,\ldots,N-1,

with \varepsilon as the adaptation step size and \lambda as the so-called neighborhood range. \varepsilon and \lambda are reduced with increasing t. After sufficiently many adaptation steps the feature vectors cover the data space with minimum representation error.

The adaptation step of the neural gas can be interpreted as gradient descent on a cost function. By adapting not only the closest feature vector but all of them, with a step size decreasing with increasing distance order, a much more robust convergence of the algorithm can be achieved than with (online) k-means clustering. The neural gas model does not delete a node and also does not create new nodes.
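The adaptation rule above translates directly into a short training loop. The following Python sketch is illustrative only: the function name, the exponential annealing schedule for \varepsilon and \lambda, and all default parameter values are assumptions chosen for the example, not prescribed by the original algorithm.

```python
import numpy as np

def train_neural_gas(data, n_units=20, n_steps=10000,
                     eps_i=0.5, eps_f=0.01, lam_i=10.0, lam_f=0.1,
                     rng=None):
    """Minimal neural gas training loop (illustrative sketch).

    data    : (n_samples, dim) array of data vectors x drawn from P(x)
    n_units : number N of feature vectors w_i
    eps_*   : initial/final adaptation step size epsilon (assumed schedule)
    lam_*   : initial/final neighborhood range lambda (assumed schedule)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Initialize the feature vectors on randomly chosen data points.
    w = data[rng.choice(len(data), size=n_units, replace=False)].astype(float)

    for t in range(n_steps):
        # Reduce epsilon and lambda with increasing t (exponential decay here).
        frac = t / n_steps
        eps = eps_i * (eps_f / eps_i) ** frac
        lam = lam_i * (lam_f / lam_i) ** frac

        # Present a data vector x chosen at random.
        x = data[rng.integers(len(data))]

        # Distance ranking: k[i] = 0 for the closest unit i_0, 1 for i_1, ...
        dists = np.linalg.norm(w - x, axis=1)
        k = np.empty(n_units)
        k[np.argsort(dists)] = np.arange(n_units)

        # Adaptation rule: w_i <- w_i + eps * exp(-k_i / lambda) * (x - w_i)
        w += eps * np.exp(-k / lam)[:, None] * (x - w)

    return w
```

For example, train_neural_gas(np.random.rand(1000, 2)) returns 20 feature vectors spread roughly uniformly over the unit square, since every unit is pulled towards each presented sample with a weight that decays with its distance rank.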


Variants

A number of variants of the neural gas algorithm exist in the literature to mitigate some of its shortcomings. Perhaps the most notable is Bernd Fritzke's growing neural gas, but one should also mention further elaborations such as the Growing When Required network and the incremental growing neural gas. A performance-oriented approach that avoids the risk of overfitting is the plastic neural gas model.


Growing neural gas

Fritzke describes the growing neural gas (GNG) as an incremental network model that learns topological relations by using a "Hebb-like learning rule"; unlike the neural gas, it has no parameters that change over time and it is capable of continuous learning, i.e. learning on data streams. GNG has been widely used in several domains, demonstrating its capabilities for clustering data incrementally.

The GNG is initialized with two randomly positioned nodes which are connected by a zero-age edge and whose errors are set to 0. Since input data is presented to the GNG sequentially, one vector at a time, the following steps are carried out at each iteration (a simplified code sketch follows the list):

* The errors (distances) between the two closest nodes and the current input vector are calculated.
* The error of the winner node (the closest one only) is accumulated accordingly.
* The winner node and its topological neighbors (the nodes connected to it by an edge) are moved towards the current input by different fractions of their respective errors.
* The age of all edges connected to the winner node is incremented.
* If the winner node and the second-closest node are connected by an edge, the age of that edge is set to 0. Otherwise, an edge is created between them.
* Edges with an age larger than a threshold are removed. Nodes left without connections are eliminated.
* If the current iteration is an integer multiple of a predefined node-insertion frequency, a new node is inserted between the node with the largest accumulated error (among all nodes) and the one of its topological neighbors with the highest error. The edge between these two nodes is eliminated, their errors are decreased by a given factor, and the new node is connected to both of them. The error of the new node is initialized as the updated error of the node that had the largest error.
* The accumulated error of all nodes is decreased by a given factor.
* If the stopping criterion is not met, the algorithm takes the next input. The criterion might be a given number of epochs, i.e., a pre-set number of times that all data is presented, or reaching a maximum number of nodes.
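The Python sketch below follows the steps listed above in a simplified form. All names and parameter values (eps_winner, max_age, insert_every, and so on) are illustrative assumptions rather than values from Fritzke's paper, and isolated nodes are merely left unused instead of being physically deleted, to keep the bookkeeping short.

```python
import numpy as np

def train_gng(data, max_nodes=100, n_epochs=10,
              eps_winner=0.05, eps_neighbor=0.006,
              max_age=50, insert_every=100, alpha=0.5, decay=0.995,
              rng=None):
    """Simplified growing neural gas (GNG) sketch; see the steps above.

    Node positions live in the list `w`, accumulated errors in `error`,
    and edge ages in the symmetric dict-of-dicts `age`. Isolated nodes
    are left unused rather than deleted (simplification).
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = data.shape[1]

    # Initialization: two randomly placed nodes, one zero-age edge, errors 0.
    w = [rng.random(dim), rng.random(dim)]
    error = [0.0, 0.0]
    age = {0: {1: 0}, 1: {0: 0}}

    it = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            it += 1
            # Distances to all nodes; s1 = winner, s2 = second winner.
            d = np.array([np.linalg.norm(x - wi) for wi in w])
            s1, s2 = (int(i) for i in np.argsort(d)[:2])

            # Accumulate the (squared) error of the winner only.
            error[s1] += d[s1] ** 2

            # Move the winner and its topological neighbors towards x.
            w[s1] = w[s1] + eps_winner * (x - w[s1])
            for n in age[s1]:
                w[n] = w[n] + eps_neighbor * (x - w[n])

            # Increment the age of all edges connected to the winner.
            for n in age[s1]:
                age[s1][n] += 1
                age[n][s1] += 1

            # Connect winner and second winner with a fresh (age-0) edge.
            age[s1][s2] = 0
            age[s2][s1] = 0

            # Remove edges older than the threshold.
            for n in [n for n, a in age[s1].items() if a > max_age]:
                del age[s1][n]
                del age[n][s1]

            # Periodic insertion between the node with the largest error
            # and its neighbor with the highest error.
            if it % insert_every == 0 and len(w) < max_nodes:
                q = max(range(len(w)), key=lambda i: error[i])
                if age[q]:
                    f = max(age[q], key=lambda n: error[n])
                    r = len(w)
                    w.append(0.5 * (w[q] + w[f]))
                    del age[q][f], age[f][q]
                    error[q] *= alpha
                    error[f] *= alpha
                    error.append(error[q])
                    age[r] = {q: 0, f: 0}
                    age[q][r] = 0
                    age[f][r] = 0

            # Globally decay all accumulated errors.
            error = [e * decay for e in error]

    return np.array(w), age
```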


Incremental growing neural gas

Another neural gas variant inspired by the GNG algorithm is the incremental growing neural gas (IGNG). The authors propose that the main advantage of this algorithm is "learning new data (plasticity) without degrading the previously trained network and forgetting the old input data (stability)."


Growing when required

Having a network with a growing set of nodes, like the one implemented by the GNG algorithm, was seen as a great advantage; however, learning was limited by the introduction of the parameter λ, since the network could only grow when the number of iterations was a multiple of this parameter. The proposal to mitigate this problem was a new algorithm, the Growing When Required network (GWR), which has the network grow more quickly, by adding nodes as quickly as possible whenever the network identifies that the existing nodes do not describe the input well enough.
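In the GWR network the growth decision is typically based on the activity of the best-matching node, a decreasing function of its distance to the current input. The sketch below illustrates that check only; the names, the threshold value, and the omission of the habituation (firing) counter used in the full algorithm are simplifying assumptions for this example.

```python
import numpy as np

def gwr_should_insert(x, nodes, activity_threshold=0.8):
    """Rough sketch of the GWR growth criterion.

    A new node is requested as soon as the best-matching node describes the
    input poorly, i.e. its activity exp(-||x - w_b||) falls below a threshold.
    The full GWR algorithm additionally checks the node's habituation counter
    before inserting; that part is omitted here.
    Returns (insert?, index of the best-matching node).
    """
    dists = np.linalg.norm(nodes - x, axis=1)
    best = int(np.argmin(dists))
    activity = np.exp(-dists[best])
    return activity < activity_threshold, best
```

When the check fires, GWR inserts a new node between the best-matching node and the current input immediately, instead of waiting for a fixed iteration count as GNG does.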


Plastic neural gas

The ability to only grow a network may quickly introduce overfitting; on the other hand, removing nodes on the basis of age only, as in the GNG model, does not ensure that the removed nodes are actually useless, because removal depends on a model parameter that should be carefully tuned to the "memory length" of the stream of input data. The "Plastic Neural Gas" model solves this problem by making decisions to add or remove nodes using an unsupervised version of cross-validation, which controls an equivalent notion of "generalization ability" for the unsupervised setting. While growing-only methods only cater for the incremental learning scenario, the ability to grow and shrink is suited to the more general streaming data problem.


Implementations

To find the ranking i_0, i_1, \ldots, i_{N-1} of the feature vectors, the neural gas algorithm involves sorting, a procedure that does not lend itself easily to parallelization or to implementation in analog hardware. Nevertheless, implementations in both parallel software and analog hardware have been designed.


Further reading

* T. Martinetz, S. Berkovich, and K. Schulten. "Neural-gas" Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4):558–569, 1993.
* T. Martinetz and K. Schulten. Topology representing networks. Neural Networks, 7(3):507–522, 1994. doi:10.1016/0893-6080(94)90109-0.


External links


* DemoGNG.js – Javascript simulator for neural gas (and other network models)
* Java Competitive Learning Applications – Unsupervised neural networks (including the self-organizing map) in Java, with source code
* [http://uk.mathworks.com/matlabcentral/fileexchange/57798-gwr-and-gng-classifier A GNG and GWR Classifier implementation in Matlab]