ADALINE
ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network. The network uses memistors. It was developed by Professor Bernard Widrow and his doctoral student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron and consists of a weight, a bias and a summation function. The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in the learning phase, the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function's output is used for adjusting the weights. A multilayer network of ADALINE units is known as a MADALINE.


Definition

Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
* x is the input vector
* w is the weight vector
* n is the number of inputs
* \theta is some constant (the bias)
* y is the output of the model
the output is y=\sum_{j=1}^{n} x_j w_j + \theta. If we further assume that
* x_0 = 1
* w_0 = \theta
then the output reduces to y=\sum_{j=0}^{n} x_j w_j.
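
As a concrete illustration, here is a minimal sketch of the output computation in Python (using NumPy; the function names are illustrative, not from the source), showing both the explicit-bias form and the absorbed-bias form:

```python
import numpy as np

def adaline_output(x, w, theta):
    # Explicit form: weighted sum of the inputs plus the bias theta.
    return np.dot(x, w) + theta

def adaline_output_absorbed(x, w0):
    # Absorbed form: prepend the constant input x_0 = 1 so that w0[0]
    # plays the role of theta and the output is a single dot product.
    x_aug = np.concatenate(([1.0], x))
    return np.dot(x_aug, w0)

# Both forms agree:
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
theta = 0.3
assert np.isclose(adaline_output(x, w, theta),
                  adaline_output_absorbed(x, np.concatenate(([theta], w))))
```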


Learning algorithm

Let us assume:
* \eta is the learning rate (some positive constant)
* y is the output of the model
* o is the target (desired) output
Then the weights are updated as follows: w \leftarrow w + \eta(o - y)x. ADALINE converges to the weights that minimize the least-squares error E=(o - y)^2. Since \partial E/\partial w = -2(o - y)x, this update rule is in fact the stochastic gradient descent update for linear regression, with the factor of 2 absorbed into \eta.
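
The rule is straightforward to implement; below is a minimal training-loop sketch in Python (the function name, zero initialization and hyperparameter defaults are illustrative assumptions, not from the source):

```python
import numpy as np

def train_adaline(X, targets, eta=0.01, epochs=50):
    # LMS / delta-rule training: w <- w + eta * (o - y) * x.
    # X: (num_examples, n) inputs; targets: (num_examples,) desired outputs o.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # x_0 = 1 absorbs the bias
    w = np.zeros(X_aug.shape[1])
    for _ in range(epochs):
        for x, o in zip(X_aug, targets):
            y = np.dot(w, x)            # the model output (the "net")
            w += eta * (o - y) * x      # step down the gradient of (o - y)^2
    return w
```

Because each step is exactly a stochastic gradient descent update on the squared error, running this loop on linearly generated data recovers the least-squares weights.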


MADALINE

MADALINE (Many ADALINE) is a three-layer (input, hidden, output), fully connected, feed-forward artificial neural network architecture for classification that uses ADALINE units in its hidden and output layers; i.e., its activation function is the sign function. (YouTube, widrowlms: "Science in Action"; Madaline is mentioned at the start and at 8:46.)
The three-layer network uses memistors. Because the sign function is not differentiable, MADALINE networks cannot be trained with backpropagation; three different training algorithms, called Rule I, Rule II and Rule III, have been suggested instead.

MADALINE Rule 1 (MRI) - The first of these dates back to 1962 and cannot adapt the weights of the hidden-output connection.

MADALINE Rule 2 (MRII) - The second training algorithm, described in 1988, improved on Rule I. It is based on a principle called "minimal disturbance" and proceeds by looping over training examples; for each example, it:
* finds the hidden-layer unit (ADALINE classifier) with the lowest confidence in its prediction,
* tentatively flips the sign of that unit,
* accepts or rejects the change based on whether the network's error is reduced,
* stops when the error is zero.
When flipping single units' signs does not drive the error to zero for a particular example, the algorithm starts flipping pairs of units' signs, then triples of units, etc. (a sketch of the single-unit step follows the rule descriptions).

MADALINE Rule 3 (MRIII) - The third "Rule" applies to a modified network with sigmoid activations instead of the signum; it was later found to be equivalent to backpropagation.
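
As a rough illustration of the "minimal disturbance" idea in MRII, here is a Python sketch of the single-unit step for a network with one output ADALINE (the minimum-norm weight nudge used to flip a unit, and all names, are assumptions for illustration; a full MRII also loops over examples and escalates to flipping pairs and triples of units):

```python
import numpy as np

def sign(v):
    return np.where(v >= 0, 1.0, -1.0)

def mrii_single_unit_step(W_hidden, w_out, x, target):
    # W_hidden: (n_hidden, n_inputs) hidden ADALINE weights (x is augmented, x[0] = 1)
    # w_out: (n_hidden,) output ADALINE weights; target: desired output, +1 or -1
    nets = W_hidden @ x                    # hidden-layer nets
    h = sign(nets)                         # hidden-layer outputs
    if sign(np.dot(w_out, h)) == target:
        return W_hidden                    # error already zero for this example
    # Least-confident unit = smallest |net|, i.e. closest to its decision boundary.
    i = np.argmin(np.abs(nets))
    # Tentatively apply the smallest weight change that flips unit i's sign on x.
    trial = W_hidden.copy()
    trial[i] = trial[i] - (nets[i] + 1e-3 * np.sign(nets[i])) * x / np.dot(x, x)
    if sign(np.dot(w_out, sign(trial @ x))) == target:
        return trial                       # accept: the network's error is reduced
    return W_hidden                        # reject the change
```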


See also

* Multilayer perceptron




External links

* Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training. Implementation of the ADALINE algorithm with memristors in analog computing.