HOME

TheInfoList



OR:

An artificial neuron is a mathematical function conceived as a model of biological
neuron A neuron, neurone, or nerve cell is an electrically excitable cell that communicates with other cells via specialized connections called synapses. The neuron is the main component of nervous tissue in all animals except sponges and placozoa ...
s, a
neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
. Artificial neurons are elementary units in an
artificial neural network Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected unit ...
. The artificial neuron receives one or more inputs (representing excitatory postsynaptic potentials and inhibitory postsynaptic potentials at neural
dendrite Dendrites (from Greek δένδρον ''déndron'', "tree"), also dendrons, are branched protoplasmic extensions of a nerve cell that propagate the electrochemical stimulation received from other neural cells to the cell body, or soma, of the ...
s) and sums them to produce an output (or , representing a neuron's
action potential An action potential occurs when the membrane potential of a specific cell location rapidly rises and falls. This depolarization then causes adjacent locations to similarly depolarize. Action potentials occur in several types of animal cells ...
which is transmitted along its
axon An axon (from Greek ἄξων ''áxōn'', axis), or nerve fiber (or nerve fibre: see spelling differences), is a long, slender projection of a nerve cell, or neuron, in vertebrates, that typically conducts electrical impulses known as action p ...
). Usually each input is separately
weighted A weight function is a mathematical device used when performing a sum, integral, or average to give some elements more "weight" or influence on the result than other elements in the same set. The result of this application of a weight function is ...
, and the sum is passed through a
non-linear function In mathematics and science, a nonlinear system is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many ot ...
known as an activation function or transfer function. The transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions,
piecewise In mathematics, a piecewise-defined function (also called a piecewise function, a hybrid function, or definition by cases) is a function defined by multiple sub-functions, where each sub-function applies to a different interval in the domain. P ...
linear functions, or step functions. They are also often
monotonically increasing In mathematics, a monotonic function (or monotone function) is a function between ordered sets that preserves or reverses the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of orde ...
,
continuous Continuity or continuous may refer to: Mathematics * Continuity (mathematics), the opposing concept to discreteness; common examples include ** Continuous probability distribution or random variable in probability and statistics ** Continuous g ...
, differentiable and bounded. Non-monotonic, unbounded and oscillating activation functions with multiple zeros that outperform sigmoidal and ReLU like activation functions on many tasks have also been recently explored. The thresholding function has inspired building
logic gate A logic gate is an idealized or physical device implementing a Boolean function, a logical operation performed on one or more binary inputs that produces a single binary output. Depending on the context, the term may refer to an ideal logic ga ...
s referred to as threshold logic; applicable to building logic circuits resembling brain processing. For example, new devices such as memristors have been extensively used to develop such logic in recent times. The artificial neuron transfer function should not be confused with a linear system's transfer function. Artificial neurons can also refer to artificial cells in neuromorphic engineering that are similar to natural physical neurons.


Basic structure

For a given artificial neuron k, let there be ''m'' + 1 inputs with signals ''x''0 through ''x''''m'' and weights ''wk''0 through ''wk''''m''. Usually, the ''x''0 input is assigned the value +1, which makes it a ''bias'' input with ''w''''k''0 = ''b''''k''. This leaves only ''m'' actual inputs to the neuron: from ''x''1 to ''x''''m''. The output of the ''k''th neuron is: :y_k = \varphi \left( \sum_^m w_ x_j \right) Where \varphi (phi) is the transfer function (commonly a threshold function). The output is analogous to the
axon An axon (from Greek ἄξων ''áxōn'', axis), or nerve fiber (or nerve fibre: see spelling differences), is a long, slender projection of a nerve cell, or neuron, in vertebrates, that typically conducts electrical impulses known as action p ...
of a biological neuron, and its value propagates to the input of the next layer, through a synapse. It may also exit the system, possibly as part of an output vector. It has no learning process as such. Its transfer function weights are calculated and threshold value are predetermined.


Types

Depending on the specific model used they may be called a semi-linear unit, Nv neuron, binary neuron, linear threshold function, or McCulloch–Pitts (MCP) neuron. Simple artificial neurons, such as the McCulloch–Pitts model, are sometimes described as "caricature models", since they are intended to reflect one or more neurophysiological observations, but without regard to realism.


Biological models

Artificial neurons are designed to mimic aspects of their biological counterparts. However a significant performance gap exists between biological and artificial neural networks. In particular single biological neurons in the human brain with oscillating activation function capable of learning the XOR function have been discovered. * ''
Dendrites Dendrites (from Greek δένδρον ''déndron'', "tree"), also dendrons, are branched protoplasmic extensions of a nerve cell that propagate the electrochemical stimulation received from other neural cells to the cell body, or soma, of the ...
'' – In a biological neuron, the dendrites act as the input vector. These dendrites allow the cell to receive signals from a large (>1000) number of neighboring neurons. As in the above mathematical treatment, each dendrite is able to perform "multiplication" by that dendrite's "weight value." The multiplication is accomplished by increasing or decreasing the ratio of synaptic neurotransmitters to signal chemicals introduced into the dendrite in response to the synaptic neurotransmitter. A negative multiplication effect can be achieved by transmitting signal inhibitors (i.e. oppositely charged ions) along the dendrite in response to the reception of synaptic neurotransmitters. * ''
Soma Soma may refer to: Businesses and brands * SOMA (architects), a New York–based firm of architects * Soma (company), a company that designs eco-friendly water filtration systems * SOMA Fabrications, a builder of bicycle frames and other bicycle ...
'' – In a biological neuron, the soma acts as the summation function, seen in the above mathematical description. As positive and negative signals (exciting and inhibiting, respectively) arrive in the soma from the dendrites, the positive and negative ions are effectively added in summation, by simple virtue of being mixed together in the solution inside the cell's body. * ''
Axon An axon (from Greek ἄξων ''áxōn'', axis), or nerve fiber (or nerve fibre: see spelling differences), is a long, slender projection of a nerve cell, or neuron, in vertebrates, that typically conducts electrical impulses known as action p ...
'' – The axon gets its signal from the summation behavior which occurs inside the soma. The opening to the axon essentially samples the electrical potential of the solution inside the soma. Once the soma reaches a certain potential, the axon will transmit an all-in signal pulse down its length. In this regard, the axon behaves as the ability for us to connect our artificial neuron to other artificial neurons. Unlike most artificial neurons, however, biological neurons fire in discrete pulses. Each time the electrical potential inside the soma reaches a certain threshold, a pulse is transmitted down the axon. This pulsing can be translated into continuous values. The rate (activations per second, etc.) at which an axon fires converts directly into the rate at which neighboring cells get signal ions introduced into them. The faster a biological neuron fires, the faster nearby neurons accumulate electrical potential (or lose electrical potential, depending on the "weighting" of the dendrite that connects to the neuron that fired). It is this conversion that allows computer scientists and mathematicians to simulate biological neural networks using artificial neurons which can output distinct values (often from −1 to 1).


Encoding

Research has shown that unary coding is used in the neural circuits responsible for birdsong production. The use of unary in biological networks is presumably due to the inherent simplicity of the coding. Another contributing factor could be that unary coding provides a certain degree of error correction.


Physical artificial cells

There is research and development into physical artificial neurons – organic and inorganic. For example, some artificial neurons can receive and release
dopamine Dopamine (DA, a contraction of 3,4-dihydroxyphenethylamine) is a neuromodulatory molecule that plays several important roles in cells. It is an organic chemical of the catecholamine and phenethylamine families. Dopamine constitutes about 80% o ...
( chemical signals rather than electrical signals) and communicate with natural rat
muscle Skeletal muscles (commonly referred to as muscles) are organs of the vertebrate muscular system and typically are attached by tendons to bones of a skeleton. The muscle cells of skeletal muscles are much longer than in the other types of mus ...
and brain cells, with potential for use in BCIs/ prosthetics. Low-power biocompatible memristors may enable construction of artificial neurons which function at voltages of biological
action potential An action potential occurs when the membrane potential of a specific cell location rapidly rises and falls. This depolarization then causes adjacent locations to similarly depolarize. Action potentials occur in several types of animal cells ...
s and could be used to directly process biosensing signals, for neuromorphic computing and/or direct communication with biological neurons. Organic neuromorphic circuits made out of
polymer A polymer (; Greek '' poly-'', "many" + '' -mer'', "part") is a substance or material consisting of very large molecules called macromolecules, composed of many repeating subunits. Due to their broad spectrum of properties, both synthetic a ...
s, coated with an ion-rich gel to enable a material to carry an electric charge like real neurons, have been built into a robot, enabling it to learn sensorimotorically within the real world, rather than via simulations or virtually. Moreover, artificial spiking neurons made of soft matter (polymers) can operate in biologically relevant environments and enable the synergetic communication between the artificial and biological domains.


History

The first artificial neuron was the Threshold Logic Unit (TLU), or Linear Threshold Unit, first proposed by Warren McCulloch and Walter Pitts in 1943. The model was specifically targeted as a computational model of the "nerve net" in the brain. As a transfer function, it employed a threshold, equivalent to using the
Heaviside step function The Heaviside step function, or the unit step function, usually denoted by or (but sometimes , or ), is a step function, named after Oliver Heaviside (1850–1925), the value of which is zero for negative arguments and one for positive argum ...
. Initially, only a simple model was considered, with binary inputs and outputs, some restrictions on the possible weights, and a more flexible threshold value. Since the beginning it was already noticed that any boolean function could be implemented by networks of such devices, what is easily seen from the fact that one can implement the AND and OR functions, and use them in the disjunctive or the conjunctive normal form. Researchers also soon realized that cyclic networks, with
feedback Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can then be said to ''feed back'' into itself. The notion of cause-and-effect has to be handled ...
s through neurons, could define dynamical systems with memory, but most of the research concentrated (and still does) on strictly feed-forward networks because of the smaller difficulty they present. One important and pioneering artificial neural network that used the linear threshold function was the perceptron, developed by Frank Rosenblatt. This model already considered more flexible weight values in the neurons, and was used in machines with adaptive capabilities. The representation of the threshold values as a bias term was introduced by
Bernard Widrow Bernard Widrow (born December 24, 1929) is a U.S. professor of electrical engineering at Stanford University. He is the co-inventor of the Widrow–Hoff least mean squares filter (LMS) adaptive algorithm with his then doctoral student Ted Hoff. ...
in 1960 – see ADALINE. In the late 1980s, when research on neural networks regained strength, neurons with more continuous shapes started to be considered. The possibility of differentiating the activation function allows the direct use of the gradient descent and other optimization algorithms for the adjustment of the weights. Neural networks also started to be used as a general function approximation model. The best known training algorithm called backpropagation has been rediscovered several times but its first development goes back to the work of
Paul Werbos Paul John Werbos (born 1947) is an American social scientist and machine learning pioneer. He is best known for his 1974 dissertation, which first described the process of training artificial neural networks through backpropagation of errors. He ...
.


Types of transfer functions

The transfer function ( activation function) of a neuron is chosen to have a number of properties which either enhance or simplify the network containing the neuron. Crucially, for instance, any multilayer perceptron using a ''linear'' transfer function has an equivalent single-layer network; a non-linear function is therefore necessary to gain the advantages of a multi-layer network. Below, ''u'' refers in all cases to the weighted sum of all the inputs to the neuron, i.e. for ''n'' inputs, : u = \sum_^n w_i x_i where w is a vector of ''synaptic weights'' and x is a vector of inputs.


Step function

The output ''y'' of this transfer function is binary, depending on whether the input meets a specified threshold, ''θ''. The "signal" is sent, i.e. the output is set to one, if the activation meets the threshold. :y = \begin 1 & \textu \ge \theta \\ 0 & \textu < \theta \end This function is used in perceptrons and often shows up in many other models. It performs a division of the
space Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consi ...
of inputs by a
hyperplane In geometry, a hyperplane is a subspace whose dimension is one less than that of its '' ambient space''. For example, if a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyper ...
. It is specially useful in the last layer of a network intended to perform binary classification of the inputs. It can be approximated from other sigmoidal functions by assigning large values to the weights.


Linear combination

In this case, the output unit is simply the weighted sum of its inputs plus a ''bias'' term. A number of such linear neurons perform a linear transformation of the input vector. This is usually more useful in the first layers of a network. A number of analysis tools exist based on linear models, such as
harmonic analysis Harmonic analysis is a branch of mathematics concerned with the representation of functions or signals as the superposition of basic waves, and the study of and generalization of the notions of Fourier series and Fourier transforms (i.e. an ex ...
, and they can all be used in neural networks with this linear neuron. The bias term allows us to make affine transformations to the data. See:
Linear transformation In mathematics, and more specifically in linear algebra, a linear map (also called a linear mapping, linear transformation, vector space homomorphism, or in some contexts linear function) is a mapping V \to W between two vector spaces that pre ...
,
Harmonic analysis Harmonic analysis is a branch of mathematics concerned with the representation of functions or signals as the superposition of basic waves, and the study of and generalization of the notions of Fourier series and Fourier transforms (i.e. an ex ...
,
Linear filter Linear filters process time-varying input signals to produce output signals, subject to the constraint of linearity. In most cases these linear filters are also time invariant (or shift invariant) in which case they can be analyzed exactly using ...
, Wavelet,
Principal component analysis Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and ...
, Independent component analysis,
Deconvolution In mathematics, deconvolution is the operation inverse to convolution. Both operations are used in signal processing and image processing. For example, it may be possible to recover the original signal after a filter (convolution) by using a deco ...
.


Sigmoid

A fairly simple non-linear function, the
sigmoid function A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve. A common example of a sigmoid function is the logistic function shown in the first figure and defined by the formula: :S(x) = \frac = \f ...
such as the logistic function also has an easily calculated derivative, which can be important when calculating the weight updates in the network. It thus makes the network more easily manipulable mathematically, and was attractive to early computer scientists who needed to minimize the computational load of their simulations. It was previously commonly seen in multilayer perceptrons. However, recent work has shown sigmoid neurons to be less effective than rectified linear neurons. The reason is that the gradients computed by the backpropagation algorithm tend to diminish towards zero as activations propagate through layers of sigmoidal neurons, making it difficult to optimize neural networks using multiple layers of sigmoidal neurons.


Rectifier

In the context of
artificial neural network Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected unit ...
s, the rectifier or ReLU (Rectified Linear Unit) is an activation function defined as the positive part of its argument: : f(x) = x^+ = \max(0, x), where ''x'' is the input to a neuron. This is also known as a ramp function and is analogous to
half-wave rectification A rectifier is an electrical device that converts alternating current (AC), which periodically reverses direction, to direct current (DC), which flows in only one direction. The reverse operation (converting DC to AC) is performed by an inver ...
in electrical engineering. This activation function was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in Nature with strong
biological Biology is the scientific study of life. It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field. For instance, all organisms are made up of cells that process hereditary in ...
motivations and mathematical justifications. It has been demonstrated for the first time in 2011 to enable better training of deeper networks, compared to the widely used activation functions prior to 2011, i.e., the logistic sigmoid (which is inspired by
probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set ...
; see
logistic regression In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression an ...
) and its more practical counterpart, the
hyperbolic tangent In mathematics, hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle. Just as the points form a circle with a unit radius, the points form the right half of the ...
. A commonly used variant of the ReLU activation function is the Leaky ReLU which allows a small, positive gradient when the unit is not active: f(x) = \begin x & \text x > 0, \\ ax & \text. \end where ''x'' is the input to the neuron and ''a'' is a small positive constant (in the original paper the value 0.01 was used for ''a'').Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014)
Rectifier Nonlinearities Improve Neural Network Acoustic Models


Pseudocode algorithm

The following is a simple
pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
implementation of a single TLU which takes boolean inputs (true or false), and returns a single boolean output when activated. An
object-oriented Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data and code. The data is in the form of fields (often known as attributes or ''properties''), and the code is in the form of ...
model is used. No method of training is defined, since several exist. If a purely functional model were used, the class TLU below would be replaced with a function TLU with input parameters threshold, weights, and inputs that returned a boolean value. class TLU defined as: data member threshold : number data member weights : list of numbers of size X function member fire(inputs : list of booleans of size X) : boolean defined as: variable T : number T ← 0 for each i in 1 to X do if inputs(i) is true then T ← T + weights(i) end if end for each if T > threshold then return true else: return false end if end function end class


See also

*
Binding neuron A binding neuron (BN) is an abstract concept of processing of input impulses in a generic neuron based on their temporal coherence and the level of neuronal inhibition. Mathematically, the concept may be implemented by most neuronal models includin ...
*
Connectionism Connectionism refers to both an approach in the field of cognitive science that hopes to explain mental phenomena using artificial neural networks (ANN) and to a wide range of techniques and algorithms using ANNs in the context of artificial in ...


References


Further reading

* *


External links


{{sic, nolink=y, Artifical neuron mimicks function of human cells

McCulloch-Pitts Neurons (Overview)
Artificial neural networks American inventions