In neural networks, the gating mechanism is an architectural motif for controlling the flow of activation and gradient signals. Gating mechanisms are most prominently used in recurrent neural networks (RNNs), but have also found applications in other architectures.
RNNs
Gating mechanisms are the centerpiece of long short-term memory (LSTM). They were proposed to mitigate the vanishing gradient problem often encountered by regular RNNs.
An LSTM unit contains three gates:
* An input gate, which controls the flow of new information into the memory cell
* A forget gate, which controls how much information is retained from the previous time step
* An output gate, which controls how much information is passed to the next layer
The equations for LSTM are:
$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
Here, $\odot$ represents elementwise multiplication, $\sigma$ is the sigmoid function, $x_t$ is the input, and $h_t$ and $c_t$ are the hidden state and cell state at time step $t$.
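As a minimal sketch (illustrative code, not from any particular library), a single LSTM step can be written in NumPy, with the parameter names $W_f$, $U_f$, $b_f$, and so on chosen to mirror the equations above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names like "W_f" to weight arrays."""
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])        # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])        # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])        # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate cell
    c = f * c_prev + i * c_tilde   # '*' on arrays is elementwise, i.e. the ⊙ above
    h = o * np.tanh(c)             # gated hidden state
    return h, c

# Tiny usage example with random weights: 4 input features, 8 hidden units.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
p = {f"W_{g}": 0.1 * rng.normal(size=(d_h, d_in)) for g in "fioc"}
p.update({f"U_{g}": 0.1 * rng.normal(size=(d_h, d_h)) for g in "fioc"})
p.update({f"b_{g}": np.zeros(d_h) for g in "fioc"})
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
```

Because each gate is a sigmoid output in (0, 1), multiplying by it smoothly attenuates or passes each component of the signal.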
The gated recurrent unit (GRU) simplifies the LSTM. Compared to the LSTM, the GRU has just two gates: a reset gate and an update gate. The GRU also merges the cell state and hidden state into a single hidden state. The reset gate roughly corresponds to the forget gate, and the update gate roughly corresponds to the input gate; the output gate is removed.
There are several variants of the GRU. One particular variant has these equations:
$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$
where $z_t$ is the update gate and $r_t$ is the reset gate.
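In the same illustrative NumPy style (parameter names mirror the equations; nothing here is library-specific), a single step of this GRU variant looks like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p maps names like "W_z" to weight arrays."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    # The reset gate masks the previous state before the candidate is computed.
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_tilde
```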
Gated linear units
Gated Linear Units (GLUs) adapt the gating mechanism for use in feedforward neural networks, often within transformer-based architectures. They are defined as:
$$
\mathrm{GLU}(a, b) = a \odot \sigma(b)
$$
where $a$ and $b$ are the first and second inputs, respectively, and $\sigma$ represents the sigmoid activation function.
Replacing $\sigma$ with other activation functions leads to variants of GLU:
$$
\begin{aligned}
\mathrm{ReGLU}(a, b) &= a \odot \mathrm{ReLU}(b) \\
\mathrm{GEGLU}(a, b) &= a \odot \mathrm{GELU}(b) \\
\mathrm{SwiGLU}(a, b) &= a \odot \mathrm{Swish}(b)
\end{aligned}
$$
where ReLU, GELU, and Swish are different activation functions (see Activation function for definitions).
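A NumPy sketch of these gating functions follows; the GELU below uses the common tanh approximation, and Swish is taken at $\beta = 1$ (i.e. SiLU), both assumptions made here for a compact example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(a, b):
    return a * sigmoid(b)          # GLU: a gated by sigmoid(b)

def reglu(a, b):
    return a * np.maximum(0.0, b)  # ReGLU: ReLU gate

def geglu(a, b):
    # GEGLU: GELU gate (tanh approximation of GELU).
    gelu_b = 0.5 * b * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (b + 0.044715 * b ** 3)))
    return a * gelu_b

def swiglu(a, b):
    return a * b * sigmoid(b)      # SwiGLU: Swish(b) = b * sigmoid(b) at beta = 1
```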
In transformer models, such gating units are often used in the feedforward modules. For a single vector input $x$, this results in:
$$
\mathrm{FFN}_{\mathrm{SwiGLU}}(x) = \left(\mathrm{Swish}(xW) \odot xV\right) W_2
$$
where $W$, $V$, and $W_2$ are learned weight matrices.
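As a sketch, assuming bias-free projection matrices (a common choice in transformer feedforward blocks) and Swish with $\beta = 1$:

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))  # Swish at beta = 1: x * sigmoid(x)

def ffn_swiglu(x, W, V, W2):
    """Gated feedforward block: (Swish(x W) ⊙ x V) W2.

    Shapes: W and V are (d_model, d_ff); W2 is (d_ff, d_model).
    """
    return (swish(x @ W) * (x @ V)) @ W2
```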
Other architectures
Gating mechanisms are used in highway networks, which were designed by unrolling an LSTM.
Channel gating uses a gate to control the flow of information through different channels inside a convolutional neural network (CNN).
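For illustration, a single highway layer can be sketched as follows, where a transform gate $T(x)$ mixes a transformed input $H(x)$ with the raw input carried through unchanged (the parameter names and the tanh choice for $H$ are illustrative assumptions, not a fixed specification):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer; W_H and W_T must be square so H(x) matches x."""
    H = np.tanh(W_H @ x + b_H)    # candidate transformation of the input
    T = sigmoid(W_T @ x + b_T)    # transform gate in (0, 1)
    return T * H + (1.0 - T) * x  # carry path: what is not transformed passes through
```

When the gate saturates near zero, the layer reduces to an identity map, which is what lets gradients flow through very deep stacks.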
See also
* Recurrent neural network
* Long short-term memory
* Gated recurrent unit
* Transformer
* Activation function