Gated Recurrent Unit

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than an LSTM, as it lacks an output gate. GRU performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of the LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.


Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, as well as a simplified form called the minimal gated unit. The operator \odot denotes the Hadamard (element-wise) product in the following.


Fully gated unit

Initially, for t = 0, the output vector is h_0 = 0.

\begin{aligned}
z_t &= \sigma_g(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma_g(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi_h(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \hat{h}_t
\end{aligned}

Variables
* x_t: input vector
* h_t: output vector
* \hat{h}_t: candidate activation vector
* z_t: update gate vector
* r_t: reset gate vector
* W, U and b: parameter matrices and vector
Activation functions
* \sigma_g: the original is a sigmoid function.
* \phi_h: the original is a hyperbolic tangent.

Alternative activation functions are possible, provided that \sigma_g(x) \in [0, 1].

Alternate forms can be created by changing z_t and r_t:

* Type 1, each gate depends only on the previous hidden state and the bias.

\begin{aligned}
z_t &= \sigma_g(U_z h_{t-1} + b_z) \\
r_t &= \sigma_g(U_r h_{t-1} + b_r)
\end{aligned}

* Type 2, each gate depends only on the previous hidden state.

\begin{aligned}
z_t &= \sigma_g(U_z h_{t-1}) \\
r_t &= \sigma_g(U_r h_{t-1})
\end{aligned}

* Type 3, each gate is computed using only the bias.

\begin{aligned}
z_t &= \sigma_g(b_z) \\
r_t &= \sigma_g(b_r)
\end{aligned}
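As a concrete illustration of the fully gated equations above, the following is a minimal sketch of a single GRU cell forward pass in Python with NumPy. It is not a reference implementation: the class name GRUCell, the uniform weight initialization and the example sequence are illustrative assumptions, and practical implementations batch the inputs and learn the parameters by backpropagation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    # Single fully gated GRU cell: h_t = z_t * h_{t-1} + (1 - z_t) * h_hat_t.
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Parameter matrices W (input weights), U (recurrent weights) and
        # bias vectors b for the update gate z, the reset gate r and the
        # candidate activation h_hat.
        self.W_z, self.W_r, self.W_h = (rng.uniform(-s, s, (hidden_size, input_size)) for _ in range(3))
        self.U_z, self.U_r, self.U_h = (rng.uniform(-s, s, (hidden_size, hidden_size)) for _ in range(3))
        self.b_z = np.zeros(hidden_size)
        self.b_r = np.zeros(hidden_size)
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Update gate: how much of the previous state is carried over.
        z_t = sigmoid(self.W_z @ x_t + self.U_z @ h_prev + self.b_z)
        # Reset gate: how much of the previous state feeds the candidate.
        r_t = sigmoid(self.W_r @ x_t + self.U_r @ h_prev + self.b_r)
        # Candidate activation, computed from the reset-gated previous state.
        h_hat = np.tanh(self.W_h @ x_t + self.U_h @ (r_t * h_prev) + self.b_h)
        # Interpolate between the previous state and the candidate.
        return z_t * h_prev + (1.0 - z_t) * h_hat

# Usage: process a short random sequence, starting from h_0 = 0.
cell = GRUCell(input_size=4, hidden_size=3)
h = np.zeros(3)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x_t, h)
print(h)  # final hidden state (output vector)

The Type 1, 2 and 3 variants above correspond to dropping the W x_t term (and, for Types 2 and 3, the bias or recurrent term) from the two gate computations in step.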


Minimal gated unit

The minimal gated unit is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:

\begin{aligned}
f_t &= \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi_h(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}

Variables
* x_t: input vector
* h_t: output vector
* \hat{h}_t: candidate activation vector
* f_t: forget vector
* W, U and b: parameter matrices and vector
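For comparison with the GRU cell sketch above, one step of the minimal gated unit could be written as follows, again in Python with NumPy; the function name mgu_step and its argument layout are illustrative assumptions, not a standard API.

import numpy as np

def mgu_step(x_t, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
    # One minimal-gated-unit step: the forget gate f_t plays the role of
    # both the update and reset gates of the full GRU.
    f_t = 1.0 / (1.0 + np.exp(-(W_f @ x_t + U_f @ h_prev + b_f)))
    # Candidate activation from the forget-gated previous state.
    h_hat = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)
    # Mix the previous state and the candidate with the forget gate.
    return (1.0 - f_t) * h_prev + f_t * h_hat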

