Modern Hopfield networks (also known as Dense Associative Memories) are generalizations of the classical Hopfield networks that break the linear scaling relationship between the number of input features and the number of stored memories. This is achieved by introducing stronger non-linearities (either in the energy function or in the neurons' activation functions), leading to super-linear (even exponential) memory storage capacity as a function of the number of feature neurons. The network still requires a sufficient number of hidden neurons.

The key theoretical idea behind modern Hopfield networks is to use an energy function and an update rule that are more sharply peaked around the stored memories in the space of neuron configurations, compared to the classical Hopfield network.
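For orientation, the contrast can be sketched in the notation of the discrete model below ($w_{ij}$, $\xi_{\mu i}$, $V_i$, and $F$ are defined in the sections that follow):

$$E_{\text{classical}} = -\frac{1}{2}\sum_{i,j} w_{ij} V_i V_j \qquad \text{versus} \qquad E_{\text{modern}} = -\sum_{\mu=1}^{N_{\text{mem}}} F\!\left(\sum_{i=1}^{N_f} \xi_{\mu i} V_i\right),$$

where $F$ is a rapidly growing function such as $F(x) = x^n$ or $F(x) = e^x$. The faster $F$ grows, the more sharply each stored pattern dominates the energy near its own basin, which is what permits super-linear (and, for the exponential choice, exponential) storage capacity.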
Classical Hopfield networks
Hopfield networks are recurrent neural networks with dynamical trajectories converging to fixed-point attractor states, and they are described by an energy function. The state of each model neuron $i$ is defined by a time-dependent variable $V_i$, which can be chosen to be either discrete or continuous. A complete model describes the mathematics of how the future state of activity of each neuron depends on the known present or previous activity of all the neurons.
In the original Hopfield model of associative memory, the variables were binary, and the dynamics were described by a one-at-a-time update of the state of the neurons. An energy function quadratic in the $V_i$ was defined, and the dynamics consisted of changing the activity of each single neuron $i$ only if doing so would lower the total energy of the system. This same idea was extended to the case of $V_i$ being a continuous variable representing the output of neuron $i$, with $V_i$ a monotonic function of an input current. The dynamics became expressed as a set of first-order differential equations for which the "energy" of the system always decreased.
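For concreteness, these first-order equations are conventionally written as follows (a standard reconstruction; the symbols $u_i$, $g$, $I_i$, and $\tau$ are introduced here for illustration and do not appear elsewhere in this section):

$$\tau \frac{du_i}{dt} = -u_i + \sum_j w_{ij} V_j + I_i, \qquad V_i = g(u_i),$$

where $u_i$ is the input current of neuron $i$, $g$ is the monotonic gain function, $I_i$ is an external input, and $\tau$ is a time constant.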
The energy in the continuous case has one term that is quadratic in the $V_i$ (as in the binary model), and a second term that depends on the gain function (the neuron's activation function). While having many desirable properties of associative memory, both of these classical systems suffer from a small memory storage capacity, which scales linearly with the number of input features.
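Written out under the same assumed notation as the sketch above, this continuous-case energy takes the standard Lyapunov form

$$E = -\frac{1}{2}\sum_{i,j} w_{ij} V_i V_j + \sum_i \int_0^{V_i} g^{-1}(v)\,dv - \sum_i I_i V_i,$$

whose first term is the quadratic one shared with the binary model and whose integral term is the contribution of the gain function.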
Discrete variables
A simple example of the modern Hopfield network can be written in terms of binary variables $V_i$ that represent the active ($V_i = +1$) and inactive ($V_i = -1$) states of the model neuron $i$. Its energy is

$$E = -\sum_{\mu=1}^{N_{\text{mem}}} F\!\left(\sum_{i=1}^{N_f} \xi_{\mu i} V_i\right).$$

In this formula the weights $\xi_{\mu i}$ represent the matrix of memory vectors (index $\mu = 1, \ldots, N_{\text{mem}}$ enumerates the different memories, and index $i = 1, \ldots, N_f$ enumerates the content of each memory corresponding to the $i$-th feature neuron), and the function $F(x)$ is a rapidly growing non-linear function. The update rule for individual neurons (in the asynchronous case) can be written in the following form:
$$V_i^{(t+1)} = \operatorname{Sign}\!\left[\sum_{\mu=1}^{N_{\text{mem}}}\left(F\!\left(\xi_{\mu i} + \sum_{j\neq i}\xi_{\mu j} V_j^{(t)}\right) - F\!\left(-\xi_{\mu i} + \sum_{j\neq i}\xi_{\mu j} V_j^{(t)}\right)\right)\right]$$

That is, to compute the updated state of the $i$-th neuron, the network compares two energies, one with the $i$-th neuron in the ON state and one with it in the OFF state (the remaining neurons held fixed), and the neuron adopts whichever state gives the lower energy.

The same construction extends to hierarchical layered networks: such a network is likewise an attractor network with a global energy function, described by a hierarchical set of synaptic weights that can be learned for each specific problem.
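The energy and update rule of the discrete model can be sketched in a few lines of Python (a minimal illustration, not a reference implementation; the names dense_energy and async_update and all parameter values are chosen here for the example):

import numpy as np

def dense_energy(V, xi, F):
    # Energy E = -sum_mu F(sum_i xi[mu, i] * V[i]) of the dense associative memory.
    return -np.sum(F(xi @ V))

def async_update(V, xi, F):
    # One asynchronous sweep: every neuron, in random order, adopts the
    # state (+1 or -1) that gives the lower network energy while all
    # other neurons are held fixed -- the update rule displayed above.
    V = V.copy()
    for i in np.random.permutation(len(V)):
        rest = xi @ V - xi[:, i] * V[i]      # sum over j != i of xi[mu, j] * V[j]
        diff = np.sum(F(xi[:, i] + rest) - F(-xi[:, i] + rest))
        V[i] = 1 if diff >= 0 else -1        # Sign[...], ties broken toward +1
    return V

# Usage: store random +/-1 memories and recall one from a corrupted cue.
rng = np.random.default_rng(0)
N_f, N_mem, n = 64, 20, 3                    # F(x) = x**n, a polynomial energy
F = lambda x: x ** n
xi = rng.choice([-1, 1], size=(N_mem, N_f))
V = xi[0].copy()
V[:10] *= -1                                 # flip 10 of the 64 bits
for _ in range(5):                           # iterate toward a fixed point
    V = async_update(V, xi, F)
print("recovered:", np.array_equal(V, xi[0]))
print("energy:", dense_energy(V, xi, F))

With $F(x) = x^n$ the storage capacity grows roughly as $N_f^{n-1}$, which is why this sketch can plausibly hold 20 patterns in 64 neurons, more than the roughly $0.14\,N_f$ patterns of the classical quadratic model.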