Sparse distributed memory (SDM) is a mathematical model of human long-term memory introduced by Pentti Kanerva in 1988 while he was at NASA Ames Research Center. It is a generalized random-access memory (RAM) for long (e.g., 1,000-bit) binary words. These words serve as both addresses to and data for the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it, as measured by the number of mismatched bits (i.e., the Hamming distance between memory addresses).
SDM implements transformation from logical space to physical space using distributed data representation and storage, similar to encoding processes in human memory. A value corresponding to a logical address is stored at many physical addresses. This way of storing is robust but not deterministic. A memory cell is not addressed directly; even if the input data (logical addresses) are partially damaged, we can still get correct output data.
The theory of the memory is mathematically complete and has been verified by computer simulation. It arose from the observation that the distances between points of a high-dimensional space resemble the proximity relations between concepts in human memory. The theory is also practical in that memories based on it can be implemented with conventional random-access memory elements.[Flynn, Michael J., Pentti Kanerva, and Neil Bhadkamkar. "Sparse distributed memory prototype: principles and operation." (1989).]
Definition
Human memory has a tendency to congregate memories based on similarities between them (although they may not be related), such as "firetrucks are red and apples are red". Sparse distributed memory is a mathematical representation of human memory that uses high-dimensional space to help model the large amounts of memory that mimic those of the human neural network. An important property of such high-dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated. SDM can be considered a realization of locality-sensitive hashing, which hashes similar input items into the same "buckets" with high probability.
The underlying idea behind an SDM is the mapping of a huge binary memory onto a smaller set of physical locations, so-called ''hard locations''. As a general guideline, those hard locations should be uniformly distributed in the virtual space, to mimic the existence of the larger virtual space as accurately as possible. Every datum is stored distributed over a set of hard locations, and retrieved by averaging those locations. Therefore, recall may not be perfect, with accuracy depending on the saturation of the memory.
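To make the hard-location idea concrete, the following Python sketch (parameter values and helper names are our own choices, not from the text) draws uniformly random hard locations and counts how many fall inside the access circle of a random address.

```python
import random

random.seed(0)

n = 256          # dimensionality of the address space (example value)
num_hard = 1000  # number of hard locations (example value)
radius = 112     # access radius in bits (example value)

def random_word(n):
    """A uniformly random n-bit word as a list of 0/1."""
    return [random.randint(0, 1) for _ in range(n)]

def hamming(x, y):
    """Number of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

hard_locations = [random_word(n) for _ in range(num_hard)]
address = random_word(n)

# Hard locations inside the access circle of the address:
# only a small fraction of all locations is selected.
activated = [h for h in hard_locations if hamming(h, address) <= radius]
print(f"{len(activated)} of {num_hard} hard locations activated")
```

With n = 256, distances between random points concentrate near 128 with a standard deviation of 8, so a radius of 112 selects only a small, similarity-dependent subset of the hard locations.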
Kanerva's proposal is based on four basic ideas:
*1. The boolean space {0,1}^n, with 2^n points in n dimensions, exhibits properties which are similar to humans' intuitive notions of relationships between concepts. This means that it makes sense to store data as points of the mentioned space, where each memory item is stored as an n-bit vector.
*2. Neurons with n inputs can be used as address decoders of a random-access memory.
*3. Unifying principle: data stored into the memory can be used as addresses to the same memory. Distance between two points is a measure of similarity between two memory items. The closer the points, the more similar the stored vectors.
*4. Time can be traced in the memory as a function of where the data are stored, if the data are organized as sequences of events.
The binary space N
The SDM works with n-dimensional vectors with binary components. Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events. This section is mostly about the properties of the vector space N = {0,1}^n. Let n be the number of dimensions of the space. The number of points, or possible memory items, is then 2^n. We will denote this number by N and will use N and {0,1}^n to stand also for the space itself.[Grebeníček, František. "Sparse Distributed Memory − Pattern Data Analysis. URL: http://www.fit.vutbr.cz/~grebenic/Publikace/mosis2000.pdf"]
Concepts related to the space N:
*''Origin'', 0: The point with all coordinates 0 is called the origin, 0 = 000...00.
*''Complement'', 'x: The complement, or opposite, of point x is the n-tuple that has ones where x has zeros and vice versa.
*''Norm'', |x|: The norm of point x is the number of ones in its binary representation.
*''Difference'', x − y: The difference of two points x and y is the n-tuple that has ones where x and y differ and zeros elsewhere. It is the bitwise ''exclusive or'': x − y = x ⊕ y. The difference commutes: x − y = y − x.
*''Distance'', d(x, y): The distance between two points x and y is the number of dimensions at which x and y differ. It is called the Hamming distance (its square root is the Euclidean distance) and is expressed in bits. Distance is the norm of the difference: d(x, y) = |x − y|.
*''Betweenness'', x:y:z: Point y is between points x and z if and only if the distance from x to z is the sum of the distances from x to y and from y to z; that is, x:y:z ⇔ d(x, z) = d(x, y) + d(y, z). It is easily seen that every bit of a point in between is a copy of the corresponding bit of an endpoint.
*''Orthogonality'', x ⊥ y: Point x is orthogonal to point y, or the two are perpendicular or indifferent, if and only if the distance between the two is half the number of dimensions: x ⊥ y ⇔ d(x, y) = n/2. The distance n/2 is called the indifference distance of space N. If x is orthogonal to y, it is also orthogonal to its complement 'y (x is halfway between y and 'y).
*''Circle'', O(r,x): A circle with radius r and center x is the set of points that are at most r bits from x: O(r,x) = {y | d(x, y) ≤ r}.
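These definitions translate directly into bitwise operations on 0/1 vectors. A minimal Python illustration (helper names are ours, not from the text):

```python
def norm(x):
    """|x|: the number of ones in x."""
    return sum(x)

def complement(x):
    """'x: flip every bit of x."""
    return [1 - b for b in x]

def difference(x, y):
    """x - y = x XOR y, bitwise."""
    return [a ^ b for a, b in zip(x, y)]

def distance(x, y):
    """Hamming distance d(x, y) = |x - y|."""
    return norm(difference(x, y))

def between(x, y, z):
    """x:y:z  <=>  d(x, z) = d(x, y) + d(y, z)."""
    return distance(x, z) == distance(x, y) + distance(y, z)

x = [0, 0, 1, 1]
y = [0, 1, 1, 1]
print(distance(x, y))                # 1: x and y differ in one position
print(distance(x, complement(x)))    # 4: a point and its complement differ everywhere
print(between(x, y, complement(x)))  # True: y lies between x and 'x
```

The last line illustrates property 2 of spherical spaces below: every point of the space lies between a point and its complement.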
Properties of the space N:
The space N can be represented by the vertices of the unit cube in n-dimensional Euclidean space. The vertices lie on the surface of an n-dimensional sphere with (Euclidean-metric) radius √n/2. This gives rise to the sphere analogy. We will call a space spherical if
*1. any point x has a unique opposite 'x,
*2. the entire space is between any point x and its opposite 'x, and
*3. all points are "equal" (meaning that for any two points x and y there is a distance-preserving automorphism of the space that maps x to y, so that from any of its points the space "looks" the same).
The surface of a sphere (in Euclidean 3d-space) clearly is spherical. According to the definition, N is also spherical, since z ↦ z ⊕ (x ⊕ y) is an automorphism that maps x to y.
Because N is spherical, it is helpful to think of it as the surface of a sphere with circumference 2n. All points of N are equally qualified as points of origin, and a point and its complement are like two poles at distance n from each other, with the entire space in between. The points halfway between the poles and perpendicular to them are like the equator.
;Distribution of the space N
The number of points that are exactly d bits from an arbitrary point x (say, from the point 0) is the number of ways to choose d coordinates from a total of n coordinates, and is therefore given by the binomial coefficient:
: C(n, d) = n! / (d! (n − d)!)
The distribution of N thus is the binomial distribution with parameters n and p, where p = 1/2. The mean of the binomial distribution is n/2, and the variance is n/4. This distribution function will be denoted by N(d). The normal distribution F with mean n/2 and standard deviation √n/2 is a good approximation to it:
: N(d) = Pr{d(x, y) ≤ d} ≅ F((d − n/2) / (√n/2))
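This approximation is easy to check numerically. The sketch below (the variable names and the choice n = 1000 are ours) compares the exact binomial form of N(d) with the normal approximation F:

```python
from math import comb, erf, sqrt

n = 1000  # dimensionality (example value)

def N_exact(d):
    """Pr{d(x, y) <= d}: binomial CDF with parameters n and p = 1/2."""
    return sum(comb(n, k) for k in range(d + 1)) / 2**n

def F_approx(d):
    """Normal CDF with mean n/2 and standard deviation sqrt(n)/2."""
    mu, sigma = n / 2, sqrt(n) / 2
    return 0.5 * (1 + erf((d - mu) / (sigma * sqrt(2))))

# The two columns agree closely near the mean and in the tails.
for d in (450, 500, 550):
    print(d, round(N_exact(d), 4), round(F_approx(d), 4))
```

Both functions jump from nearly 0 to nearly 1 over a band of only a few standard deviations (√n/2 ≈ 15.8 bits here) around n/2, which is the concentration effect described next.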
;Tendency to orthogonality
An outstanding property of N is that most of it lies at approximately the mean (indifference) distance n/2 from a point (and its complement). In other words, most of the space is nearly orthogonal to any given point, and the larger n is, the more pronounced this effect is.
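The tendency to orthogonality can be observed with a small Monte Carlo experiment (the sample sizes below are arbitrary choices of ours):

```python
import random
from statistics import mean, stdev

random.seed(1)
n = 1000        # dimensionality (example value)
samples = 500   # number of random pairs (example value)

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

dists = []
for _ in range(samples):
    x = [random.randint(0, 1) for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    dists.append(hamming(x, y))

# Distances cluster tightly around the indifference distance n/2 = 500,
# with standard deviation near sqrt(n)/2 ~ 15.8 bits.
print(mean(dists), stdev(dists))
```

Relative to the full range of possible distances (0 to 1000), a spread of about 16 bits around 500 means almost every random pair is nearly orthogonal.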
As neural network
The SDM may be regarded either as a content-addressable extension of a classical random-access memory (RAM) or as a special type of three-layer feedforward neural network. The main SDM alterations to the RAM are:[Grebeníček, František. Neural Nets as Associative Memories. Diss. Brno University of Technology, 2001. URL: http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf]
*The SDM calculates Hamming distances between the reference address and each location address. For each distance which is less than or equal to the given radius, the corresponding location is selected.
*The memory is represented by counters (an n × m array, where n is the number of locations and m is the input data length) instead of single-bit storage elements.
*Writing to the memory, instead of overwriting, is as follows:
**if the ''i''-th bit of the input data is 1, the corresponding counters (counters in the selected locations (rows) and in the ''i''-th columns) are incremented,
**if the ''i''-th bit of the input data is 0, the corresponding counters are decremented.
*Reading (or recall) from the memory is similar:
**The contents of the selected locations are summed columnwise.
**Each sum is thresholded. If the sum is greater than or equal to the threshold value, the corresponding output bit is set to 1; in the opposite case it is cleared. Note that the thresholds may be zero if the training input vectors are close to orthogonal.
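The alterations listed above can be assembled into a minimal counter-based SDM. The following Python sketch is illustrative only; the parameter values and names are our own, and it stores a single word autoassociatively (the word serves as its own address):

```python
import random

random.seed(2)

n = 256          # address/word length in bits (example value)
num_hard = 1000  # number of hard locations (example value)
radius = 120     # activation radius in bits (example value)

def rand_word():
    return [random.randint(0, 1) for _ in range(n)]

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

hard_addresses = [rand_word() for _ in range(num_hard)]
counters = [[0] * n for _ in range(num_hard)]  # one row of n counters per location

def select(address):
    """Indices of hard locations within the activation radius of the address."""
    return [j for j, h in enumerate(hard_addresses) if hamming(h, address) <= radius]

def write(address, data):
    """Increment counters under 1-bits of the data, decrement them under 0-bits."""
    for j in select(address):
        for i, bit in enumerate(data):
            counters[j][i] += 1 if bit else -1

def read(address):
    """Sum the selected counter rows columnwise and threshold each sum at zero."""
    sums = [0] * n
    for j in select(address):
        for i in range(n):
            sums[i] += counters[j][i]
    return [1 if s >= 0 else 0 for s in sums]

word = rand_word()
write(word, word)  # autoassociative use: the word is its own address

noisy = word[:]
for i in random.sample(range(n), 20):  # corrupt 20 of the 256 address bits
    noisy[i] ^= 1

print(hamming(read(noisy), word))  # the stored word is recovered despite the noise
```

With a single stored word, every selected row holds a copy of that word in its counters, so any overlap between the write-time and read-time selections reproduces it exactly; with many stored words the rows interfere and recall becomes approximate, degrading gracefully as the memory saturates.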
Neuron model
An idealized description of a neuron is as follows: a neuron has a cell body with two kinds of branches: ''dendrites'' and an ''axon''. It receives input signals from other neurons via the dendrites, integrates (sums) them, and generates its own (electric) output signal, which is sent to outside neurons via the axon. The points of electric contact between neurons are called ''synapses''.
When a neuron generates a signal it is ''firing'', and after firing it must ''recover'' before it fires again.
The relative importance of a synapse to the firing of a neuron is called its ''synaptic weight'' (or ''input coefficient''). There are two kinds of synapses: ''excitatory'' synapses, which trigger the neuron to fire, and ''inhibitory'' synapses, which hinder firing. The neuron is either excitatory or inhibitory according to the kinds of synapses its axon makes.
A neuron fires when the sum of its inputs exceeds a specific ''threshold''. The higher the threshold, the more important it is that excitatory synapses have input while inhibitory ones do not. Whether a recovered neuron actually fires depends on whether it received sufficient excitatory input (beyond the threshold) and not too much inhibitory input within a certain period.
The formal model of the neuron makes further simplifying assumptions. An ''n''-input neuron is modeled by a ''linear threshold function'' as follows:
For i = 1, ..., n, where n is the number of inputs, let F(t) be the output at time ''t'', F(t) ∈ {0, 1}, and let x_i(t) be the ''i''-th input at time ''t'', x_i(t) ∈ {0, 1}. Let w_i be the weight of the ''i''-th input and let c be the threshold.
The ''weighted sum'' of the inputs at time ''t'' is defined by
: S(t) = Σ_{i=1..n} w_i x_i(t)
The ''neuron output'' at time ''t'' is then defined as a boolean function:
: F(t) = 1 if S(t) ≥ c, and F(t) = 0 otherwise,
where F(t) = 1 means that the neuron fires at time ''t'' and F(t) = 0 that it doesn't, i.e., in order for the neuron to fire the weighted sum must reach or exceed the threshold c.
Excitatory inputs increase the sum and inhibitory inputs decrease it.
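The linear threshold function above can be transcribed directly (the names are ours):

```python
def neuron_output(weights, inputs, threshold):
    """Linear threshold function: fire (1) iff the weighted sum reaches the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# Two excitatory inputs (weight +1) and one inhibitory input (weight -1), threshold c = 2:
w = [1, 1, -1]
print(neuron_output(w, [1, 1, 0], 2))  # 1: the excitatory inputs alone reach the threshold
print(neuron_output(w, [1, 1, 1], 2))  # 0: the inhibitory input pulls the sum below c
```

The two calls show both cases: active excitatory inputs push the sum to the threshold, while an active inhibitory input suppresses firing.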
Neuron as address-decoder
Kanerva's key thesis is that certain neurons could have their input coefficients and thresholds fixed over the entire life of an organism and used as address decoders where ''n''-tuple of input coefficients (the pattern to which neurons respond most readily) determines the ''n''-bit memory address, and the threshold controls the size of the region of similar address patterns to which the neuron responds.
This mechanism is complementary to adjustable synapses or adjustable weights in a neural network (perceptron convergence learning), as this fixed accessing mechanism would be a permanent ''frame of reference'' which allows one to ''select'' the synapses in which the information is stored and from which it is retrieved under a given set of circumstances. Furthermore, an encoding of the present circumstance would serve as an address.
The address ''a'' of a neuron with input coefficients w_1, ..., w_n is defined as the ''n''-bit input pattern that maximizes the weighted sum. The maximum occurs when the inhibitory inputs are zeros and the excitatory inputs are ones.
The ''i''-th bit of the address is (assuming the weights are non-zero):
: a_i = 1 if w_i > 0, and a_i = 0 if w_i < 0
The ''maximum weighted sum'' is then the sum of all positive coefficients:
: S_max = Σ_{w_i > 0} w_i
And the ''minimum weighted sum'' would correspond to the point opposite the neuron address, 'a:
: S_min = Σ_{w_i < 0} w_i
When the threshold c is in range