Sparse distributed memory (SDM) is a mathematical model of human long-term memory introduced by Pentti Kanerva in 1988 while he was at NASA Ames Research Center.
This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines – e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, etc. Sparse distributed memory is used for storing and retrieving large amounts (2^1000 bits) of information, emphasizing similarity of information rather than exact accuracy.
There are some recent applications in robot navigation and experience-based robot manipulation.
General principle
It is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. These words serve as both addresses to and data for the memory. The main attribute of the memory is sensitivity to similarity. This means that a word can be read back not only by giving the original write address but also by giving one close to it, as measured by the number of mismatched bits (i.e., the Hamming distance between memory addresses).
SDM implements transformation from logical space to physical space using distributed data representation and storage, similarly to encoding processes in human memory. A value corresponding to a logical address is stored into many physical addresses. This way of storing is robust and not deterministic. A memory cell is not addressed directly. Even if the input data (logical addresses) are partially damaged, we can still get correct output data.
The theory of the memory is mathematically complete and has been verified by computer simulation. It arose from the observation that the distances between points of a high-dimensional space resemble the proximity relations between concepts in human memory. The theory is also practical in that memories based on it can be implemented with conventional random-access memory elements.
[Flynn, Michael J., Pentti Kanerva, and Neil Bhadkamkar. "Sparse distributed memory prototype: principles and operation." (1989).]
Definition
Human memory has a tendency to congregate memories based on similarities between them (although they may not be related), such as "firetrucks are red and apples are red". Sparse distributed memory is a mathematical representation of human memory that uses high-dimensional space to model the large amounts of memory that mimic the human neural network. An important property of such high-dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated.
SDM can be considered a realization of locality-sensitive hashing.
The underlying idea behind an SDM is the mapping of a huge binary memory onto a smaller set of physical locations, so-called ''hard locations''. As a general guideline, those hard locations should be uniformly distributed in the virtual space, to mimic the existence of the larger virtual space as accurately as possible. Every datum is stored distributed over a set of hard locations, and retrieved by averaging those locations. Therefore, recall may not be perfect; accuracy depends on the saturation of the memory.
Kanerva's proposal is based on four basic ideas:
# The boolean space {0, 1}^n, or 2^n points in n dimensions, exhibits properties which are similar to humans' intuitive notions of relationships between the concepts. This means that it makes sense to store data as points of the mentioned space where each memory item is stored as an n-bit vector.
# Neurons with n inputs can be used as address decoders of a random-access memory
# Unifying principle: data stored into the memory can be used as addresses to the same memory. Distance between two points is a measure of similarity between two memory items. The closer the points, the more similar the stored vectors.
# Time can be traced in the memory as a function of where the data are stored, if the data are organized as sequences of events.
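Idea 3 (distance between points as a measure of similarity between memory items) can be illustrated with a short sketch; this is a toy illustration with made-up helper names, not code from the text:

```python
import random

n = 1000  # dimensionality: memory items are n-bit vectors
random.seed(0)

def random_word(n):
    """A random point of the boolean space {0, 1}^n."""
    return [random.randint(0, 1) for _ in range(n)]

def hamming(x, y):
    """Number of mismatched bits: the distance between two points."""
    return sum(a != b for a, b in zip(x, y))

def flip_bits(x, k):
    """Copy of x with k randomly chosen bits flipped."""
    y = list(x)
    for i in random.sample(range(len(x)), k):
        y[i] ^= 1
    return y

item = random_word(n)
near = flip_bits(item, 50)   # a slightly corrupted version of the item
other = random_word(n)       # an unrelated random item

# Closer points represent more similar memory items:
assert hamming(item, near) < hamming(item, other)
```

A corrupted copy stays within a small distance of the original, while an unrelated random item lies near the indifference distance n/2, here around 500 bits.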
The binary space N
The SDM works with n-dimensional vectors with binary components. Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events. This section is mostly about the properties of the vector space N = {0, 1}^n. Let n be the number of dimensions of the space. The number of points, or possible memory items, is then 2^n. We will denote this number by N and will use N and 2^n to stand also for the space itself.[Grebeníček, František. "Sparse Distributed Memory – Pattern Data Analysis." URL: http://www.fit.vutbr.cz/~grebenic/Publikace/mosis2000.pdf]
Concepts related to the space N:
*''Origin'', 0: The point with all coordinates 0 is called the origin, 0 = 000...00.
*''Complement'', 'x: The complement, or opposite, of point x is the n-tuple that has ones where x has zeros and vice versa.
*''Norm'', |x|: The norm of point x is the number of ones in its binary representation.
*''Difference'', x − y: The difference of two points x and y is the n-tuple that has ones where x and y differ and zeros elsewhere. It is the bitwise ''exclusive or'': x − y = x ⊕ y. The difference commutes: x − y = y − x.
*''Distance'', d(x, y): The distance between two points x and y is the number of dimensions at which x and y differ. It is called the Hamming distance (its square root is the Euclidean distance) and is expressed in bits. Distance is the norm of the difference: d(x, y) = |x − y|.
*''Betweenness'', x:y:z: Point y is between points x and z if and only if the distance from x to z is the sum of the distances from x to y and from y to z; that is, x:y:z ⇔ d(x, z) = d(x, y) + d(y, z). It is easily seen that every bit of a point in between is a copy of the corresponding bit of an endpoint.
*''Orthogonality'', x ⊥ y: Point x is orthogonal to point y, or the two are perpendicular or indifferent, if and only if the distance between the two is half the number of dimensions: x ⊥ y ⇔ d(x, y) = n/2. The distance n/2 is called indifference distance of space N. If x is orthogonal to y, it is also orthogonal to its complement 'y (x is halfway between y and 'y).
*''Circle'', O(r,x): A circle with radius r and center x is the set of points that are at most r bits from x: O(r,x) = {y | d(x, y) ≤ r}.
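As a minimal sketch, the concepts above can be written down in code; here Python ints stand for points of N, with n = 8 so that the whole space can be enumerated (all names are illustrative):

```python
n = 8  # dimensionality of the space N = {0, 1}^n (ints used as bit vectors)

def norm(x):
    """Norm |x|: the number of ones in the binary representation."""
    return bin(x).count("1")

def complement(x):
    """'x: ones where x has zeros and vice versa."""
    return x ^ ((1 << n) - 1)

def difference(x, y):
    """x - y: bitwise exclusive or (commutes)."""
    return x ^ y

def distance(x, y):
    """Hamming distance d(x, y) = |x - y|."""
    return norm(difference(x, y))

def between(x, y, z):
    """Betweenness x:y:z <=> d(x, z) = d(x, y) + d(y, z)."""
    return distance(x, z) == distance(x, y) + distance(y, z)

def orthogonal(x, y):
    """x ⊥ y <=> d(x, y) = n/2, the indifference distance."""
    return distance(x, y) == n // 2

def circle(r, x):
    """O(r, x): all points at most r bits from x."""
    return {y for y in range(1 << n) if distance(x, y) <= r}

x = 0b10110010
assert distance(x, complement(x)) == n             # the complement is n bits away
assert between(x, x ^ 0b00000010, x ^ 0b00100010)  # flipping bits one by one stays "between"
assert orthogonal(x, x ^ 0b11110000)               # differs in exactly n/2 bits
```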
Properties of the space N:
The space N can be represented by the vertices of the unit cube in n-dimensional Euclidean space. The vertices lie on the surface of an n-dimensional sphere with (Euclidean-metric) radius √n/2. This gives rise to the sphere analogy. We will call a space spherical if
# any point x has a unique opposite 'x,
# the entire space is between any point x and its opposite 'x, and
# all points are "equal" (meaning that for any two points x and y there is a distance preserving automorphism of the space that maps x to y, so that from any of its points the space "looks" the same).
The surface of a sphere (in Euclidean 3d-space) clearly is spherical. According to the definition, N is also spherical, since the mapping z ↦ y ⊕ x ⊕ z is a distance-preserving automorphism that maps x to y.
Because N is spherical, it is helpful to think of it as the surface of a sphere with circumference 2n. All points of N are equally qualified as points of origin, and a point and its complement are like two poles at distance n from each other, with the entire space in between. The points halfway between the poles and perpendicular to them are like the equator.
;Distribution of the space N
The number of points that are exactly d bits from an arbitrary point x (say, from the point 0) is the number of ways to choose d coordinates from a total of n coordinates, and is therefore given by the binomial coefficient:
C(n, d) = n! / (d! (n − d)!)
The distribution of N thus is the binomial distribution with parameters n and p, where p = 1/2. The mean of the binomial distribution is n/2, and the variance is n/4. This distribution function will be denoted by N(d).
The normal distribution F with mean n/2 and standard deviation √n/2 is a good approximation to it:
N(d) = Pr{d(x, y) ≤ d} ≅ F((d − n/2) / (√n/2))
;Tendency to orthogonality
An outstanding property of N is that most of it lies at approximately the mean (indifference) distance n/2 from a point (and its complement). In
other words, most of the space is nearly orthogonal to any given point, and the larger n is, the more pronounced is this effect.
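This can be checked numerically. The sketch below (illustrative, not from the text) computes the binomial distribution function N(d) and its normal approximation F for n = 1000:

```python
from math import comb, erf, sqrt

n = 1000  # number of dimensions

def N_of(d):
    """N(d): fraction of the space within d bits of a fixed point
    (binomial CDF with parameters n and p = 1/2)."""
    return sum(comb(n, k) for k in range(d + 1)) / 2**n

def F(d):
    """Normal approximation with mean n/2 and standard deviation sqrt(n)/2."""
    sigma = sqrt(n) / 2
    return 0.5 * (1 + erf((d - n / 2) / (sigma * sqrt(2))))

# Most of the space lies within a few standard deviations of n/2:
mass_near_mean = N_of(n // 2 + 50) - N_of(n // 2 - 51)
print(mass_near_mean)     # almost all of the space
print(N_of(400), F(400))  # the two distribution functions agree closely
```

With n = 1000 the standard deviation is only about 16 bits, so the band within 50 bits of the indifference distance 500 already contains over 99% of the space; nearly every point is close to orthogonal to any given point.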
As neural network
The SDM may be regarded either as a content-addressable extension of a classical random-access memory (RAM) or as a special type of three-layer feedforward neural network. The main SDM alterations to the RAM are:[Grebeníček, František. Neural Nets as Associative Memories. Diss. Brno University of Technology, 2001. URL: http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf]
*The SDM calculates Hamming distances between the reference address and each location address. For each distance which is less than or equal to the given radius, the corresponding location is selected.
*The memory is represented by an n × m array of counters (where n is the number of locations and m is the input data length) instead of single-bit storage elements.
*Writing to the memory, instead of overwriting, is as follows:
**if the i-bit of the input data is 1, the corresponding counters (counters in the selected locations (rows) and in the i-th columns) are incremented,
**if the i-bit of the input data is 0, the corresponding counters are decremented.
*Reading (or recall) from the memory is similar:
**The contents of the selected locations are summed columnwise.
**Each sum is thresholded. If the sum is greater than or equal to the threshold value, the corresponding output bit is set to 1; in the opposite case it is cleared. Note that the thresholds may be zero if the training input vectors are close to orthogonal.
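The alterations above can be sketched as a toy SDM; the parameters (number of hard locations, word length, activation radius) are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

n_locations = 2000   # number of hard locations
n_bits = 256         # length of addresses and data words
radius = 115         # Hamming-distance activation radius

# Fixed random hard-location addresses and an n_locations x n_bits counter array.
hard_addresses = rng.integers(0, 2, (n_locations, n_bits), dtype=np.int8)
counters = np.zeros((n_locations, n_bits), dtype=np.int32)

def select(address):
    """Boolean mask of the locations within `radius` bits of the address."""
    distances = np.count_nonzero(hard_addresses != address, axis=1)
    return distances <= radius

def write(address, data):
    """Increment counters where the data bit is 1, decrement where it is 0."""
    counters[select(address)] += np.where(data == 1, 1, -1).astype(np.int32)

def read(address):
    """Sum the selected locations columnwise and threshold at zero."""
    sums = counters[select(address)].sum(axis=0)
    return (sums >= 0).astype(np.int8)

# Store one word autoassociatively (the word serves as its own address) ...
word = rng.integers(0, 2, n_bits, dtype=np.int8)
write(word, word)

# ... then recall it from a cue with 30 flipped bits.
cue = word.copy()
cue[rng.choice(n_bits, 30, replace=False)] ^= 1
recalled = read(cue)
print(np.count_nonzero(recalled != word))  # mismatched bits after recall
```

Because the corrupted cue still activates most of the locations the word was written to, the columnwise sums carry the sign of the stored bits and the original word is recovered; as more words are stored, the counters fill with noise from one another and recall degrades with the saturation of the memory.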
Neuron model
An idealized description of a neuron is as follows: a neuron has a cell body with two kinds of branches: ''dendrites'' and an ''axon''. It receives input signals from other neurons via dendrites, integrates (sums) them, and generates its own (electric) output signal which is sent to outside neurons via the axon. The points of electric contact between neurons are called ''synapses''.
When a neuron generates a signal it is ''firing'', and after firing it must ''recover'' before it can fire again.
The relative importance of a synapse to the firing of a neuron is called its ''synaptic weight'' (or ''input coefficient''). There are two kinds of synapses: ''excitatory'', which trigger the neuron to fire, and ''inhibitory'', which hinder firing. The neuron is either excitatory or inhibitory according to the kinds of synapses its axon makes.
A neuron fires when the sum of its inputs exceeds a specific ''threshold''. The higher the threshold, the more important it is that excitatory synapses have input while inhibitory ones do not. Whether a recovered neuron actually fires depends on whether it received sufficient excitatory input (beyond the threshold) and not too much inhibitory input within a certain period.
The formal model of a neuron makes further simplifying assumptions. An ''n''-input neuron is modeled by a ''linear threshold function'' as follows:
For i = 1, ..., n, where n is the number of inputs, let F(t) be the output at time ''t'': F(t) ∈ {0, 1}, and let x_i(t) be the ''i''-th input at time ''t'': x_i(t) ∈ {0, 1}. Let w_i be the weight of the ''i''-th input and let c be the threshold.
The ''weighted sum'' of the inputs at time ''t'' is defined by
S(t) = Σ_{i=1..n} w_i x_i(t)
The ''neuron output'' at time ''t'' is then defined as a boolean function:
F(t) = 1 if S(t) ≥ c, and F(t) = 0 otherwise
where F(t) = 1 means that the neuron fires at time ''t'' and F(t) = 0 that it doesn't, i.e. in order for the neuron to fire the weighted sum must reach or exceed the threshold c. Excitatory inputs increase the sum and inhibitory inputs decrease it.
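As a minimal sketch of the linear threshold function (the weights and threshold are made-up examples):

```python
def neuron_output(weights, inputs, threshold):
    """Linear threshold function: the neuron fires (output 1) iff the
    weighted sum of its inputs reaches or exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# Two excitatory inputs (positive weights) and one inhibitory (negative weight):
w = [2, 1, -2]
print(neuron_output(w, [1, 1, 0], threshold=3))  # fires: 2 + 1 = 3 >= 3
print(neuron_output(w, [1, 1, 1], threshold=3))  # inhibition blocks firing: 1 < 3
```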
Neuron as address-decoder
Kanerva's key thesis is that certain neurons could have their input coefficients and thresholds fixed over the entire life of an organism and used as address decoders, where the ''n''-tuple of input coefficients (the pattern to which the neuron responds most readily) determines the ''n''-bit memory address, and the threshold controls the size of the region of similar address patterns to which the neuron responds.
This mechanism is complementary to adjustable synapses or adjustable weights in a neural network (perceptron convergence learning), as this fixed accessing mechanism would be a permanent ''frame of reference'' which allows one to ''select'' the synapses in which the information is stored and from which it is retrieved under a given set of circumstances. Furthermore, an encoding of the present circumstance would serve as an address.
The address ''a'' of a neuron with input coefficients w_1, ..., w_n is defined as the ''n''-bit input pattern that maximizes the weighted sum. The maximum occurs when the inhibitory inputs are zeros and the excitatory inputs are ones.
The ''i''-th bit of the address is:
a_i = 1 if w_i > 0, and a_i = 0 if w_i < 0
(assuming the weights are non-zero)
The ''maximum weighted sum'' is then the sum of all positive coefficients:
S = Σ w_i over all i with w_i > 0
And the ''minimum weighted sum'' would correspond to a point opposite the neuron address, a':
s = Σ w_i over all i with w_i < 0
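The address decoding just described can be sketched as follows (the weight vector is a made-up example):

```python
from itertools import product

def neuron_address(weights):
    """The input pattern that maximizes the weighted sum: 1 where the
    weight is excitatory (> 0), 0 where it is inhibitory (< 0)."""
    return [1 if w > 0 else 0 for w in weights]

def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

w = [3, -1, 2, -2, 1]            # fixed, non-zero input coefficients
a = neuron_address(w)            # address: [1, 0, 1, 0, 1]
S = weighted_sum(w, a)           # maximum sum: 3 + 2 + 1 = 6
a_opp = [1 - bit for bit in a]   # the opposite point a'
s = weighted_sum(w, a_opp)       # minimum sum: -1 - 2 = -3

# Every input pattern yields a sum between the two extremes:
assert all(s <= weighted_sum(w, x) <= S for x in product([0, 1], repeat=len(w)))
```

Choosing the threshold c between these two extremes then determines how far an input pattern may stray from the address a while still making the neuron fire.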
When the threshold c is in range