Sparse distributed memory (SDM) is a mathematical model of human long-term memory introduced by Pentti Kanerva in 1988 while he was at NASA Ames Research Center. It is a generalized random-access memory (RAM) for long (e.g., 1,000-bit) binary words. These words serve as both addresses to and data for the memory. The main attribute of the memory is sensitivity to similarity: a word can be read back not only by giving the original write address but also by giving one close to it, as measured by the number of mismatched bits (i.e., the Hamming distance between memory addresses).

SDM implements a transformation from logical space to physical space using distributed data representation and storage, similarly to encoding processes in human memory. A value corresponding to a logical address is stored into many physical addresses. This way of storing is robust but not deterministic. A memory cell is not addressed directly; even if the input data (logical addresses) are partially damaged, we can still get correct output data.

The theory of the memory is mathematically complete and has been verified by computer simulation. It arose from the observation that the distances between points of a high-dimensional space resemble the proximity relations between concepts in human memory. The theory is also practical in that memories based on it can be implemented with conventional random-access memory elements.[Flynn, Michael J., Pentti Kanerva, and Neil Bhadkamkar. "Sparse distributed memory prototype: principles and operation." 1989.]


Definition

Human memory has a tendency to congregate memories based on similarities between them (although they may not be related), such as "firetrucks are red and apples are red". Sparse distributed memory is a mathematical representation of human memory that uses high-dimensional space to model the large amounts of memory that mimic those of the human neural network. An important property of such high-dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated (the sketch at the end of this section illustrates this numerically). SDM can be considered a realization of locality-sensitive hashing.

The underlying idea behind an SDM is the mapping of a huge binary memory onto a smaller set of physical locations, the so-called ''hard locations''. As a general guideline, those hard locations should be uniformly distributed in the virtual space, to mimic the existence of the larger virtual space as accurately as possible. Every datum is stored distributed over a set of hard locations, and retrieved by averaging those locations. Therefore, recall may not be perfect, with accuracy depending on the saturation of the memory.

Kanerva's proposal is based on four basic ideas:
*1. The Boolean space {0,1}^n, of 2^n points in n dimensions (with 10^0 < n < 10^5), exhibits properties which are similar to humans' intuitive notions of relationships between concepts. This means that it makes sense to store data as points of the mentioned space, where each memory item is stored as an n-bit vector.
*2. Neurons with n inputs can be used as address decoders of a random-access memory.
*3. Unifying principle: data stored into the memory can be used as addresses to the same memory. Distance between two points is a measure of similarity between two memory items: the closer the points, the more similar the stored vectors.
*4. Time can be traced in the memory as a function of where the data are stored, if the data are organized as sequences of events.
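The uncorrelatedness claim can be checked numerically. The following sketch (Python with NumPy; the sample sizes are illustrative and not from Kanerva's text) draws random 1000-bit vectors and shows that their Hamming distances cluster tightly around n/2 = 500:

import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # dimensionality of the binary space
x = rng.integers(0, 2, size=n)             # one random point
ys = rng.integers(0, 2, size=(10_000, n))  # many other random points

distances = (ys != x).sum(axis=1)          # Hamming distances to x
print(distances.mean())                    # ~ n/2 = 500, the indifference distance
print(distances.std())                     # ~ sqrt(n)/2 ~ 15.8: tightly clustered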


The binary space N

The SDM works with n-dimensional vectors with binary components. Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events. This section is mostly about the properties of the vector space N = {0,1}^n. Let n be the number of dimensions of the space. The number of points, or possible memory items, is then 2^n. We will denote this number by N and will use N and 2^n to stand also for the space itself.[Grebeníček, František. "Sparse Distributed Memory - Pattern Data Analysis." URL: http://www.fit.vutbr.cz/~grebenic/Publikace/mosis2000.pdf]

Concepts related to the space N = {0,1}^n:
*''Origin'', 0: The point with all coordinates 0 is called the origin, 0 = 000...00.
*''Complement'', 'x: The complement, or opposite, of point x is the n-tuple that has ones where x has zeros and vice versa.
*''Norm'', |x|: The norm of point x is the number of ones in its binary representation.
*''Difference'', x − y: The difference of two points x and y is the n-tuple that has ones where x and y differ and zeros elsewhere. It is the bitwise exclusive or: x − y = x ⊕ y. The difference commutes: x − y = y − x.
*''Distance'', d(x, y): The distance between two points x and y is the number of dimensions at which x and y differ. It is called the Hamming distance (its square root is the Euclidean distance) and is expressed in bits. Distance is the norm of the difference: d(x, y) = |x − y|.
*''Betweenness'', x:y:z: Point y is between points x and z if and only if the distance from x to z is the sum of the distances from x to y and from y to z; that is, x:y:z ⇔ d(x, z) = d(x, y) + d(y, z). It is easily seen that every bit of a point in between is a copy of the corresponding bit of an endpoint.
*''Orthogonality'', x ⊥ y: Point x is orthogonal to point y, or the two are perpendicular or indifferent, if and only if the distance between the two is half the number of dimensions: x ⊥ y ⇔ d(x, y) = n/2. The distance n/2 is called the indifference distance of space N. If x is orthogonal to y, it is also orthogonal to its complement 'y (x is halfway between y and 'y).
*''Circle'', O(r,x): A circle with radius r and center x is the set of points that are at most r bits from x: O(r,x) = {y | d(x, y) ≤ r}.

Properties of the space N = {0,1}^n:

The space N can be represented by the vertices of the unit cube in n-dimensional Euclidean space. The vertices lie on the surface of an n-dimensional sphere with (Euclidean-metric) radius \sqrt{n}/2. This gives rise to the sphere analogy. We will call a space spherical if
*1. any point x has a unique opposite 'x,
*2. the entire space is between any point x and its opposite 'x, and
*3. all points are "equal" (meaning that for any two points x and y there is a distance-preserving automorphism of the space that maps x to y, so that from any of its points the space "looks" the same).

The surface of a sphere (in Euclidean 3d-space) clearly is spherical. According to this definition, N is also spherical, since the map z ↦ y ⊕ x ⊕ z is a distance-preserving automorphism that maps x to y. Because N is spherical, it is helpful to think of it as the surface of a sphere with circumference 2n. All points of N are equally qualified as points of origin, and a point and its complement are like two poles at distance n from each other, with the entire space in between. The points halfway between the poles and perpendicular to them are like the equator.

;Distribution of the space N
The number of points that are exactly d bits from an arbitrary point x (say, from the point 0) is the number of ways to choose d coordinates from a total of n coordinates, and is therefore given by the binomial coefficient:

\binom{n}{d} = \frac{n!}{d!\,(n-d)!}

The distribution of N thus is the binomial distribution with parameters n and p, where p = 1/2. The mean of the binomial distribution is n/2, and the variance is n/4. This distribution function will be denoted by N(d). The normal distribution F with mean n/2 and standard deviation \sqrt{n}/2 is a good approximation to it: N(d) = Pr{d(x, y) ≤ d} ≅ F((d − n/2)/(\sqrt{n}/2)).

;Tendency to orthogonality
An outstanding property of N is that most of it lies at approximately the mean (indifference) distance n/2 from a point (and its complement). In other words, most of the space is nearly orthogonal to any given point, and the larger n is, the more pronounced is this effect.
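As a worked example of this concentration (a sketch based on the binomial formula above; the chosen radii are illustrative), the fraction of the space within r bits of a given point is the cumulative binomial sum, which can be computed exactly:

from math import comb

n = 1000

def fraction_within(r):
    # Number of points at distance <= r from a fixed point, divided by 2^n.
    return sum(comb(n, d) for d in range(r + 1)) / 2**n

print(fraction_within(450))  # ~ 0.0009: almost no points are 50+ bits closer than n/2
print(fraction_within(500))  # ~ 0.51: about half the space lies within the indifference distance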


As neural network

The SDM may be regarded either as a content-addressable extension of a classical random-access memory (RAM) or as a special type of three-layer feedforward neural network. The main SDM alterations to the RAM are the following (a code sketch follows the list):[Grebeníček, František. Neural Nets as Associative Memories. Diss. Brno University of Technology, 2001. URL: http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf]
*The SDM calculates Hamming distances between the reference address and each location address. For each distance which is less than or equal to the given radius, the corresponding location is selected.
*The memory is represented by n × m counters (where n is the number of locations and m is the input data length) instead of single-bit storage elements.
*Writing to the memory, instead of overwriting, is as follows:
**if the i-th bit of the input data is 1, the corresponding counters (counters in the selected locations (rows) and in the i-th columns) are incremented,
**if the i-th bit of the input data is 0, the corresponding counters are decremented.
*Reading (or recall) from the memory is similar:
**The contents of the selected locations are summed columnwise.
**Each sum is thresholded. If the sum is greater than or equal to the threshold value, the corresponding output bit is set to 1; in the opposite case it is cleared. Note that the thresholds may be zero if the training input vectors are close to orthogonal.
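The following minimal sketch (Python with NumPy) implements exactly these alterations: Hamming-radius selection, a counter matrix in place of single-bit cells, increment/decrement writes, and summed, thresholded reads. Class and parameter names are illustrative, not taken from any reference implementation:

import numpy as np

class SDM:
    def __init__(self, n_locations=2000, n=256, m=256, radius=112, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly chosen hard addresses (one n-bit address per location).
        self.addresses = rng.integers(0, 2, size=(n_locations, n))
        # One integer counter per (location, data-bit) pair.
        self.counters = np.zeros((n_locations, m), dtype=np.int32)
        self.radius = radius

    def _select(self, address):
        # Activate every hard location within the Hamming radius of the address.
        return (self.addresses != address).sum(axis=1) <= self.radius

    def write(self, address, data):
        sel = self._select(address)
        # +1 where the data bit is 1, -1 where it is 0, in all selected rows.
        self.counters[sel] += np.where(data == 1, 1, -1).astype(np.int32)

    def read(self, address):
        sel = self._select(address)
        sums = self.counters[sel].sum(axis=0)   # column-wise sums
        return (sums >= 0).astype(np.int8)      # threshold at zero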


Neuron model

An idealized description of a neuron is as follows: a neuron has a cell body with two kinds of branches: ''dendrites'' and an ''axon''. It receives input signals from other neurons via the dendrites, integrates (sums) them, and generates its own (electric) output signal, which is sent to outside neurons via the axon. The points of electric contact between neurons are called ''synapses''. When a neuron generates a signal it is ''firing'', and after firing it must ''recover'' before it fires again. The relative importance of a synapse to the firing of a neuron is called its ''synaptic weight'' (or ''input coefficient''). There are two kinds of synapses: excitatory, which trigger the neuron to fire, and inhibitory, which hinder firing. The neuron is either excitatory or inhibitory according to the kinds of synapses its axon makes.

A neuron fires when the sum of its inputs exceeds a specific ''threshold''. The higher the threshold, the more important it is that excitatory synapses have input while inhibitory ones do not. Whether a recovered neuron actually fires depends on whether it received sufficient excitatory input (beyond the threshold) and not too much inhibitory input within a certain period.

The formal model of the neuron makes further simplifying assumptions. An ''n''-input neuron is modeled by a ''linear threshold function'' F: {0,1}^n → {0,1} as follows. For i = 0,...,n−1, where n is the number of inputs, let F_t be the output at time ''t'', F_t ∈ {0,1}, and let x_{i,t} be the ''i''-th input at time ''t'', x_{i,t} ∈ {0,1}. Let w_i be the weight of the ''i''-th input and let c be the threshold. The ''weighted sum'' of the inputs at time ''t'' is defined by

S_t = \sum_{i=0}^{n-1} w_i x_{i,t}

The ''neuron output'' at time ''t'' is then defined as a boolean function:

F_t = \begin{cases} 1 & \text{if } S_t \geq c, \\ 0 & \text{otherwise,} \end{cases}

where F_t = 1 means that the neuron fires at time ''t'' and F_t = 0 that it does not; i.e., for the neuron to fire, the weighted sum must reach or exceed the threshold. Excitatory inputs increase the sum and inhibitory inputs decrease it.
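As a direct translation of the formal model into code (a sketch; the example weights and threshold are illustrative):

def neuron_output(x, w, c):
    # x: 0/1 inputs, w: synaptic weights (input coefficients), c: threshold.
    s = sum(wi * xi for wi, xi in zip(w, x))  # the weighted sum S_t
    return 1 if s >= c else 0                 # fire iff S_t >= c

print(neuron_output([1, 0, 1], [0.5, -1.0, 0.7], 1.0))  # 1.2 >= 1.0, so it fires: 1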


Neuron as address-decoder

Kanerva's key thesis is that certain neurons could have their input coefficients and thresholds fixed over the entire life of an organism and be used as address decoders, where the ''n''-tuple of input coefficients (the pattern to which the neuron responds most readily) determines the ''n''-bit memory address, and the threshold controls the size of the region of similar address patterns to which the neuron responds.

This mechanism is complementary to adjustable synapses or adjustable weights in a neural network (perceptron convergence learning), as this fixed accessing mechanism would be a permanent ''frame of reference'' which allows one to ''select'' the synapses in which the information is stored and from which it is retrieved under a given set of circumstances. Furthermore, an encoding of the present circumstance would serve as an address.

The address ''a'' of a neuron with input coefficients w_0,...,w_{n-1} is defined as the ''n''-bit input pattern that maximizes the weighted sum. The maximum occurs when the inhibitory inputs are zeros and the excitatory inputs are ones. The ''i''-th bit of the address is:

a_i = \begin{cases} 1 & \text{if } w_i > 0, \\ 0 & \text{if } w_i < 0 \end{cases}

(assuming weights are non-zero). The ''maximum weighted sum'' S(w) is then the sum of all positive coefficients:

S(w) = \sum_{w_i > 0} w_i

and the ''minimum weighted sum'' s(w) corresponds to the point opposite the neuron's address, 'a:

s(w) = \sum_{w_i < 0} w_i

When the threshold c is in the range s(w) < c ≤ S(w), the output of the neuron is 0 for some addresses (input patterns) and 1 for others. If the threshold is above S(w) the output is always 0; if it is below s(w) the output is always 1. So by a proper choice of the threshold a neuron responds to only one address: when the threshold is S(w) (the maximum of the weighted sum) the neuron responds only to its own address and acts like an address decoder of a conventional random-access memory.
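This decoding rule is short enough to state as code (a sketch assuming non-zero weights, as above; the example weight vector is illustrative):

def neuron_address(w):
    # a_i = 1 for excitatory (positive) weights, 0 for inhibitory ones.
    return [1 if wi > 0 else 0 for wi in w]

def max_weighted_sum(w):
    # S(w): the sum of the positive coefficients.
    return sum(wi for wi in w if wi > 0)

w = [0.3, -0.2, 0.9, -0.5]
print(neuron_address(w))    # [1, 0, 1, 0]: the only input pattern achieving S(w)
print(max_weighted_sum(w))  # 1.2: with threshold c = S(w) the neuron decodes only its own address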


Memory location

SDM is designed to cope with address patterns that span an enormous address space (on the order of 2^1000). SDM assumes that the address patterns actually describing physical situations of interest are ''sparsely'' scattered throughout the input space. It is impossible to reserve a separate physical location for each possible input; SDM implements only a limited number of physical or ''hard'' locations.

Every hard location has two items associated with it:
* a fixed hard address, which is the N-bit address of the location;
* a contents portion that is M bits wide and that can accumulate multiple M-bit data patterns written into the location. The contents portion is not fixed; it is modified by data patterns written into the memory.

In SDM a word could be stored in memory by writing it in a free storage location and at the same time providing the location with the appropriate address decoder. A neuron as an address decoder would select a location based on the similarity of the location's address to the retrieval cue. Unlike conventional Turing machines, SDM takes advantage of ''parallel computing by the address decoders'': merely ''accessing the memory'' is regarded as computing, the amount of which increases with memory size.


Address pattern

An N-bit vector used in writing to and reading from the memory. The address pattern is a coded description of an environmental state. (e.g. N = 256.)


Data pattern

An M-bit vector that is the object of the writing and reading operations. Like the address pattern, it is a coded description of an environmental state. (e.g. M = 256.)


Writing

Writing is the operation of storing a data pattern into the memory using a particular address pattern. During a write, the input to the memory consists of an address pattern and a data pattern. The address pattern is used to select ''hard'' memory locations whose hard addresses are within a certain cutoff distance from the address pattern. The data pattern is stored into each of the selected locations.


Reading

Reading is the operation of retrieving a data pattern from the memory using a particular address pattern. During a read, an address pattern is used to select a certain number of ''hard'' memory locations (just like during a write). The contents of the selected locations are bitwise summed and thresholded to derive an M-bit data pattern. This serves as the output read from the memory.
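Assuming the SDM sketch from the neural-network section above is in scope, the following usage example shows a write followed by a read from a corrupted address; with a lightly loaded memory the stored pattern is typically recovered exactly:

import numpy as np

rng = np.random.default_rng(1)
mem = SDM(n_locations=2000, n=256, m=256, radius=112, seed=0)

pattern = rng.integers(0, 2, size=256)
mem.write(pattern, pattern)              # autoassociative write: address == data

noisy = pattern.copy()
flips = rng.choice(256, size=30, replace=False)
noisy[flips] ^= 1                        # corrupt 30 of the 256 address bits
recalled = mem.read(noisy)
print((recalled != pattern).sum())       # typically 0: the word is read back intact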


Pointer chains

All of the items are linked in a single list (or array) of pointers to memory locations, and are stored in RAM. Each address in the array points to an individual line in the memory. That line is then returned if it is similar to other lines. Neurons are utilized as address decoders and encoders, similar to the way neurons work in the brain, and return items from the array that match or are similar to the cue.


Critical distance

Kanerva's model of memory has a concept of a ''critical point'': prior to this point, a previously stored item can be easily retrieved, but beyond this point it cannot. Kanerva has methodically calculated this point for a particular set of (fixed) parameters. The corresponding ''critical distance'' of a sparse distributed memory can be approximately evaluated by minimizing the following equation with the restrictions d \in \mathbb{N} and d \leqslant n:

\tilde{f}(d) = \left\{ n \cdot N\left( \frac{-w \cdot \mathrm{shared}(d)}{\sqrt{w \cdot \mathrm{shared}(d) + \theta}} \right) - d \right\}^2

Where:
* d: is the distance to the target;
* n: is the number of dimensions;
* N: is the normalized normal distribution with mean zero and variance one;
* w: is the number of times the target bitstring was written in memory;
* \theta: is the total of random bitstrings in all h hard locations activated by a read operation; i.e., the size of a cell assembly;
* shared(d): is the mean number of shared hard locations activated by two bitstrings d bits away from each other. One can find some values for a 1000-dimensional SDM in Kanerva's book, Table 7.1, p. 63, or the equations for calculating them for any SDM in Appendix B, p. 125 of the same book.


Probabilistic interpretation

An associative memory system using sparse, distributed representations can be reinterpreted as an importance sampler, a Monte Carlo method of approximating Bayesian inference. The SDM can be considered a Monte Carlo approximation to a multidimensional conditional probability integral. The SDM will produce acceptable responses from a training set when this approximation is valid, that is, when the training set contains sufficient data to provide good estimates of the underlying joint probabilities and there are enough Monte Carlo samples to obtain an accurate estimate of the integral.


Biological plausibility

Sparse coding may be a general strategy of neural systems to augment memory capacity. To adapt to their environments, animals must learn which stimuli are associated with rewards or punishments and distinguish these reinforced stimuli from similar but irrelevant ones. Such a task requires implementing stimulus-specific associative memories in which only a few neurons out of a population respond to any given stimulus and each neuron responds to only a few stimuli out of all possible stimuli.

Theoretical work on SDM by Kanerva has suggested that sparse coding increases the capacity of associative memory by reducing overlap between representations. Experimentally, sparse representations of sensory information have been observed in many systems, including vision, audition, touch, and olfaction. However, despite the accumulating evidence for widespread sparse coding and theoretical arguments for its importance, a demonstration that sparse coding improves the stimulus-specificity of associative memory was lacking until recently.

Some progress was made in 2014 by Gero Miesenböck's lab at the University of Oxford analyzing the Drosophila olfactory system. In Drosophila, sparse odor coding by the Kenyon cells of the mushroom body is thought to generate a large number of precisely addressable locations for the storage of odor-specific memories. Lin et al. demonstrated that sparseness is controlled by a negative feedback circuit between Kenyon cells and the GABAergic anterior paired lateral (APL) neuron. Systematic activation and blockade of each leg of this feedback circuit showed that Kenyon cells activate APL and APL inhibits Kenyon cells. Disrupting the Kenyon cell-APL feedback loop decreases the sparseness of Kenyon cell odor responses, increases inter-odor correlations, and prevents flies from learning to discriminate similar, but not dissimilar, odors. These results suggest that feedback inhibition suppresses Kenyon cell activity to maintain sparse, decorrelated odor coding and thus the odor-specificity of memories. A 2017 publication in Science showed that the fly olfactory circuit implements an improved version of binary locality-sensitive hashing via sparse, random projections.


Applications

In applications of the memory, the words are patterns of features. Some features are produced by a sensory system, others control a motor system. There is a ''current pattern'' (of e.g. 1000 bits), which is the current contents of the system's ''focus''. The sensors feed into the focus, the motors are driven from the focus, and the memory is accessed through the focus. What goes on in the world, the system's "subjective" experience, is represented internally by a sequence of patterns in the focus. The memory stores this sequence and can recreate it later in the focus if addressed with a pattern similar to one encountered in the past. Thus, the memory learns to ''predict'' what is about to happen.

Wide applications of the memory would be in systems that deal with real-world information in real time. The applications include vision (detecting and identifying objects in a scene and anticipating subsequent scenes), robotics, signal detection and verification, and adaptive learning and control. On the theoretical side, the working of the memory may help us understand memory and learning in humans and animals.


The Best Match Search

SDM can be applied to the problem of finding the ''best match'' to a test word in a dataset of stored words, in other words, the nearest neighbor search problem.

Consider a memory with N locations where N = 2^n. Let each location have the capacity for one ''n''-bit word (e.g. N = 2^100 100-bit words), and let the address decoding be done by N address-decoder neurons. Set the threshold of each neuron ''x'' to its maximum weighted sum |x| and use a common parameter ''d'' to adjust all thresholds when accessing the memory. The effective threshold of neuron ''x'' will then be |x| − d, which means that the location ''x'' is accessible every time the address ''x'' is within ''d'' bits of the address presented to the memory (i.e. the address held by the address register). With d = 0 we have a conventional random-access memory. Assume further that each location has a special ''location-occupied'' bit that can be accessed in the same way as the regular datum bits. Writing a word to a location sets this ''location-occupied'' bit. Assume that only occupied locations can be read.

To file the data in memory, start by setting d = n and issue a command to clear the ''location-occupied'' bit. This single operation marks all memory as unoccupied regardless of the value of the address register. Then set d = 0 and write each word ''y'' of the dataset with ''y'' itself as the address. Notice that each write operation affects only one location: the location ''y''. Filing time is thus proportional to the number of words in the dataset.

Finding the best match for a test word ''z'' involves placing ''z'' in the address register and finding the least distance ''d'' for which there is an occupied location. We can start the search by setting d = 0 and incrementing ''d'' successively until an occupied location is found. This method gives average search times that are proportional to the number of address bits, or slightly less than n/2, because the nearest occupied location can be expected to be just under n/2 bits from ''z'' (with binary search on ''d'' this would be O(log n)).

With 100-bit words 2^100 locations would be needed, i.e. an enormously large memory. However, ''if we construct the memory as we store the words of the dataset'', we need only one location (and one address decoder) for each word of the dataset. None of the unoccupied locations need to be present. This represents the aspect of ''sparseness'' in SDM.
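The search loop can be sketched as follows (a serial simulation of what the parallel address decoders do in a single access; the stored words are illustrative):

def best_match(z, stored):
    # Grow the access radius d until some occupied location falls inside it.
    def hamming(a, b):
        return sum(ca != cb for ca, cb in zip(a, b))
    for d in range(len(z) + 1):               # d = 0, 1, 2, ... as in the text
        hits = [y for y in stored if hamming(z, y) <= d]
        if hits:
            return min(hits, key=lambda y: hamming(z, y))
    raise ValueError("memory is empty")

words = {"10110100", "01100011", "11111000"}
print(best_match("10110000", words))          # '10110100', at Hamming distance 1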


Speech recognition

SDM can be applied in transcribing speech, with the training consisting of "listening" to a large corpus of spoken language. Two hard problems with natural speech are how to detect word boundaries and how to adjust to different speakers. The memory should be able to handle both. First, it stores sequences of patterns as pointer chains. In training, in listening to speech, it will build a probabilistic structure with the highest incidence of branching at word boundaries. In transcribing speech, these branching points are detected and tend to break the stream into segments that correspond to words. Second, the memory's sensitivity to similarity is its mechanism for adjusting to different speakers and to the variations in the voice of the same speaker.


"Realizing forgetting"

At the University of Memphis, Uma Ramamurthy, Sidney K. D'Mello, and Stan Franklin created a modified version of the sparse distributed memory system that represents "realizing forgetting." It uses a decay equation to better show interference in data. The sparse distributed memory system distributes each pattern into approximately one hundredth of the locations, so interference can have detrimental results. Two possible examples of decay from this modified sparse distributed memory are presented below.

Exponential decay mechanism: f(x) = 1 + e^{-ax}

Negated-translated sigmoid decay mechanism: f(x) = 1 - \left[\frac{1}{1 + e^{-a(x-c)}}\right]

In the exponential decay function, the exponential term approaches zero more quickly as ''x'' increases; ''a'' is a constant (usually between 3 and 9) and ''c'' is a counter. For the negated-translated sigmoid function, the decay is similar to the exponential decay function when ''a'' is greater than 4. As the graph approaches 0, it represents how the memory is being forgotten using decay mechanisms.
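Both mechanisms are easy to state as runnable functions (a minimal sketch; the default parameter values are illustrative, with ''a'' in the 3 to 9 range mentioned above):

import math

def exponential_decay(x, a=5.0):
    # f(x) = 1 + e^(-a*x): the exponential term vanishes as x grows.
    return 1.0 + math.exp(-a * x)

def negated_translated_sigmoid(x, a=5.0, c=1.0):
    # f(x) = 1 - 1/(1 + e^(-a*(x - c))): slides from ~1 toward 0 as x passes c.
    return 1.0 - 1.0 / (1.0 + math.exp(-a * (x - c)))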


Genetic sparse distributed memory

Ashraf Anwar, Stan Franklin, and Dipankar Dasgupta at the University of Memphis proposed a model for SDM initialization using genetic algorithms and genetic programming (1999). Genetic memory uses a genetic algorithm and sparse distributed memory as a pseudo artificial neural network. It has been considered for use in creating artificial life.


Statistical prediction

SDM has been applied to statistical prediction, the task of associating extremely large perceptual state vectors with future events. In conditions of near or over capacity, where the associative memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor, and each data counter in an SDM can be viewed as an independent estimate of the conditional probability of a binary function f being equal to the activation set defined by the counter's memory location.


Artificial general intelligence

* LIDA uses sparse distributed memory to help model cognition in biological systems. The sparse distributed memory provides the space for recalling or recognizing an object in relation to other objects. It was developed by Stan Franklin, the creator of the "realizing forgetting" modified sparse distributed memory system. Transient episodic and declarative memories have distributed representations in LIDA (based on a modified version of SDM); there is evidence that this is also the case in the nervous system.
* CMatie is a 'conscious' software agent developed to manage seminar announcements in the Mathematical Sciences Department at the University of Memphis. It is based on SDM augmented with the use of genetic algorithms as an associative memory.
* Hierarchical temporal memory utilizes SDM for storing sparse distributed representations of the data.

(Also see Cognitive architecture & Artificial General Intelligence for a list of SDM-related projects.)


Reinforcement learning

SDMs provide a linear, local function approximation scheme, designed to work when a very large/high-dimensional input (address) space has to be mapped into a much smaller physical memory. In general, local architectures, SDMs included, can be subject to the curse of dimensionality, as some target functions may require, in the worst case, an exponential number of local units to be approximated accurately across the entire input space. However, it is widely believed that most decision-making systems need high accuracy only around low-dimensional manifolds of the state space, or important state "highways". The work in Ratitch et al. combined the SDM memory model with the ideas from memory-based learning, which provides an approximator that can dynamically adapt its structure and resolution in order to locate regions of the state space that are "more interesting" and allocate proportionally more memory resources to model them accurately.


Object indexing in computer vision

Dana H. Ballard's lab demonstrated a general-purpose object indexing technique for computer vision that combines the virtues of principal component analysis with the favorable matching properties of high-dimensional spaces to achieve high-precision recognition. The indexing algorithm uses an active vision system in conjunction with a modified form of SDM and provides a platform for learning the association between an object's appearance and its identity.


Extensions

Many extensions and improvements to SDM have been proposed, e.g.:
* Ternary memory space: This enables the memory to be used as a Transient Episodic Memory (TEM) in cognitive software agents. TEM is a memory with high specificity and low retention, used for events having features of a particular time and place.
* Integer SDM that uses modular arithmetic integer vectors rather than binary vectors. This extension improves the representation capabilities of the memory and is more robust over normalization. It can also be extended to support forgetting and reliable sequence storage.[Snaider, Javier, and Stan Franklin. "Integer sparse distributed memory." Twenty-fifth International FLAIRS Conference. 2012.]
* Using word vectors of larger size than address vectors: This extension preserves many of the desirable properties of the original SDM: auto-associability, content addressability, distributed storage and robustness over noisy inputs. In addition, it adds new functionality, enabling an efficient auto-associative storage of sequences of vectors, as well as of other data structures such as trees.
* Constructing SDM from spiking neurons: Despite the biological likeness of SDM, most of the work undertaken to demonstrate its capabilities to date has used highly artificial neuron models which abstract away the actual behaviour of neurons in the brain. Recent work by Steve Furber's lab at the University of Manchester proposed adaptations to SDM, e.g. by incorporating N-of-M rank codes into how populations of neurons may encode information, which may make it possible to build an SDM variant from biologically plausible components. This work has been incorporated into SpiNNaker (Spiking Neural Network Architecture), which is being used as the Neuromorphic Computing Platform for the Human Brain Project.
* Non-random distribution of locations: Although the storage locations are initially distributed randomly in the binary N address space, the final distribution of locations depends upon the input patterns presented, and may be non-random, thus allowing better flexibility and generalization. The data pattern is first stored at locations which lie closest to the input address. The signal (i.e. data pattern) then spreads throughout the memory, and a small percentage of the signal strength (e.g. 5%) is lost at each subsequent location encountered. Distributing the signal in this way removes the need for a select read/write radius, one of the problematic features of the original SDM. All locations selected in a write operation no longer receive a copy of the original binary pattern with equal strength. Instead they receive a copy of the pattern weighted with a real value from 1.0 to 0.05, to store in real-valued counters (rather than the integer counters in Kanerva's SDM). This rewards the nearest locations with a greater signal strength, and uses the natural architecture of the SDM to attenuate the signal strength. Similarly in reading from the memory, output from the nearest locations is given a greater weight than output from more distant locations. The new signal method allows the total signal strength received by a location to be used as a measure of the fitness of a location and is flexible to varying input (as the loss factor does not have to be changed for input patterns of different lengths). A sketch of this weighted write appears after this list.
* SDMSCue (Sparse Distributed Memory for Small Cues): Ashraf Anwar and Stan Franklin at the University of Memphis introduced a variant of SDM capable of handling small cues, namely SDMSCue, in 2002. The key idea is to use multiple reads/writes and space projections to reach a successively longer cue.
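The weighted write from the non-random-distribution bullet above can be sketched as follows (illustrative; it assumes real-valued counters and takes the 5% per-location loss given as an example):

import numpy as np

def attenuated_write(addresses, counters, address, data, loss=0.05, floor=0.05):
    # addresses: (k, n) 0/1 hard addresses; counters: (k, m) float array.
    dists = (addresses != address).sum(axis=1)
    order = np.argsort(dists)                 # visit the nearest hard locations first
    signed = np.where(data == 1, 1.0, -1.0)
    strength = 1.0
    for i in order:
        if strength < floor:                  # the signal has faded out (~0.05)
            break
        counters[i] += strength * signed      # weighted, real-valued update
        strength *= 1.0 - loss                # lose e.g. 5% at each location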


Related patents

* Method and apparatus for a sparse distributed memory system, US 5113507 A, Universities Space Research Association, 1992
* Method and device for storing and recalling information implementing a Kanerva memory system, US 5829009 A, Texas Instruments, 1998
* Digital memory, Furber, Stephen. US 7512572 B2, 2009


Implementation

* C Binary Vector Symbols (CBVS): includes an SDM implementation in C as a part of the vector symbolic architecture developed by EISLAB at Luleå University of Technology: http://pendicular.net/cbvs.php
* CommonSense ToolKit (CSTK) for realtime sensor data processing, developed at Lancaster University, includes an implementation of SDM in C++: http://cstk.sourceforge.net/
* Julia implementation by Brian Hayes: https://github.com/bit-player/sdm-julia
* Learning Intelligent Distribution Agent (LIDA), developed by Stan Franklin's lab at the University of Memphis, includes an implementation of SDM in Java: http://ccrg.cs.memphis.edu/framework.html
* Python implementation: https://github.com/msbrogli/sdm
* Python and OpenCL implementation: https://github.com/msbrogli/sdm-framework
* APL implementation
* LISP implementation for the Connection Machine
* FPGA implementation
* The original hardware implementation developed by NASA
* An implementation in C done at the Research Institute for Advanced Computer Science at NASA Ames


Related models

* Approximate nearest neighbor search
* Associative neural memories
* Autoassociative memory
* Binary spatter codes
* Associative-memory models of the cerebellum
* Content-addressable memory
* Correlation-matrix memories
* Deep learning § Memory networks
* Dynamic memory networks[Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, Richard Socher. "Ask Me Anything: Dynamic Memory Networks for Natural Language Processing." arXiv preprint arXiv:1506.07285 (2015).]
* Feedforward neural network
* Hierarchical temporal memory
* Holographic associative memory
* Holographic reduced representation[Kanerva, Pentti. "Computing with 10,000-bit words." Proc. 52nd Annual Allerton Conference on Communication, Control, and Computing. 2014.]
* Low-density parity-check code
* Locality-sensitive hashing
* Memory networks
* Memory-prediction framework
* Pointer networks
* Random-access memory (as a special case of SDM)
* Random indexing
* Recursive auto-associative memory (RAAM)
* Self-organizing map
* Semantic folding
* Semantic hashing
* Semantic memory
* Semantic network
* Semantic pointer architecture
* Sequence memory
* Sparse coding
* Sparse distributed representations
* Neural Turing machine
* Stacked autoencoders
* Vector symbolic architecture
* Vector space model
* Virtual memory

