Semantic folding theory describes a procedure for encoding the

semantics Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and compu ...

natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languag ...

text in a semantically grounded

binary representation A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" (one). The base-2 numeral system is a positional notation ...

. This approach provides a framework for modelling how language data is processed by the

neocortex The neocortex, also called the neopallium, isocortex, or the six-layered cortex, is a set of layers of the mammalian cerebral cortex involved in higher-order brain functions such as sensory perception, cognition, generation of motor commands, ...

Theory

Semantic folding theory draws inspiration from Douglas R. Hofstadter's ''Analogy as the Core of Cognition'' which suggests that the brain makes sense of the world by identifying and applying

analogies Analogy (from Greek ''analogia'', "proportion", from ''ana-'' "upon, according to" lso "against", "anew"+ ''logos'' "ratio" lso "word, speech, reckoning" is a cognitive process of transferring information or meaning from a particular subject (th ...

. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a

similarity measure In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such mea ...

and offers, as a solution, the

sparse Sparse is a computer software tool designed to find possible coding faults in the Linux kernel. Unlike other such tools, this static analysis tool was initially designed to only flag constructs that were likely to be of interest to kernel de ...

binary vector employing a two-dimensional topographic

semantic space Semantic spacesalso referred to as distributed semantic spaces or distributed semantic memory in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semanti ...

as a distributional reference frame. The theory builds on the computational theory of the human cortex known as

hierarchical temporal memory Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book '' On Intelligence'' by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today ...

(HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level.

Two-dimensional semantic space

Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectorsA context-vector is defined as a vector containing all the words in a particular context. in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This

vector space model Vector space model or term vector model is an algebraic model for representing text documents (and any objects, in general) as vectors of identifiers (such as index terms). It is used in information filtering, information retrieval, indexing and ...

is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vectorA word-vector or word-SDR is referred to as a Semantic Fingerprint in Semantic Folding theory. can be obtained for any given word by employing the following

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

: For each position ''X'' in the semantic map (where X represents

cartesian coordinates A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured i ...

) if the word ''Y'' is contained in the context-vector at position ''X'' then add 1 to the corresponding position in the word-vector for ''Y'' else add 0 to the corresponding position in the word-vector for ''Y'' The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format chütze, 1993& ahlgreen, 2006 Some properties of word-SDRs that are of particular interest with respect to

computational semantics Computational semantics is the study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. It consequently plays an important role in natural-language processing and computati ...

are: * high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". *

boolean Any kind of logic, function, expression, or theory based on the work of George Boole is considered Boolean. Related to this, "Boolean" may refer to: * Boolean data type, a form of data with only two possible values (usually "true" and "false" ...

logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . * sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. * topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of

semantic similarity Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools ...

, from a simple overlap of vector elements, to a range of distance measures such as:

Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefore o ...

Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chang ...

Jaccard distance The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the Similarity measure, similarity and diversity index, diversity of Sample (statistics), sample sets. It was developed by Grove Karl Gilbert i ...

cosine similarity In data analysis, cosine similarity is a measure of similarity between two sequences of numbers. For defining it, the sequences are viewed as vectors in an inner product space, and the cosine similarity is defined as the cosine of the angle betw ...

Levenshtein distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-charact ...

, Sørensen-Dice index, etc.

Semantic spaces

Semantic spacesalso referred to as distributed semantic spaces or distributed semantic memory in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and

ambiguity Ambiguity is the type of meaning in which a phrase, statement or resolution is not explicitly defined, making several interpretations plausible. A common aspect of ambiguity is uncertainty. It is thus an attribute of any idea or statement w ...

of natural language (the fact that the same term can have several meanings). The application of semantic spaces in

natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to proc ...

(NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the

keyword Keyword may refer to: Computing * Keyword (Internet search), a word or phrase typically used by bloggers or online content creator to rank a web page on a particular topic * Index term, a term used as a keyword to documents in an information syste ...

level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces:

latent semantic analysis Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the do ...

from

Microsoft Microsoft Corporation is an American multinational corporation, multinational technology company, technology corporation producing Software, computer software, consumer electronics, personal computers, and related services headquartered at th ...

and

Hyperspace Analogue to Language Semantic memory refers to general world knowledge that humans have accumulated throughout their lives. This general knowledge (word meanings, concepts, facts, and ideas) is intertwined in experience and dependent on culture. We can learn about ...

from the

University of California The University of California (UC) is a public land-grant research university system in the U.S. state of California. The system is composed of the campuses at Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Fran ...

. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the

accuracy Accuracy and precision are two measures of '' observational error''. ''Accuracy'' is how close a given set of measurements (observations or readings) are to their '' true value'', while ''precision'' is how close the measurements are to each ot ...

of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by

explicit semantic analysis In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectoral representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is ...

(ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000

dimension In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coor ...

s (where each dimension represents an Article in

Wikipedia Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system. Wikipedia is the largest and most-read ref ...

). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (

tensor In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Tensors may map between different objects such as vectors, scalars, and even other tens ...

s) led to a host of new recent developments:

Word2vec Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or ...

from

Google Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...

and

GloVe A glove is a garment covering the hand. Gloves usually have separate sheaths or openings for each finger and the thumb. If there is an opening but no (or a short) covering sheath for each finger they are called fingerless gloves. Fingerless glo ...

from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns.

Visualization

The topological distribution over a two-dimensional grid (outlined above) lends itself to a

bitmap In computing, a bitmap is a mapping from some domain (for example, a range of integers) to bits. It is also called a bit array or bitmap index. As a noun, the term "bitmap" is very often used to refer to a particular bitmapping application: th ...

type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a

pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the s ...

. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the

fMRI Functional magnetic resonance imaging or functional MRI (fMRI) measures brain activity by detecting changes associated with blood flow. This technique relies on the fact that cerebral blood flow and neuronal activation are coupled. When an area o ...

images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning.

voxels In 3D computer graphics, a voxel represents a value on a regular grid in three-dimensional space. As with pixels in a 2D bitmap, voxels themselves do not typically have their position (i.e. coordinates) explicitly encoded with their values. Ins ...

, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

Notes

References

{{reflist, 30em, refs= {{Cite book, url=https://mitpress.mit.edu/books/analogical-mind, title=The Analogical Mind, website=MIT Press, date=2 March 2001 , publisher=A Bradford Book , isbn=9780262072069 , access-date=2016-04-18 {{Cite journal, last=De Sousa Webber, first=Francisco, date=2015, title=Semantic Folding theory and its Application in Semantic Fingerprinting, arxiv=1511.08855, journal=Cornell University Library, bibcode=2015arXiv151108855D Computational linguistics Natural language processing Semantics Machine learning