artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech r ...

, a differentiable neural computer (DNC) is a memory augmented

neural network A neural network is a network or neural circuit, circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up ...

architecture (MANN), which is typically (but not by definition) recurrent in its implementation. The model was published in 2016 by

Alex Graves Alexander John Graves (born July 23, 1965) is an American film director, television director, television producer and screenwriter. Early life Alex Graves was born in Kansas City, Missouri. His father, William Graves, was a reporter for '' Th ...

et al. of

DeepMind DeepMind Technologies is a British artificial intelligence subsidiary of Alphabet Inc. and research laboratory founded in 2010. DeepMind was acquired by Google in 2014 and became a wholly owned subsidiary of Alphabet Inc, after Google's restru ...

Applications

DNC indirectly takes inspiration from Von-Neumann architecture, making it likely to outperform conventional architectures in tasks that are fundamentally algorithmic that cannot be learned by finding a

decision boundary __NOTOC__ In a statistical-classification problem with two classes, a decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class. The classifier will classify all the point ...

. So far, DNCs have been demonstrated to handle only relatively simple tasks, which can be solved using conventional programming. But DNCs don't need to be programmed for each problem, but can instead be trained. This attention span allows the user to feed complex

data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the rel ...

s such as

graphs Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties * Graph (topology), a topological space resembling a graph in the sense of discr ...

sequentially, and recall them for later use. Furthermore, they can learn aspects of symbolic reasoning and apply it to working memory. The researchers who published the method see promise that DNCs can be trained to perform complex, structured tasks and address big-data applications that require some sort of reasoning, such as generating video commentaries or semantic text analysis. DNC can be trained to navigate

rapid transit Rapid transit or mass rapid transit (MRT), also known as heavy rail or metro, is a type of high-capacity public transport generally found in urban areas. A rapid transit system that primarily or traditionally runs below the surface may be c ...

systems, and apply that network to a different system. A neural network without memory would typically have to learn about each transit system from scratch. On graph traversal and sequence-processing tasks with

supervised learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...

, DNCs performed better than alternatives such as

long short-term memory Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) ca ...

or a neural turing machine. With a

reinforcement learning Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine ...

approach to a block puzzle problem inspired by

SHRDLU SHRDLU was an early natural-language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. In the program, the user carries on a conversation with the computer, moving objects, naming collections and querying the ...

, DNC was trained via curriculum learning, and learned to make a

plan A plan is typically any diagram or list of steps with details of timing and resources, used to achieve an objective to do something. It is commonly understood as a temporal set of intended actions through which one expects to achieve a goal ...

. It performed better than a traditional

recurrent neural network A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes. This allows it to exhibit temporal dynamic ...

Architecture

DNC networks were introduced as an extension of the

Neural Turing Machine A Neural Turing machine (NTM) is a recurrent neural network model of a Turing machine. The approach was published by Alex Graves et al. in 2014. NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of p ...

(NTM), with the addition of memory attention mechanisms that control where the memory is stored, and temporal attention that records the order of events. This structure allows DNCs to be more robust and abstract than a NTM, and still perform tasks that have longer-term dependencies than some predecessors such as Long Short Term Memory ( LSTM). The memory, which is simply a matrix, can be allocated dynamically and accessed indefinitely. The DNC is

differentiable In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non- vertical tangent line at each interior point i ...

end-to-end (each subcomponent of the model is differentiable, therefore so is the whole model). This makes it possible to optimize them efficiently using

gradient descent In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of ...

. The DNC model is similar to the

Von Neumann architecture The von Neumann architecture — also known as the von Neumann model or Princeton architecture — is a computer architecture based on a 1945 description by John von Neumann, and by others, in the '' First Draft of a Report on the EDVAC''. Th ...

, and because of the resizability of memory, it is

Turing complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...

Traditional DNC

DNC, as originally published

Extensions

Refinements include sparse memory addressing, which reduces time and space complexity by thousands of times. This can be achieved by using an approximate nearest neighbor algorithm, such as

Locality-sensitive hashing In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since ...

, or a random

k-d tree In computer science, a ''k''-d tree (short for ''k-dimensional tree'') is a space-partitioning data structure for organizing points in a ''k''-dimensional space. ''k''-d trees are a useful data structure for several applications, such as sea ...

like Fast Library for Approximate Nearest Neighbors from UBC. Adding Adaptive Computation Time (ACT) separates computation time from data time, which uses the fact that problem length and problem difficulty are not always the same. Training using synthetic gradients performs considerably better than Backpropagation through time (BPTT). Robustness can be improved with use of layer normalization and Bypass Dropout as regularization.

References

External links