computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...

, distributed shared memory (DSM) is a form of

memory architecture Memory architecture describes the methods used to implement electronic computer data storage in a manner that is a combination of the fastest, most reliable, most durable, and least expensive way to store and retrieve information. Depending on the ...

where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memory, but that the address space is shared—i.e., the same physical address on two

processors A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, ...

refers to the same location in memory. Distributed global address space (DGAS), is a similar term for a wide class of software and hardware implementations, in which each

node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...

of a

cluster may refer to: Science and technology Astronomy * Cluster (spacecraft), constellation of four European Space Agency spacecraft * Asteroid cluster, a small asteroid family * Cluster II (spacecraft), a European Space Agency mission to study t ...

has access to

shared memory In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...

in addition to each node's private (i.e., not shared)

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...

Overview

A distributed-memory system, often called a

multicomputer Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...

, consists of multiple independent processing nodes with local memory modules which is connected by a general interconnection network. Software DSM systems can be implemented in an

operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also i ...

, or as a programming library and can be thought of as extensions of the underlying

virtual memory In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very ...

architecture. When implemented in the operating system, such systems are transparent to the developer; which means that the underlying

distributed memory In computer science, distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data are required, the computational task mu ...

is completely hidden from the users. In contrast, software DSM systems implemented at the library or language level are not transparent and developers usually have to program them differently. However, these systems offer a more portable approach to DSM system implementations. A DSM system implements the shared-memory model on a physically distributed memory system. DSM can be achieved via software as well as hardware. Hardware examples include cache coherence circuits and

network interface controller A network interface controller (NIC, also known as a network interface card, network adapter, LAN adapter or physical network interface, and by similar terms) is a computer hardware component that connects a computer to a computer network. Ear ...

s. There are three ways of implementing DSM: *

Page Page most commonly refers to: * Page (paper), one side of a leaf of paper, as in a book Page, PAGE, pages, or paging may also refer to: Roles * Page (assistance occupation), a professional occupation * Page (servant), traditionally a young m ...

-based approach using virtual memory * Shared-variable approach using routines to access shared variables * Object-based approach, ideally accessing shared data through object-oriented discipline

Advantages

* Scales well with a large number of nodes * Message passing is hidden * Can handle complex and large databases without replication or sending the data to processes * Generally cheaper than using a multiprocessor system * Provides large virtual memory space * Programs are more portable due to common programming interfaces * Shield programmers from sending or receiving primitives

Disadvantages

* Generally slower to access than non-distributed shared memory * Must provide additional protection against simultaneous accesses to shared data * May incur a performance penalty * Little programmer control over actual messages being generated * Programmers need to understand consistency models, to write correct programs

Comparison with message passing

Software DSM systems also have the flexibility to organize the shared memory region in different ways. The page based approach organizes shared memory into pages of fixed size. In contrast, the object based approach organizes the shared memory region as an abstract space for storing shareable objects of variable sizes. Another commonly seen implementation uses a

tuple space A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of process ...

, in which the unit of sharing is a

tuple In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...

Shared memory architecture In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...

may involve separating memory into shared parts distributed amongst nodes and main memory; or distributing all memory between nodes. A coherence protocol, chosen in accordance with a

consistency model In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of read ...

, maintains

memory coherence Memory coherence is an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory. In a uniprocessor system (whereby, in today's terms, there exists only one core), there is only one p ...

Directory memory coherence

Memory coherence Memory coherence is an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory. In a uniprocessor system (whereby, in today's terms, there exists only one core), there is only one p ...

is necessary such that the system which organizes the DSM is able to track and maintain the state of data blocks in nodes across the memories comprising the system. A directory is one such mechanism which maintains the state of cache blocks moving around the system.

States

A basic DSM will track at least three states among nodes for any given block in the directory. There will be some state to dictate the block as uncached (U), a state to dictate a block as exclusively owned or modified owned (EM), and a state to dictate a block as shared (S). As blocks come into the directory organization, they will transition from U to EM (ownership state) in the initial node. The state can transition to S when other nodes begin reading the block. There are two primary methods for allowing the system to track where blocks are cached and in what condition across each node. Home-centric request-response uses the home to service requests and drive states, whereas requester-centric allows each node to drive and manage its own requests through the home.

Home-centric request and response

In a home-centric system, the DSM will avoid having to handle request-response races between nodes by allowing only one transaction to occur at a time until the home node has decided that the transaction is finished—usually when the home has received every responding processor's response to the request. An example of this is Intel's

QPI The Intel QuickPath Interconnect (QPI) is a point-to-point processor interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. It increased the scalability and availab ...

home-source mode. The advantages of this approach are that it's simple to implement but its request-response strategy is slow and buffered due to the home node's limitations.

Requester-centric request and response

In a requester-centric system, the DSM will allow nodes to talk at will to each other through the home. This means that multiple nodes can attempt to start a transaction, but this requires additional considerations to ensure coherence. For example: when one node is processing a block, if it receives a request for that block from another node it will send a NAck (Negative Acknowledgement) to tell the initiator that the processing node can't fulfill that request right away. An example of this is Intel's QPI snoop-source mode. This approach is fast but it does not naturally prevent race conditions and generates more bus traffic.

Consistency models

The DSM must follow certain rules to maintain consistency over how read and write order is viewed among nodes, called the system's ''

''. Suppose we have ''n'' processes and ''Mi'' memory operations for each process ''i'', and that all the operations are executed sequentially. We can conclude that (''M1'' + ''M2'' + … + ''Mn'')!/(''M1''! ''M2''!… ''Mn''!) are possible interleavings of the operations. The issue with this conclusion is determining the correctness of the interleaved operations. Memory coherence for DSM defines which interleavings are permitted. Sequential invocations and responses in DSM

Sequential invocations and responses in DSM

Replication

There are two types of replication Algorithms. Read replication and Write replication. In Read replication multiple nodes can read at the same time but only one node can write. In Write replication multiple nodes can read and write at the same time. The write requests are handled by a sequencer. Replication of shared data in general tends to: * Reduce network traffic * Promote increased parallelism * Result in fewer page faults However, preserving coherence and consistency may become more challenging.

Release and entry consistency

* Release consistency: when a process exits a

critical section In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. One way to ...

, new values of the variables are propagated to all sites. * Entry consistency: when a process enters a critical section, it will automatically update the values of the shared variables. ** View-based Consistency: it is a variant of Entry Consistency, except the shared variables of a critical section are automatically detected by the system. An implementation of view-based consistency i
VODCA
which has comparable performance to MPI on cluster computers.

Examples

Kerrighed Kerrighed is an open source single-system image (SSI) cluster software project. The project started in October 1998 at the Paris research group The French National Institute for Research in Computer Science and Control. From 2006 to 2011, the pro ...

* Open SSI *

MOSIX MOSIX is a proprietary distributed operating system. Although early versions were based on older UNIX systems, since 1999 it focuses on Linux clusters and grids. In a MOSIX cluster/grid there is no need to modify or to link applications with an ...

* TreadMarks
VODCA

DIPC

References

External links

Distributed Shared Cache

''Memory coherence in shared virtual memory systems''
by Kai Li, Paul Hudak published in ACM Transactions on Computer Systems, Volume 7 Issue 4, Nov. 1989 {{Parallel Computing Distributed computing architecture