HOME

TheInfoList



OR:

Directory-based coherence is a mechanism to handle
Cache coherence In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
problem in
Distributed shared memory In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memor ...
(DSM) a.k.a.
Non-Uniform Memory Access Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non- ...
(NUMA). Another popular way is to use a special type of computer
bus A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a road vehicle that carries significantly more passengers than an average car or van. It is most commonly used in public transport, but is also in use for cha ...
between all the nodes as a "shared bus" (a.k.a.
System bus A system bus is a single computer bus that connects the major components of a computer system, combining the functions of a data bus to carry information, an address bus to determine where it should be sent or read from, and a control bus to dete ...
). Directory-based coherence uses a special
directory Directory may refer to: * Directory (computing), or folder, a file system structure in which to store computer files * Directory (OpenVMS command) * Directory service, a software application for organizing information about a computer network's u ...
to serve instead of the shared bus in the bus-based coherence protocols. Both of these designs use the corresponding medium (i.e. directory or bus) as tool to facilitate the communication between different
nodes In general, a node is a localized swelling (a "knot") or a point of intersection (a Vertex (graph theory), vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two ...
, and to guarantee that the coherence protocol is working properly along all the communicating nodes. In directory based cache coherence, this is done by using this directory to keep track of the status of all
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
blocks, the status of each block includes in which cache coherence "
state State may refer to: Arts, entertainment, and media Literature * ''State Magazine'', a monthly magazine published by the U.S. Department of State * ''The State'' (newspaper), a daily newspaper in Columbia, South Carolina, United States * ''Our S ...
" that block is, and which nodes are sharing that block at that time, which can be used to eliminate the need to
broadcast Broadcasting is the distribution of audio or video content to a dispersed audience via any electronic mass communications medium, but typically one using the electromagnetic spectrum ( radio waves), in a one-to-many model. Broadcasting began ...
all the signals to all nodes, and only send it to the nodes that are interested in this single block. Following are a few advantages and disadvantages of the directory based cache coherence protocol: * Scalability: This is one of the strongest motivations for going to directory based designs. What we mean by
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
, in short, is how good a specific system is in handling the growing amount of work that it is responsible to do . For this criteria, Bus based systems cannot do well due to the limitation caused when having a shared bus that all nodes are using in the same time. For a relatively small number of nodes, bus systems can do well. However, while the number of nodes is growing, some problems may occur in this regard. Especially since only one node is allowed to use the bus at a time, which will significantly harm the performance of the overall system. On the other hand, using directory-based systems, there will be no such bottleneck to constrain the scalability of the system. * Simplicity: This is one of the points where bus-system is superior. Since the bus structure itself can serve as an organizer for all the traffic that goes through the system, and ensure the atomicity of all the signals passed through. Thus, there will be no need to put more effort in ensuring atomicity and ordering between signals as the case in directory based systems, which leads to several overhead faced in the later system design when dealing with issues like
consistency In classical deductive logic, a consistent theory is one that does not lead to a logical contradiction. The lack of contradiction can be defined in either semantic or syntactic terms. The semantic definition states that a theory is consistent ...
. According to the above discussion, it is clear that using bus based systems seems more attractive for relatively small systems. However, directory based systems become crucial when the system scale up and the number of nodes grows. So there is a kind of
trade-off A trade-off (or tradeoff) is a situational decision that involves diminishing or losing one quality, quantity, or property of a set or design in return for gains in other aspects. In simple terms, a tradeoff is where one thing increases, and anot ...
between the simplicity and the scalability when comparing between Bus-based and Directory-based cache coherence designs.


History

The idea of Directory-based cache coherence systems began long ago. The idea of DASH (Directory Architecture for SHared-memory) was first proposed by C.K. Tang in the mid 1970s. However, applying it to cache coherence was proposed a few years later, specifically in 1978, when researchers at
Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is consider ...
proposed the first version of this coherence systems called Stanford DASH, in a paper that described the system with the difficulties and improvements associated with such designs. Beside this approach, several attempts were done to provide a scalable systems. For instance,
BBN Butterfly The BBN Butterfly was a massively parallel computer built by Bolt, Beranek and Newman in the 1980s. It was named for the "butterfly" multi-stage switching network around which it was built. Each machine had up to 512 CPUs, each with local memory, ...
which was introduced in 1985, and IBM PR3 which was introduced in 1987, are some examples of scalable multiprocessor systems. However, both of these systems have a drawback; For example, BBN Butterfly does not have caches. Similarly, IBM PR3 does not provide hardware cache coherence, which limits the performance of both of these designs, especially when employing high performance processors. The limitations of other competitors made it easier for DASH based systems to get chosen when designing cache coherence systems and other systems needing scalability in cache-based nodes. In 1985, James Archibald and
Jean-Loup Baer Jean-Loup Baer is a computer scientist and Professor Emeritus at the University of Washington. Biography Jean-Loup Baer received the Diplome d'Ingénieur in Electrical Engineering and the Doctorat 3e cycle in Computer Science from the University ...
from the
University of Washington The University of Washington (UW, simply Washington, or informally U-Dub) is a public research university in Seattle, Washington. Founded in 1861, Washington is one of the oldest universities on the West Coast; it was established in Seattle a ...
published a paper that proposes a more economical, expandable, and modular variation of the "global directory" approach in the term of hardware use in the design. In 1992, Daniel Lenoski from Stanford university published a paper proposing advances in cache coherence protocols for directory-based systems. In a 1996 paper, he introduced the design of the
SGI Origin 2000 The SGI Origin 2000 is a family of mid-range and high-end server computers developed and manufactured by Silicon Graphics (SGI). They were introduced in 1996 to succeed the SGI Challenge and POWER Challenge. At the time of introduction, these r ...
, a family of server computers employing directory based cache coherence. The subsequent
Origin 3000 The Origin 3000 and the Onyx 3000 is a family of mid-range and high-end computers developed and manufactured by SGI. The Origin 3000 is a server, and the Onyx 3000 is a visualization system. Both systems were introduced in July 2000 to succeed t ...
was introduced in July 2000.


Protocols

Unlike
snoopy Snoopy is an anthropomorphic beagle in the comic strip ''Peanuts'' by Charles M. Schulz. He can also be found in all of the ''Peanuts'' films and television specials. Since his debut on October 4, 1950, Snoopy has become one of the most recog ...
coherence protocols, in a directory based coherence approach, the information about which caches have a copy of a block is maintained in a structure called ''Directory''. In a directory based scheme, participating caches do not broadcast requests to all other sharing caches of the block in order to locate cached copies, instead it queries the directory to retrieve the information about which block have cached copies and sends only to those particular processors and hence traffic saving compared to a snoopy protocol is large. In well optimized applications, most data sharing is only for data that is read only, and there is little sharing for data that is frequently read and written. A directory approach can result in a substantial traffic saving compared to broadcast/snoopy approach in such applications. As shown in the data flow diagram, the actors involved in a distributed shared memory system implementing directory based coherence protocol are: * Requestor Node : This node is the processor who is requesting for a read/write of a memory block. * Directory Node : This node maintains the information of the state of each cache block in the system and requestor directs its requests to the directory node. * Owner Node: An owner node owns the most recent state of the cache block, note that directory might not be always up to date with latest data. * Sharer Node: One or many node which are sharing a copy of the cache block. Requestor and Owner nodes maintain their state transition similar to a snoopy coherence protocols like
MESI protocol The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol (due to its development at the University of Illinois at Urbana-Ch ...
. However, unlike a bus based implementation where nodes communicate using a common bus, directory based implementation uses
message passing In computer science, message passing is a technique for invoking behavior (i.e., running a program) on a computer. The invoking program sends a message to a process (which may be an actor or object) and relies on that process and its supporting i ...
model to exchange information required for maintaining
cache coherence In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
. Directory node acts as a serializing point and all communications are directed through this node to maintain correctness.


Directory Node

A directory node keeps track of the overall state of a cache block in the entire cache system for all processors. It can be in three states : * Uncached (U): No processor has data cached, memory up-to-date . * Shared (S): one or more processors have data cached, memory up-to-date. In this state directory and sharers have clean copy of the cached block. * Exclusive/Modified (EM): one processor (owner) has data cached; memory out-of-date. Note that directory cannot distinguish a block cached in an exclusive or modified state at the processor as processors can transition from an exclusive state to modified state without any bus transaction. Explanation of the Directory state transition
Finite State Machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
(refer image 1) is captured below in the table: In addition to cache state, a directory must track which processors have data when in the shared state. This is required for sending invalidation and intervention requests to the individual processor caches which have the cache block in shared state. Few of the popular implementation approaches are: * Full bit-vector: A bit field for each processor at the directory node are maintained. The storage overhead scales with the number of processors. * Limited pointer: In this approach directory information of limited number of blocks is kept at the directory to reduce storage overhead. Please note that the protocol described above is the basic implementation and
race condition A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of t ...
s can occur due to the fact that directory can be out of sync with the caches and also messages between processors can be overlapping. More complex implementations are available like
Scalable Coherent Interface The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple in ...
which have multiple states. DASH cache coherence protocol is another protocol that uses directory-based coherence scheme. DASH protocol uses a clustered approach, where processors inside a cluster are kept coherent using bus based snooping scheme, while the clusters are connected in a directory approach. Even though various protocols use different implementations for tracking cache blocks, however the concept of directory remains same.


See also

*
Coherence protocol In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
*
MSI protocol In computing, the MSI protocol - a basic cache-coherence protocol - operates in multiprocessor systems. As with other cache coherency protocols, the letters of the protocol name identify the possible states in which a cache line can be. Overv ...
*
Bit array A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level p ...
*
Distributed shared memory In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memor ...
*
Snoopy cache Snoopy is an anthropomorphic beagle in the comic strip ''Peanuts'' by Charles M. Schulz. He can also be found in all of the ''Peanuts'' films and television specials. Since his debut on October 4, 1950, Snoopy has become one of the most recog ...


References

{{Reflist Computer architecture