Directory-based Coherence Protocols
   HOME

TheInfoList



OR:

In
computer engineering Computer engineering (CoE or CpE) is a branch of electrical engineering and computer science that integrates several fields of computer science and electronic engineering required to develop computer hardware and software. Computer engineers ...
, directory-based cache coherence is a type of cache coherence mechanism, where directories are used to manage caches in place of
bus snooping Bus snooping or bus sniffing is a scheme by which a coherency controller (snooper) in a cache (a snoopy cache) monitors or snoops the bus transactions, and its goal is to maintain a cache coherency in distributed shared memory systems. A cache cont ...
. Bus snooping methods scale poorly due to the use of
broadcasting Broadcasting is the distribution (business), distribution of sound, audio or video content to a dispersed audience via any electronic medium (communication), mass communications medium, but typically one using the electromagnetic spectrum (radio ...
. These methods can be used to target both performance and
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
of directory systems.


Full bit vector format

In the full bit vector format, for each possible
cache line A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whi ...
in
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
, a
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
is used to track whether every individual
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
has that line stored in its
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
. The full bit vector format is the simplest structure to implement, but the least scalable. The
SGI Origin 2000 The SGI Origin 2000 is a family of mid-range and high-end server computers developed and manufactured by Silicon Graphics (SGI). They were introduced in 1996 to succeed the SGI Challenge and POWER Challenge. At the time of introduction, these r ...
uses a combination of full bit vector and coarse bit vector depending on the number of processors. Each directory entry must have 1 bit stored per processor per cache line, along with bits for tracking the state of the directory. This leads to the total size required being ''(number of processors)×number of cache lines'', having a storage overhead ratio of ''(number of processors)/(cache block size×8)''. It can be observed that directory overhead scales linearly with the number of processors. While this may be fine for a small number of processors, when implemented in large systems the size requirements for the directory becomes excessive. For example, with a block size of 32 bytes and 1024 processors, the storage overhead ratio becomes 1024/(32×8) = 400%.


Coarse bit vector format

The coarse bit vector format has a similar structure to the full bit vector format, though rather than tracking one bit per processor for every cache line, the directory groups several processors into
nodes In general, a node is a localized swelling (a "knot") or a point of intersection (a Vertex (graph theory), vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two ...
, storing whether a cache line is stored in a node rather than a line. This improves size requirements at the expense of
bus A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a road vehicle that carries significantly more passengers than an average car or van. It is most commonly used in public transport, but is also in use for cha ...
traffic saving (processors per node)×(total lines) bits of space. Thus the ratio overhead is the same, just replacing number of processors with number of processor groups. When a bus request is made for a cache line that one processor in the group has, the directory broadcasts the signal into every processor in the node rather than just the caches that contain it, leading to unnecessary traffic to nodes that do not have the data cached. In this case the directory entry uses 1 bit for a group of processors for each cache line. For the same example as Full Bit Vector format if we consider 1 bit for 8 processors as a group, then the storage overhead will be 128/(32×8)=50%. This is a significant improvement over the Full Bit Vector format.


Sparse directory format

A cache only stores a small subset of blocks in main memory at a particular time. Hence most of the entries in the directory will belong to uncached blocks. In the sparse directory format the wastage is reduced by storing only the cached blocks in the directory. Consider a processor with a cache size of 64KB with a block size of 32 bytes and the main memory size to be 4MB. The maximum number of entries that the directory can have in the sparse directory format is 2048. If the directory has an entry for all the blocks in the memory the number of entries in the directory will be 131072. Thus it is evident that the storage improvement provided by sparse directory format is very significant.


Number-balanced binary tree format

In this format the directory is decentralised and distributed among the caches that share a memory block. Different caches that share a memory block are arranged in the form of a binary tree. The cache that accesses a memory block first is the
root node In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be con ...
. Each memory block has the root node information (HEAD) and Sharing counter field (SC). The SC field has the number of caches that share the block. Each cache entry has
pointers Pointer may refer to: Places * Pointer, Kentucky * Pointers, New Jersey * Pointers Airport, Wasco County, Oregon, United States * The Pointers, a pair of rocks off Antarctica People with the name * Pointer (surname), a surname (including a lis ...
to the next sharing caches known as L-CHD and R-CHD. A condition for this directory is that the binary tree should be number balanced, i.e the number of nodes in the left sub tree must be equal to or one greater than the number of nodes in the right subtree. All the subtrees should also be number balanced.


Chained directory format

In this format the memory holds the directory pointer to the latest cache that accessed the block and each cache has the pointer to the previous cache that accessed the block. So when a processor sends a write request to a block in memory, the processor sends invalidations down the chain of pointers. In this directory when a cache block is replaced we need to traverse the
list A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby union ...
in order to change the directory which increases latency. In order to prevent this
doubly linked list In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked record (computer science), records called node (computer science), nodes. Each node contains three field (computer science), fields: ...
s are widely used now in which each cached copy has pointers to previous and the next cache that accesses the block.


Limited pointer format

The limited pointer format uses a set number of pointers to track the processors that are caching the data. When a new processor caches a block, a free pointer is chosen from a pool to point to that processor. There are a few options for handling cases when the number of sharers exceeds the number of free pointers. One method is to invalidate one of the sharers, using its pointer for the new requestor, though this can be costly in cases where a block has a large number of readers, such as a lock. Another method is to have a separate pool of free pointers available to all the blocks. This method is usually effective as the number of blocks shared by a large number of processors is not normally very large.


References

{{Reflist Computer architecture