Cache Coherence
   HOME

TheInfoList



OR:

In
computer architecture In computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, t ...
, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with
CPUs A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
in a multiprocessing system. In the illustration on the right, consider both the clients have a cached copy of a particular memory block from a previous read. Suppose the client on the bottom updates/changes that memory block, the client on the top could be left with an invalid cache of memory without any notification of the change. Cache coherence is intended to manage such conflicts by maintaining a coherent view of the data values in multiple caches.


Overview

In a
shared memory In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...
multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies of data is changed, the other copies must reflect that change. Cache coherence is the discipline which ensures that the changes in the values of shared operands (data) are propagated throughout the system in a timely fashion. The following are the requirements for cache coherence: ;Write Propagation: Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches. ;Transaction Serialization: Reads/Writes to a single memory location must be seen by all processors in the same order. Theoretically, coherence can be performed at the load/store
granularity Granularity (also called graininess), the condition of existing in granules or grains, refers to the extent to which a material or system is composed of distinguishable pieces. It can either refer to the extent to which a larger entity is sub ...
. However, in practice it is generally performed at the granularity of cache blocks.


Definition

Coherence defines the behavior of reads and writes to a single address location. One type of data occurring simultaneously in different cache memory is called cache coherence, or in some systems, global memory. In a multiprocessor system, consider that more than one processor has cached a copy of the memory location X. The following conditions are necessary to achieve cache coherence: # In a read made by a processor P to a location X that follows a write by the same processor P to X, with no writes to X by another processor occurring between the write and the read instructions made by P, X must always return the value written by P. # In a read made by a processor P1 to location X that follows a write by another processor P2 to X, with no other writes to X made by any processor occurring between the two accesses and with the read and write being sufficiently separated, X must always return the value written by P2. This condition defines the concept of coherent view of memory. Propagating the writes to the shared memory location ensures that all the caches have a coherent view of the memory. If processor P1 reads the old value of X, even after the write by P2, we can say that the memory is incoherent. The above conditions satisfy the Write Propagation criteria required for cache coherence. However, they are not sufficient as they do not satisfy the Transaction Serialization condition. To illustrate this better, consider the following example: A multi-processor system consists of four processors - P1, P2, P3 and P4, all containing cached copies of a shared variable ''S'' whose initial value is 0. Processor P1 changes the value of ''S'' (in its cached copy) to 10 following which processor P2 changes the value of ''S'' in its own cached copy to 20. If we ensure only write propagation, then P3 and P4 will certainly see the changes made to ''S'' by P1 and P2. However, P3 may see the change made by P1 after seeing the change made by P2 and hence return 10 on a read to ''S''. P4 on the other hand may see changes made by P1 and P2 in the order in which they are made and hence return 20 on a read to ''S''. The processors P3 and P4 now have an incoherent view of the memory. Therefore, in order to satisfy Transaction Serialization, and hence achieve Cache Coherence, the following condition along with the previous two mentioned in this section must be met: * Writes to the same location must be sequenced. In other words, if location X received two different values A and B, in this order, from any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order. The alternative definition of a coherent system is via the definition of
sequential consistency Sequential consistency is a consistency model used in the domain of concurrent computing (e.g. in distributed shared memory, distributed transactions, etc.). It is the property that "... the result of any execution is the same as if the operatio ...
memory model: "the cache coherent system must appear to execute all threads’ loads and stores to a ''single'' memory location in a total order that respects the program order of each thread". Thus, the only difference between the cache coherent system and sequentially consistent system is in the number of address locations the definition talks about (single memory location for a cache coherent system, and all memory locations for a sequentially consistent system). Another definition is: "a multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order". Rarely, but especially in algorithms, coherence can instead refer to the
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
. Multiple copies of same data can exist in different cache simultaneously and if processors are allowed to update their own copies freely, an inconsistent view of memory can result.


Coherence mechanisms

The two most common mechanisms of ensuring coherency are '' snooping'' and '' directory-based'', each having their own benefits and drawbacks. Snooping based protocols tend to be faster, if enough
bandwidth Bandwidth commonly refers to: * Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range * Bandwidth (computing), the rate of data transfer, bit rate or thr ...
is available, since all transactions are a request/response seen by all processors. The drawback is that snooping isn't scalable. Every request must be broadcast to all nodes in a system, meaning that as the system gets larger, the size of the (logical or physical) bus and the bandwidth it provides must grow. Directories, on the other hand, tend to have longer latencies (with a 3 hop request/forward/respond) but use much less bandwidth since messages are point to point and not broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.


Snooping

: First introduced in 1983, snooping is a process where the individual caches monitor address lines for accesses to memory locations that they have cached. The ''write-invalidate protocols'' and ''write-update protocols'' make use of this mechanism. : For the snooping mechanism, a snoop filter reduces the snooping traffic by maintaining a plurality of entries, each representing a cache line that may be owned by one or more nodes. When replacement of one of the entries is required, the snoop filter selects for the replacement of the entry representing the cache line or lines owned by the fewest nodes, as determined from a presence vector in each of the entries. A temporal or other type of algorithm is used to refine the selection if more than one cache line is owned by the fewest nodes.


Directory-based

: In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.
Distributed shared memory In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memor ...
systems mimic these mechanisms in an attempt to maintain consistency between blocks of memory in loosely coupled systems.


Coherence protocols

Coherence protocols apply cache coherence in multiprocessor systems. The intention is that two clients must never see different values for the same shared data. The protocol must implement the basic requirements for coherence. It can be tailor-made for the target system or application. Protocols can also be classified as snoopy or directory-based. Typically, early systems used directory-based protocols where a directory would keep a track of the data being shared and the sharers. In snoopy protocols, the transaction requests (to read, write, or upgrade) are sent out to all processors. All processors snoop the request and respond appropriately. Write propagation in snoopy protocols can be implemented by either of the following methods: ;Write-invalidate: When a write operation is observed to a location that a cache has a copy of, the cache controller invalidates its own copy of the snooped memory location, which forces a read from main memory of the new value on its next access. ;Write-update: When a write operation is observed to a location that a cache has a copy of, the cache controller updates its own copy of the snooped memory location with the new data. If the protocol design states that whenever any copy of the shared data is changed, all the other copies must be "updated" to reflect the change, then it is a write-update protocol. If the design states that a write to a cached copy by any processor requires other processors to discard or invalidate their cached copies, then it is a write-invalidate protocol. However, scalability is one shortcoming of broadcast protocols. Various models and protocols have been devised for maintaining coherence, such as MSI,
MESI The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol (due to its development at the University of Illinois at Urbana-C ...
(aka Illinois), MOSI,
MOESI In Roman literature of the early 1st century CE, the Moesi ( or ; grc, Μοισοί, ''Moisoí'' or Μυσοί, ''Mysoí''; lat, Moesi or ''Moesae'') appear as a Paleo-Balkan people who lived in the region around the River Timok to the south ...
, MERSI, MESIF, write-once, Synapse, Berkeley, Firefly and
Dragon protocol The Dragon Protocol is an update based cache coherence protocol used in multi-processor systems. Write propagation is performed by directly updating all the cached values across multiple processors. Update based protocols such as the Dragon protoc ...
. In 2011,
ARM Ltd Arm is a British semiconductor and software design company based in Cambridge, England. Its primary business is in the design of ARM processors (CPUs). It also designs other chips, provides software development tools under the DS-5, RealView a ...
proposed the AMBA 4 ACE for handling coherency in
SoCs SOCS (suppressor of cytokine signaling proteins) refers to a family of genes involved in inhibiting the JAK-STAT signaling pathway. Genes * CISH * SOCS1 * SOCS2 * SOCS3 * SOCS4 * SOCS5 * SOCS6 * SOCS7 Suppressor of cytokine signaling 7 is a pro ...
. The AMBA CHI (Coherent Hub Interface) specification from
ARM Ltd Arm is a British semiconductor and software design company based in Cambridge, England. Its primary business is in the design of ARM processors (CPUs). It also designs other chips, provides software development tools under the DS-5, RealView a ...
, which belongs to AMBA5 group of specifications defines the interfaces for the connection of fully coherent processors.


See also

*
Consistency model In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of read ...
*
Directory-based coherence Directory-based coherence is a mechanism to handle Cache coherence problem in Distributed shared memory (DSM) a.k.a. Non-Uniform Memory Access (NUMA). Another popular way is to use a special type of computer bus between all the nodes as a "shared ...
*
Memory barrier In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued b ...
*
Non-uniform memory access Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non ...
(NUMA) * False sharing


References


Further reading

* * * * {{Parallel_computing Cache coherency Parallel computing Concurrent computing Consistency models